Re: [Cython] optimizing list.pop
On Nov 3, 2009, at 2:37 PM, Lisandro Dalcin wrote: > On Tue, Nov 3, 2009 at 4:20 PM, Robert Bradshaw > wrote: >> I concede that we're partially copying the (half-line) list-shrink >> trigger >> algorithm here, but if that changed nothing would break. It's not >> like we're >> using any secret APIs. >> > > So L->allocated is public API? If you can get rid of that, I think no > one would object your patch... Well, it's clearly documented in listobject.h, and we didn't have to import any special headers to get it (like we do for frame objects and exception handling). There's nothing clean and simple to use instead, at least not without giving up nearly all the potential savings. >> For the no argument case, there really is hardly any code >> duplication. Also, >> we don't do any reallocation or anything, we just note that if the >> number of >> items get small enough we dispatch to the generic case. I think the >> resulting 5x speedup is worth it. >> >>> Why not use PyList_GET_ITEM(L) + Py_INCREF() + PyList_SetSlice() ?? >>> This way, you could easily support "L.pop(index)"... >> >> Have you seen the code for PyList_SetSlice? It's way more generic >> (and >> complicated) than is needed in this case. And if one were to use >> the methods >> mentioned above, one would still need all the bounds checking code >> that's >> there (the only savings would be calling PyList_SetSlice rather >> than having >> a manual loop, at the expense of extra memory management). Again, >> I'd argue >> it's worth an 10x speedup. >> > > Robert, "premature optimization is the root of all evil"... You are > trying to be faster than CPython!! list.pop() is implemented more or > less with PyList_SetSlice !! You say this as if it's a bad thing to be faster than CPython :) >> Ideally there should be a PyList_Pop method in the C API, and more >> efficient >> than the one that's there, but that's not the case. > > Yes, of course... And if it were, it would likely be implemented with > PyList_SetSlice, as the list.pop() method is currently is... Or core > CPython devs are truly lazy, or you Robert are wasting valuable time > in micro-optimizations ... Well, at least I spent less time on that micro-optimization than answering these emails :). (Not that I think that either is a waste.) My hunch is that since there was no (natural) way to pop without a Python function call, and the attribute lookup and argument unpacking took more time each than the PyList_SetSlice code overhead, there was no incentive for the core CPython devs to put any time into optimizing this, if it even crossed their radar. - Robert ___ Cython-dev mailing list Cython-dev@codespeak.net http://codespeak.net/mailman/listinfo/cython-dev
Re: [Cython] optimizing list.pop
On Tue, Nov 3, 2009 at 4:20 PM, Robert Bradshaw wrote: > On Nov 3, 2009, at 7:08 AM, Lisandro Dalcin wrote: > >> On Tue, Nov 3, 2009 at 12:28 PM, Dag Sverre Seljebotn >> wrote: >>> >>> Stefan Behnel wrote: Robert Bradshaw, 03.11.2009 10:22: > On Oct 25, 2009, at 11:53 AM, Lisandro Dalcin wrote: > >> I think Cython could generate a custom, inline function >> implementing l.pop([n]) and fast dispatch to it if the object is >> exactly a list ... >> > http://hg.cython.org/cython-devel/rev/2e3dda4a7d23 > Hmm, is this really worth the code duplication with CPython? I mean, we are partly copying the list-shrink trigger algorithm here, which I would prefer to consider an implementation detail of the list type. >>> I second this, this is the kind of code I don't like to see Cython >>> output. At the very least, one should #IFDEF it to only kick in with >>> known good Python versions...it's not the kind of thing we'll >>> automatically remember to check when Python 3.4 comes out. > > I concede that we're partially copying the (half-line) list-shrink trigger > algorithm here, but if that changed nothing would break. It's not like we're > using any secret APIs. > So L->allocated is public API? If you can get rid of that, I think no one would object your patch... >>> It should be possible to find a middle route somewhere though? (At the >>> very least prefetch "pop" of the list type rather than use getattr every >>> time.) >>> >>> Dag Sverre >>> ___ >>> Cython-dev mailing list >>> Cython-dev@codespeak.net >>> http://codespeak.net/mailman/listinfo/cython-dev >>> >> >> I agree with Stefan and Dag... > > For the no argument case, there really is hardly any code duplication. Also, > we don't do any reallocation or anything, we just note that if the number of > items get small enough we dispatch to the generic case. I think the > resulting 5x speedup is worth it. > >> Why not use PyList_GET_ITEM(L) + Py_INCREF() + PyList_SetSlice() ?? >> This way, you could easily support "L.pop(index)"... > > Have you seen the code for PyList_SetSlice? It's way more generic (and > complicated) than is needed in this case. And if one were to use the methods > mentioned above, one would still need all the bounds checking code that's > there (the only savings would be calling PyList_SetSlice rather than having > a manual loop, at the expense of extra memory management). Again, I'd argue > it's worth an 10x speedup. > Robert, "premature optimization is the root of all evil"... You are trying to be faster than CPython!! list.pop() is implemented more or less with PyList_SetSlice !! > Ideally there should be a PyList_Pop method in the C API, and more efficient > than the one that's there, but that's not the case. > Yes, of course... And if it were, it would likely be implemented with PyList_SetSlice, as the list.pop() method is currently is... Or core CPython devs are truly lazy, or you Robert are wasting valuable time in micro-optimizations ... > --- > > In [1]: from benchmark_pop import * > > In [2]: L = range(10**7) > > In [3]: time cython_pop(L, 10**7) > CPU times: user 0.15 s, sys: 0.00 s, total: 0.16 s > Wall time: 0.16 s > > In [5]: L = range(10**7) > > In [6]: time python_pop(L, 10**7) > CPU times: user 0.78 s, sys: 0.00 s, total: 0.78 s > Wall time: 0.78 s > > In [8]: L = range(10**7) > > In [9]: time cython_pop_index(L, 10**7-1, -2) > CPU times: user 0.21 s, sys: 0.00 s, total: 0.21 s > Wall time: 0.21 s > > In [11]: L = range(10**7) > > In [12]: time python_pop_index(L, 10**7-1, -2) > CPU times: user 2.38 s, sys: 0.01 s, total: 2.39 s > Wall time: 2.40 s > > > ___ > Cython-dev mailing list > Cython-dev@codespeak.net > http://codespeak.net/mailman/listinfo/cython-dev > > -- Lisandro Dalcín --- Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC) Instituto de Desarrollo Tecnológico para la Industria Química (INTEC) Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET) PTLC - Güemes 3450, (3000) Santa Fe, Argentina Tel/Fax: +54-(0)342-451.1594 ___ Cython-dev mailing list Cython-dev@codespeak.net http://codespeak.net/mailman/listinfo/cython-dev
Re: [Cython] optimizing list.pop
On Nov 3, 2009, at 7:08 AM, Lisandro Dalcin wrote: On Tue, Nov 3, 2009 at 12:28 PM, Dag Sverre Seljebotn wrote: Stefan Behnel wrote: Robert Bradshaw, 03.11.2009 10:22: On Oct 25, 2009, at 11:53 AM, Lisandro Dalcin wrote: I think Cython could generate a custom, inline function implementing l.pop([n]) and fast dispatch to it if the object is exactly a list ... http://hg.cython.org/cython-devel/rev/2e3dda4a7d23 Hmm, is this really worth the code duplication with CPython? I mean, we are partly copying the list-shrink trigger algorithm here, which I would prefer to consider an implementation detail of the list type. I second this, this is the kind of code I don't like to see Cython output. At the very least, one should #IFDEF it to only kick in with known good Python versions...it's not the kind of thing we'll automatically remember to check when Python 3.4 comes out. I concede that we're partially copying the (half-line) list-shrink trigger algorithm here, but if that changed nothing would break. It's not like we're using any secret APIs. It should be possible to find a middle route somewhere though? (At the very least prefetch "pop" of the list type rather than use getattr every time.) Dag Sverre ___ Cython-dev mailing list Cython-dev@codespeak.net http://codespeak.net/mailman/listinfo/cython-dev I agree with Stefan and Dag... For the no argument case, there really is hardly any code duplication. Also, we don't do any reallocation or anything, we just note that if the number of items get small enough we dispatch to the generic case. I think the resulting 5x speedup is worth it. Why not use PyList_GET_ITEM(L) + Py_INCREF() + PyList_SetSlice() ?? This way, you could easily support "L.pop(index)"... Have you seen the code for PyList_SetSlice? It's way more generic (and complicated) than is needed in this case. And if one were to use the methods mentioned above, one would still need all the bounds checking code that's there (the only savings would be calling PyList_SetSlice rather than having a manual loop, at the expense of extra memory management). Again, I'd argue it's worth an 10x speedup. Ideally there should be a PyList_Pop method in the C API, and more efficient than the one that's there, but that's not the case. - Robert --- In [1]: from benchmark_pop import * In [2]: L = range(10**7) In [3]: time cython_pop(L, 10**7) CPU times: user 0.15 s, sys: 0.00 s, total: 0.16 s Wall time: 0.16 s In [5]: L = range(10**7) In [6]: time python_pop(L, 10**7) CPU times: user 0.78 s, sys: 0.00 s, total: 0.78 s Wall time: 0.78 s In [8]: L = range(10**7) In [9]: time cython_pop_index(L, 10**7-1, -2) CPU times: user 0.21 s, sys: 0.00 s, total: 0.21 s Wall time: 0.21 s In [11]: L = range(10**7) In [12]: time python_pop_index(L, 10**7-1, -2) CPU times: user 2.38 s, sys: 0.01 s, total: 2.39 s Wall time: 2.40 s benchmark_pop.pyx Description: Binary data ___ Cython-dev mailing list Cython-dev@codespeak.net http://codespeak.net/mailman/listinfo/cython-dev
Re: [Cython] optimizing list.pop
On Tue, Nov 3, 2009 at 12:28 PM, Dag Sverre Seljebotn wrote: > Stefan Behnel wrote: >> Robert Bradshaw, 03.11.2009 10:22: >> >>> On Oct 25, 2009, at 11:53 AM, Lisandro Dalcin wrote: >>> I think Cython could generate a custom, inline function implementing l.pop([n]) and fast dispatch to it if the object is exactly a list ... >>> http://hg.cython.org/cython-devel/rev/2e3dda4a7d23 >>> >> >> Hmm, is this really worth the code duplication with CPython? I mean, we are >> partly copying the list-shrink trigger algorithm here, which I would prefer >> to consider an implementation detail of the list type. >> > I second this, this is the kind of code I don't like to see Cython > output. At the very least, one should #IFDEF it to only kick in with > known good Python versions...it's not the kind of thing we'll > automatically remember to check when Python 3.4 comes out. > > It should be possible to find a middle route somewhere though? (At the > very least prefetch "pop" of the list type rather than use getattr every > time.) > > Dag Sverre > ___ > Cython-dev mailing list > Cython-dev@codespeak.net > http://codespeak.net/mailman/listinfo/cython-dev > I agree with Stefan and Dag... Why not use PyList_GET_ITEM(L) + Py_INCREF() + PyList_SetSlice() ?? This way, you could easily support "L.pop(index)"... -- Lisandro Dalcín --- Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC) Instituto de Desarrollo Tecnológico para la Industria Química (INTEC) Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET) PTLC - Güemes 3450, (3000) Santa Fe, Argentina Tel/Fax: +54-(0)342-451.1594 ___ Cython-dev mailing list Cython-dev@codespeak.net http://codespeak.net/mailman/listinfo/cython-dev
Re: [Cython] optimizing list.pop
Stefan Behnel wrote: > Robert Bradshaw, 03.11.2009 10:22: > >> On Oct 25, 2009, at 11:53 AM, Lisandro Dalcin wrote: >> >>> I think Cython could generate a custom, inline function >>> implementing l.pop([n]) and fast dispatch to it if the object is >>> exactly a list ... >>> >> http://hg.cython.org/cython-devel/rev/2e3dda4a7d23 >> > > Hmm, is this really worth the code duplication with CPython? I mean, we are > partly copying the list-shrink trigger algorithm here, which I would prefer > to consider an implementation detail of the list type. > I second this, this is the kind of code I don't like to see Cython output. At the very least, one should #IFDEF it to only kick in with known good Python versions...it's not the kind of thing we'll automatically remember to check when Python 3.4 comes out. It should be possible to find a middle route somewhere though? (At the very least prefetch "pop" of the list type rather than use getattr every time.) Dag Sverre ___ Cython-dev mailing list Cython-dev@codespeak.net http://codespeak.net/mailman/listinfo/cython-dev
Re: [Cython] optimizing list.pop
Robert Bradshaw, 03.11.2009 10:22: > On Oct 25, 2009, at 11:53 AM, Lisandro Dalcin wrote: >> I think Cython could generate a custom, inline function >> implementing l.pop([n]) and fast dispatch to it if the object is >> exactly a list ... > > http://hg.cython.org/cython-devel/rev/2e3dda4a7d23 Hmm, is this really worth the code duplication with CPython? I mean, we are partly copying the list-shrink trigger algorithm here, which I would prefer to consider an implementation detail of the list type. Stefan ___ Cython-dev mailing list Cython-dev@codespeak.net http://codespeak.net/mailman/listinfo/cython-dev
Re: [Cython] optimizing list.pop
On Oct 25, 2009, at 11:53 AM, Lisandro Dalcin wrote: > On Sun, Oct 25, 2009 at 2:39 PM, Sturla Molden > wrote: >> I just realized the Python does not provide a fast way of popping an >> item from a list, just append. I.e. there is a PyList_Append, but no >> PyList_Pop. And in Python's code for _listobject.c, the function >> listpop >> is declared static. When using a list as a stack (a common case for >> Python lists), a fast pop is just as important as a fast append. >> > > Indeed. +1 for the idea, but... > >> Since the function listpop is static, the fastest way to pop a list >> is >> to grab the method named "pop" from the PyTypeObject. This might >> not be >> obvious to everyone, so I think it is an optimization that Cython >> should do. >> > > -0.5 for the implementation... The hack is cute, but still have to > make a listpop(self,args), were you need to create a tuple to make the > call... I think Cython could generate a custom, inline function > implementing l.pop([n]) and fast dispatch to it if the object is > exactly a list ... http://hg.cython.org/cython-devel/rev/2e3dda4a7d23 - Robert ___ Cython-dev mailing list Cython-dev@codespeak.net http://codespeak.net/mailman/listinfo/cython-dev
Re: [Cython] optimizing list.pop
On Sun, Oct 25, 2009 at 2:39 PM, Sturla Molden wrote: > I just realized the Python does not provide a fast way of popping an > item from a list, just append. I.e. there is a PyList_Append, but no > PyList_Pop. And in Python's code for _listobject.c, the function listpop > is declared static. When using a list as a stack (a common case for > Python lists), a fast pop is just as important as a fast append. > Indeed. +1 for the idea, but... > Since the function listpop is static, the fastest way to pop a list is > to grab the method named "pop" from the PyTypeObject. This might not be > obvious to everyone, so I think it is an optimization that Cython should do. > -0.5 for the implementation... The hack is cute, but still have to make a listpop(self,args), were you need to create a tuple to make the call... I think Cython could generate a custom, inline function implementing l.pop([n]) and fast dispatch to it if the object is exactly a list ... > > > cdef extern from "Python.h": > > ctypedef object (*PyCFunction)(object self, object args) > > ctypedef struct PyMethodDef: > char *ml_name > PyCFunction ml_meth > > ctypedef struct PyTypeObject: > PyMethodDef *tp_methods > > PyTypeObject PyList_Type > > > cdef PyCFunction __listpop = NULL > > > cpdef listpop(list self, Py_ssize_t n=-1): > > global __listpop > > cdef PyMethodDef *ptr > cdef int ok = 0 > > if not __listpop: > ptr = PyList_Type.tp_methods; > while ptr: > if str(ptr[0].ml_name) == "pop": > __listpop = ptr.ml_meth > ok = 1 > break > ptr += 1 > if not ok: > raise ValueError, \ > 'Could not find method "pop" in list type object' > return __listpop(self, (int(n),)) > > > > ___ > Cython-dev mailing list > Cython-dev@codespeak.net > http://codespeak.net/mailman/listinfo/cython-dev > -- Lisandro Dalcín --- Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC) Instituto de Desarrollo Tecnológico para la Industria Química (INTEC) Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET) PTLC - Güemes 3450, (3000) Santa Fe, Argentina Tel/Fax: +54-(0)342-451.1594 ___ Cython-dev mailing list Cython-dev@codespeak.net http://codespeak.net/mailman/listinfo/cython-dev
[Cython] optimizing list.pop
I just realized the Python does not provide a fast way of popping an item from a list, just append. I.e. there is a PyList_Append, but no PyList_Pop. And in Python's code for _listobject.c, the function listpop is declared static. When using a list as a stack (a common case for Python lists), a fast pop is just as important as a fast append. Since the function listpop is static, the fastest way to pop a list is to grab the method named "pop" from the PyTypeObject. This might not be obvious to everyone, so I think it is an optimization that Cython should do. Sturla Molden cdef extern from "Python.h": ctypedef object (*PyCFunction)(object self, object args) ctypedef struct PyMethodDef: char *ml_name PyCFunction ml_meth ctypedef struct PyTypeObject: PyMethodDef *tp_methods PyTypeObject PyList_Type cdef PyCFunction __listpop = NULL cpdef listpop(list self, Py_ssize_t n=-1): global __listpop cdef PyMethodDef *ptr cdef int ok = 0 if not __listpop: ptr = PyList_Type.tp_methods; while ptr: if str(ptr[0].ml_name) == "pop": __listpop = ptr.ml_meth ok = 1 break ptr += 1 if not ok: raise ValueError, \ 'Could not find method "pop" in list type object' return __listpop(self, (int(n),)) ___ Cython-dev mailing list Cython-dev@codespeak.net http://codespeak.net/mailman/listinfo/cython-dev