On Wed, Oct 19, 2011 at 11:52 PM, Simon King <[email protected]> wrote:
> Hi Robert,
>
> On 20 Okt., 06:59, Robert Bradshaw <[email protected]>
> wrote:
>> I have no idea why L.pop(0) is not optimized, but L.pop(<int>0) is
>> (and the latter should be very fast). Clearly a bug in Cython, but
>> user the latter.
>
> Surprise: Even though the annotated code for L.pop(<int>0) is less
> yellow than for L.pop(0), it is *slower* by about 30%!
>
> In flask.sagenb, I get:
> {{{
> %cython
> def f(list L):
> L.append(1)
> return L.pop(0),L
> def g(list L):
> L.append(1)
> return L.pop(<int>0),L
> def h(list L):
> L.append(1)
> return L.pop(),L
> }}}
> {{{
> L = range(1000)
> }}}
>
> timeit("f(L)") --> 625 loops, best of 3: 701 ns per loop
> timeit("g(L)") --> 625 loops, best of 3: 971 ns per loop
> timeit("h(L)") --> 625 loops, best of 3: 131 ns per loop
Very interesting--looks like the crossover point is around 200
elements. Looks like Python's native pop uses memmove rather than
doing a loop shift the array which could account for the difference.
> Since L.pop() (popping the last item) is so much faster than L.pop(0),
> I could perhaps just revert my lists.
Inserting into the 0th element will have the same issues as popping it.
>> > Nevertheless, I'd still appreciate to learn a Cython replacement for
>> > pop(0) that is fast on long lists as well.
>>
>> There isn't one as pop(0) on a list requires re-copying the entire
>> contents of the list.
>
> Really? I'm surprised that an in-place change requires copying.
It's an array, not a linked list. The k-th item of a list occupies a
fixed location in memory relative to the head of the object, so
removing an item requires shifting everything. (Not the data, just the
pointers.)
>> You could use numpy arrays, where slices such as
>> L[1:] are views. A cython-implemented linked list may perform well
>> here as well.
>
> I've never used numpy before, but I'll give it a try.
>
> Some tests in pure Python seem to indicate that numpy arrays are
> slower than lists.
> In particular, I need containment test and pop (or slices), and the
> items are types:
>
> sage: import numpy
> sage: from numpy import array
> sage: A = CommutativeAlgebras(QQ); B = HopfAlgebras(QQ); C = Fields();
> D = FiniteEnumeratedSets()
> sage: L1=A.parent_class.mro()
> sage: L2=B.parent_class.mro()
> sage: L3=C.parent_class.mro()
> sage: L4=D.parent_class.mro()
> sage: A1 = array(L1)
> sage: A2 = array(L2)
> sage: A3 = array(L3)
> sage: A4 = array(L4)
> sage: c = Algebras(QQ).parent_class
> sage: timeit("c in L2")
> 625 loops, best of 3: 344 ns per loop
> sage: timeit("c in A2")
> 625 loops, best of 3: 14 µs per loop
> sage: timeit("c in L4")
> 625 loops, best of 3: 555 ns per loop
> sage: timeit("c in A4")
> 625 loops, best of 3: 12.9 µs per loop
> sage: timeit("L4[1:]")
> 625 loops, best of 3: 835 ns per loop
> sage: timeit("A4[1:]")
> 625 loops, best of 3: 1.07 µs per loop
As numpy slices are views, they are a bit more expensive but require
O(1) time, whereas slicing a list is O(n) time.
sage: from numpy import array
sage: for k in [2..5]:
... L = range(10^k)
... N = array(L)
... timeit("L[1:]")
... timeit("N[1:]")
625 loops, best of 3: 656 ns per loop
625 loops, best of 3: 576 ns per loop
625 loops, best of 3: 3.42 µs per loop
625 loops, best of 3: 627 ns per loop
625 loops, best of 3: 45.5 µs per loop
625 loops, best of 3: 578 ns per loop
625 loops, best of 3: 959 µs per loop
625 loops, best of 3: 630 ns per loop
Looks like your arrays are pretty small though.
> Will the picture be different when cimporting numpy.ndarray in Cython?
Likely not. The primary benefits (100x or more speedups) are when
indexing and using native C types. Why not write your own simple
queue?
cdef class Queue:
cdef list elements
cdef int start
cdef int end
cdef int capacity
def __init__(self, initial_capacity=5):
self.elements = [None] * initial_capacity
self.start = 0
self.end = 0
self.capacity = initial_capacity
cpdef push(self, o):
self.elements[self.end] = o
self.end = (self.end + 1) % self.capacity
if self.end == self.start:
self.elements = self.elements[self.start:] +
self.elements[:self.end] + [None] * self.capacity
self.start = 0
self.end = self.capacity
self.capacity *= 2
cpdef pop(self):
if self.start == self.end:
raise ValueError, "Empty queue"
try:
return self.elements[self.start]
finally:
self.elements[self.start] = None
self.start = (self.start + 1) % self.capacity
def f(Queue q):
q.push(1)
return q.pop(), q
q = Queue()
for k in range(1000):
q.push(k)
timeit("f(q)")
625 loops, best of 3: 147 ns per loop
- Robert
--
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/sage-support
URL: http://www.sagemath.org