On Wed, Oct 19, 2011 at 11:52 PM, Simon King <[email protected]> wrote:
> Hi Robert,
>
> On 20 Okt., 06:59, Robert Bradshaw <[email protected]>
> wrote:
>> I have no idea why L.pop(0) is not optimized, but L.pop(<int>0) is
>> (and the latter should be very fast). Clearly a bug in Cython, but
>> user the latter.
>
> Surprise: Even though the annotated code for L.pop(<int>0) is less
> yellow than for L.pop(0), it is *slower* by about 30%!
>
> In flask.sagenb, I get:
> {{{
> %cython
> def f(list L):
>    L.append(1)
>    return L.pop(0),L
> def g(list L):
>    L.append(1)
>    return L.pop(<int>0),L
> def h(list L):
>    L.append(1)
>    return L.pop(),L
> }}}
> {{{
> L = range(1000)
> }}}
>
> timeit("f(L)")  --> 625 loops, best of 3: 701 ns per loop
> timeit("g(L)") --> 625 loops, best of 3: 971 ns per loop
> timeit("h(L)") --> 625 loops, best of 3: 131 ns per loop

Very interesting--looks like the crossover point is around 200
elements. Looks like Python's native pop uses memmove rather than
doing a loop shift the array which could account for the difference.

> Since L.pop() (popping the last item) is so much faster than L.pop(0),
> I could perhaps just revert my lists.

Inserting into the 0th element will have the same issues as popping it.

>> > Nevertheless, I'd still appreciate to learn a Cython replacement for
>> > pop(0) that is fast on long lists as well.
>>
>> There isn't one as pop(0) on a list requires re-copying the entire
>> contents of the list.
>
> Really? I'm surprised that an in-place change requires copying.

It's an array, not a linked list. The k-th item of a list occupies a
fixed location in memory relative to the head of the object, so
removing an item requires shifting everything. (Not the data, just the
pointers.)

>> You could use numpy arrays, where slices such as
>> L[1:] are views. A cython-implemented linked list may perform well
>> here as well.
>
> I've  never used numpy before, but I'll give it a try.
>
> Some tests in pure Python seem to indicate that numpy arrays are
> slower than lists.
> In particular, I need containment test and pop (or slices), and the
> items are types:
>
> sage: import numpy
> sage: from numpy import array
> sage: A = CommutativeAlgebras(QQ); B = HopfAlgebras(QQ); C = Fields();
> D = FiniteEnumeratedSets()
> sage: L1=A.parent_class.mro()
> sage: L2=B.parent_class.mro()
> sage: L3=C.parent_class.mro()
> sage: L4=D.parent_class.mro()
> sage: A1 = array(L1)
> sage: A2 = array(L2)
> sage: A3 = array(L3)
> sage: A4 = array(L4)
> sage: c = Algebras(QQ).parent_class
> sage: timeit("c in L2")
> 625 loops, best of 3: 344 ns per loop
> sage: timeit("c in A2")
> 625 loops, best of 3: 14 µs per loop
> sage: timeit("c in L4")
> 625 loops, best of 3: 555 ns per loop
> sage: timeit("c in A4")
> 625 loops, best of 3: 12.9 µs per loop
> sage: timeit("L4[1:]")
> 625 loops, best of 3: 835 ns per loop
> sage: timeit("A4[1:]")
> 625 loops, best of 3: 1.07 µs per loop

As numpy slices are views, they are a bit more expensive but require
O(1) time, whereas slicing a list is O(n) time.

sage: from numpy import array
sage: for k in [2..5]:
...       L = range(10^k)
...       N = array(L)
...       timeit("L[1:]")
...       timeit("N[1:]")
625 loops, best of 3: 656 ns per loop
625 loops, best of 3: 576 ns per loop
625 loops, best of 3: 3.42 µs per loop
625 loops, best of 3: 627 ns per loop
625 loops, best of 3: 45.5 µs per loop
625 loops, best of 3: 578 ns per loop
625 loops, best of 3: 959 µs per loop
625 loops, best of 3: 630 ns per loop

Looks like your arrays are pretty small though.

> Will the picture be different when cimporting numpy.ndarray in Cython?

Likely not. The primary benefits (100x or more speedups) are when
indexing and using native C types. Why not write your own simple
queue?

cdef class Queue:
    cdef list elements
    cdef int start
    cdef int end
    cdef int capacity
    def __init__(self, initial_capacity=5):
        self.elements = [None] * initial_capacity
        self.start = 0
        self.end = 0
        self.capacity = initial_capacity
    cpdef push(self, o):
        self.elements[self.end] = o
        self.end = (self.end + 1) % self.capacity
        if self.end == self.start:
            self.elements = self.elements[self.start:] +
self.elements[:self.end] + [None] * self.capacity
            self.start = 0
            self.end = self.capacity
            self.capacity *= 2
    cpdef pop(self):
        if self.start == self.end:
            raise ValueError, "Empty queue"
        try:
            return self.elements[self.start]
        finally:
            self.elements[self.start] = None
            self.start = (self.start + 1) % self.capacity

def f(Queue q):
   q.push(1)
   return q.pop(), q

q = Queue()
for k in range(1000):
    q.push(k)
timeit("f(q)")

625 loops, best of 3: 147 ns per loop


- Robert

-- 
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to 
[email protected]
For more options, visit this group at 
http://groups.google.com/group/sage-support
URL: http://www.sagemath.org

Reply via email to