Bryan, 12.05.2010 08:55:
Now back to the arguably-interesting issue of speed in the particular
problem here: 'Superpollo' had suggested another variant, which I
appended to my timeit targets, resulting in:

[s for s in strs if s.startswith('a')]  took:  5.68393977159
[s for s in strs if s[:1] == 'a']  took:  3.31676491502
[s for s in strs if s and s[0] == 'a']  took:  2.29392950076

Superpollo's condition -- s and s[0] == 'a' -- is the fastest of the
three.
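
For anyone who wants to reproduce this kind of comparison, a minimal timeit sketch could look like the following. The contents of `strs` and the repeat count are assumptions, since the thread does not show the original test data, so absolute numbers will differ from the ones quoted above. It is written for Python 3.5 or later, because it relies on the `globals` argument of `timeit.timeit`.

```python
import timeit

# Hypothetical test data: the thread does not show what `strs` contained.
strs = ['apple', 'banana', '', 'avocado', 'cherry'] * 1000

# The three filter expressions compared in the thread.
candidates = [
    "[s for s in strs if s.startswith('a')]",
    "[s for s in strs if s[:1] == 'a']",
    "[s for s in strs if s and s[0] == 'a']",
]

# Sanity check first: evaluate each expression once so the results can be compared.
results = [eval(stmt, {'strs': strs}) for stmt in candidates]

timings = {}
for stmt in candidates:
    # number=100 is an arbitrary choice, not the count used in the thread.
    timings[stmt] = timeit.timeit(stmt, globals={'strs': strs}, number=100)
    print(stmt, ' took: ', timings[stmt])
```

Checking that all three expressions return the same list before timing them guards against comparing the speed of filters that do different things.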

Just out of curiosity, I ran the same code in the latest Cython pre-0.13 and added some optimised Cython implementations. Here's the code:

def cython_way0(l):
    return [ s for s in l if s.startswith(u'a') ]

def cython_way1(list l):
    cdef unicode s
    return [ s for s in l if s.startswith(u'a') ]

def cython_way2(list l):
    cdef unicode s
    return [ s for s in l if s[:1] == u'a' ]

def cython_way3(list l):
    cdef unicode s
    return [ s for s in l if s[0] == u'a' ]

def cython_way4(list l):
    cdef unicode s
    return [ s for s in l if s and s[0] == u'a' ]

def cython_way5(list l):
    cdef unicode s
    return [ s for s in l if (<Py_UNICODE>s[0]) == u'a' ]

def cython_way6(list l):
    cdef unicode s
    return [ s for s in l if s and (<Py_UNICODE>s[0]) == u'a' ]


And here are the numbers (plain Python 2.6.5 first):

[s for s in strs if s.startswith(u'a')] took: 1.04618620872
[s for s in strs if s[:1] == u'a'] took: 0.518909931183
[s for s in strs if s and s[0] == u'a'] took: 0.617404937744

cython_way0(strs) took: 0.769457817078
cython_way1(strs) took: 0.0861849784851
cython_way2(strs) took: 0.208586931229
cython_way3(strs) took: 0.18615603447
cython_way4(strs) took: 0.190477132797
cython_way5(strs) took: 0.0366449356079
cython_way6(strs) took: 0.0368368625641

Personally, I think the cast to Py_UNICODE in the last two implementations shouldn't be required; it should happen automatically, so that way3/4 run as fast as way5/6. I'll add that when I get to it.
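
One caveat when comparing the variants: the unguarded conditions (way3 and way5) index s[0] without first checking that the string is non-empty. The sketch below shows only plain Python semantics, where s[0] on an empty string raises IndexError. The sample list is hypothetical, and how the Cython-compiled versions behave on empty strings is not something this sketch demonstrates.

```python
# Hypothetical sample data containing one empty string.
samples = [u'apple', u'', u'banana', u'a']

# Guarded form (as in way4/way6): an empty string is falsy, so s[0] is
# never evaluated for it.
guarded = [s for s in samples if s and s[0] == u'a']

# Slicing and startswith() are safe on empty strings without a guard.
sliced = [s for s in samples if s[:1] == u'a']
prefixed = [s for s in samples if s.startswith(u'a')]

# Unguarded form (as in way3/way5): fails in plain Python on the empty string.
try:
    unguarded = [s for s in samples if s[0] == u'a']
except IndexError:
    unguarded = None

print(guarded, sliced, prefixed, unguarded)
```

So the unguarded variants are only interchangeable with the others when the input is known to contain no empty strings.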

Note that unicode.startswith() is optimised in Cython, so it's a pretty fast option, too. Also note that the best speed-up here, comparing the fastest Cython variant against the fastest plain Python variant, is only a factor of about 14, so plain Python is quite competitive unless the list is huge and this really is a bottleneck in an application.
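
The factor can be checked directly from the timings listed above. The sketch below copies those numbers verbatim; the dictionary keys and variable names are just labels chosen for illustration.

```python
# Timings copied from the numbers above (plain Python 2.6.5).
plain = {
    "startswith": 1.04618620872,
    "s[:1] ==": 0.518909931183,
    "s and s[0] ==": 0.617404937744,
}

# Timings copied from the numbers above (Cython variants).
cython = {
    "way0": 0.769457817078,
    "way1": 0.0861849784851,
    "way2": 0.208586931229,
    "way3": 0.18615603447,
    "way4": 0.190477132797,
    "way5": 0.0366449356079,
    "way6": 0.0368368625641,
}

# Fastest plain Python variant versus fastest Cython variant.
best_speedup = min(plain.values()) / min(cython.values())

# Like for like: startswith() in plain Python versus typed startswith() (way1).
startswith_speedup = plain["startswith"] / cython["way1"]

print("best vs best: %.1f" % best_speedup)
print("startswith vs way1: %.1f" % startswith_speedup)
```

The best-against-best ratio comes out slightly above 14, and the startswith() comparison slightly above 12.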

Stefan

--
http://mail.python.org/mailman/listinfo/python-list
