Hi, someone on this list mentioned that much of the s.get() time is spend on the name lookup for get(). That is indeed the case:
=================== from timeit import * stats = ["for i in xrange(1000): iter(s).next() ", "for i in xrange(1000): \n\tfor x in s: \n\t\tbreak", "for i in xrange(1000): s.add(s.pop()) ", "for i in xrange(1000): s.get() ", "g=s.get;\nfor i in xrange(1000): g() "] for stat in stats: t = Timer(stat, setup="s=set(range(1000))") print "Time for %s:\t %f"%(stat, t.timeit(number=1000)) ================== Time for for i in xrange(1000): iter(s).next() : 0.448227 Time for for i in xrange(1000): for x in s: break: 0.141669 Time for for i in xrange(1000): s.add(s.pop()) : 0.348055 Time for for i in xrange(1000): s.get() : 0.148580 Time for g=s.get; for i in xrange(1000): g() : 0.080563 So, now set.get() is indeed the fastest and preferable solution if you need massive amounts of retrieving elements from a set without removing them. wr Am Freitag, 23. Oktober 2009 22:53:24 schrieb Willi Richert: > Hi, > > surprised about the performance of for/break provided by Vitor, I did some > more testing. It revealed that indeed we can forget the get() (which was > implemented as a stripped down pop()): > > from timeit import * > stats = ["for i in xrange(1000): iter(s).next() ", > "for i in xrange(1000): \n\tfor x in s: \n\t\tbreak ", > "for i in xrange(1000): s.add(s.pop()) ", > "for i in xrange(1000): s.get() "] > > for stat in stats: > t = Timer(stat, setup="s=set(range(100))") > try: > print "Time for %s:\t %f"%(stat, t.timeit(number=1000)) > except: > t.print_exc() > > > > $ ./test_get.py > Time for for i in xrange(1000): iter(s).next() : 0.433080 > Time for for i in xrange(1000): > for x in s: > break : 0.148695 > Time for for i in xrange(1000): s.add(s.pop()) : 0.317418 > Time for for i in xrange(1000): s.get() : 0.146673 > > > In some tests, for/break was even slightly faster then get(). > > As always, intuition regarding performance bottlenecks is flawed ;-) > > Anyway, thanks for all the helpful comments, especially to Stefan for the > http://comments.gmane.org/gmane.comp.python.ideas/5606 link. > > Regards, > wr > > Am Freitag, 23. Oktober 2009 19:25:48 schrieb John Arbash Meinel: > > Vitor Bosshard wrote: > > > 2009/10/23 Willi Richert <w.rich...@gmx.net>: > > >> Hi, > > >> > > >> recently I wrote an algorithm, in which very often I had to get an > > >> arbitrary element from a set without removing it. > > >> > > >> Three possibilities came to mind: > > >> > > >> 1. > > >> x = some_set.pop() > > >> some_set.add(x) > > >> > > >> 2. > > >> for x in some_set: > > >> break > > >> > > >> 3. > > >> x = iter(some_set).next() > > >> > > >> > > >> Of course, the third should be the fastest. It nevertheless goes > > >> through all the iterator creation stuff, which costs some time. I > > >> wondered, why the builtin set does not provide a more direct and > > >> efficient way for retrieving some element without removing it. Is > > >> there any reason for this? > > >> > > >> I imagine something like > > >> > > >> x = some_set.get() > > > > > > I see this as being useful for frozensets as well, where you can't get > > > an arbitrary element easily due to the obvious lack of .pop(). I ran > > > into this recently, when I had a frozenset that I knew had 1 element > > > (it was the difference between 2 other sets), but couldn't get to that > > > element easily (get the pun?) > > > > So in my testing (2) was actually the fastest. I assumed because .next() > > was a function call overhead, while: > > for x in some_set: > > break > > > > Was evaluated inline. It probably still has to call PyObject_GetIter, > > however it doesn't have to create a stack frame for it. > > > > This is what "timeit" tells me. All runs are of the form: > > python -m timeit -s "s = set([10])" ... > > > > 0.101us "for x in s: break; x" > > 0.130us "for x in s: pass; x" > > 0.234us -s "n = next; i = iter" "x = n(i(s)); x" > > 0.248us "x = next(iter(s)); x" > > 0.341us "x = iter(s).next(); x" > > > > So 'for x in s: break' is about 2x faster than next(iter(s)) and 3x > > faster than (iter(s).next()). > > I was pretty surprised that it was 30% faster than "for x in s: pass". I > > assume it has something to do with a potential "else:" statement? > > > > Note that all of these are significantly < 1us. So this only matters if > > it is something you are doing often. > > > > I don't know your specific timings, but I would guess that: > > for x in s: break > > > > Is actually going to be faster than your > > s.get() > > > > Primarily because s.get() requires an attribute lookup. I would think > > your version might be faster for: > > stat2 = "g = s.get; for i in xrange(100): g()" > > > > However, that is still a function call, which may be treated differently > > by the interpreter than the for:break loop. I certainly suggest you try > > it and compare. > > > > John > > =:-> > _______________________________________________ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com