Bugs item #1085744, was opened at 2004-12-15 22:15 Message generated for change (Tracker Item Submitted) made by Item Submitter You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1085744&group_id=5470
Category: Python Interpreter Core Group: Python 2.4 Status: Open Resolution: None Priority: 5 Submitted By: Nick Coghlan (ncoghlan) Assigned to: Nobody/Anonymous (nobody) Summary: Bad interaction between PySequence_Fast and itertools Initial Comment: A little "my code is faster than your code" game on python-list showed up a pathological interaction between itertools.chain and str.join: C:\>python -m timeit -s "data = [map(str, range(x)) for x in range(1000)]; from itertools import chain" "''.join(chain(*data))" 10 loops, best of 3: 1.2 sec per loop ****** OUCH!! Some extra experiments show that creating a list from the result of chain is fast, but creating a tuple is horrendously slow: C:\>python -m timeit -s "data = [map(str, range(x)) for x in range(1000)]; from itertools import chain" "''.join(list(chain(*data)))" 10 loops, best of 3: 107 msec per loop C:\>python -m timeit -s "data = [map(str, range(x)) for x in range(1000)]; from itertools import chain" "''.join(tuple(chain(*data)))" 10 loops, best of 3: 1.2 sec per loop Creating the tuple by means of a list is actually faster than creating the tuple directly: C:\>python -m timeit -s "data = [map(str, range(x)) for x in range(1000)]; from itertools import chain" "''.join(tuple(list(chain(*data))))" 10 loops, best of 3: 146 msec per loop A check with imap suggests the problem may be a more general interaction between PySequence_Fast and iterators which don't have __len__ methods: C:\>python -m timeit -s "data = [y for x in range(1000) for y in map(str, range(x))]; from itertools import imap; val = lambda arg: arg" "''.join(list(imap(val, data)))" 10 loops, best of 3: 310 msec per loop C:\>python -m timeit -s "data = [y for x in range(1000) for y in map(str, range(x))]; from itertools import imap; val = lambda arg: arg" "''.join(imap(val, data))" 10 loops, best of 3: 1.4 sec per loop Looking at the code supports that, too - PySequence_Fast uses PySequence_Tuple, which is great when PyObject_Size gives a nice answer, but can lead to a lot of tuple resizing when it isn't (one resize per 10 items beyond 10 up to 500, then one resize per 100 items beyond 500). 2.4's optimised list extension means that this *really* hurts performance wise. The other aspect is whether or not some of the utilities in itertools could benefit from a __len__ method that returned a sensible answer if their inputs defined __len__ methods, and returned -1 otherwise (this would certainly work well with PySequence_Tuple, but I realise it's a slightly odd behaviour for a __len__ method). ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1085744&group_id=5470 _______________________________________________ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com