[issue18962] Add special case for single iterator in heapq.merge function
Roundup Robot added the comment: New changeset 0e70bf1f32a3 by Raymond Hettinger in branch 'default': Issue #18962: Optimize the single iterator case for heapq.merge() http://hg.python.org/cpython/rev/0e70bf1f32a3 -- nosy: +python-dev ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue18962 ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue18962] Add special case for single iterator in heapq.merge function
Changes by Raymond Hettinger raymond.hettin...@gmail.com: -- resolution: - fixed status: open - closed ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue18962 ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue18962] Add special case for single iterator in heapq.merge function
Wouter Bolsterlee added the comment: Thanks for the quick response. Btw, do I understand correctly code cleanups are not welcome, even when touching the code anyway? -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue18962 ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue18962] Add special case for single iterator in heapq.merge function
Wouter Bolsterlee added the comment: (In case you missed it: my latest comment included a cleaned up version of an earlier patch.) -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue18962 ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue18962] Add special case for single iterator in heapq.merge function
Wouter Bolsterlee added the comment: Thanks Raymond, that is exactly what I had in mind (see my previous comment). Here's a slightly cleaned up version of the patch (stylistic/PEP8 cleanups), with some benchmarks included below. In case the two longest iterators have about the same size, no performance difference can be measured: $ ./python -m timeit -s 'from heapq import merge' 'for x in merge([], [], [1], range(100), range(100)): pass' Without patch: 1 loops, best of 3: 71.2 usec per loop 1 loops, best of 3: 71.9 usec per loop 1 loops, best of 3: 71.7 usec per loop With patch: 1 loops, best of 3: 71.4 usec per loop 1 loops, best of 3: 76.7 usec per loop 1 loops, best of 3: 72.1 usec per loop As expected, the performance gain is very significant in case one of the iterators is much longer than the others: $ python -m timeit -n 100 -s 'from heapq import merge' 'for x in merge([], [], [1], range(100), range(10)): pass' Without path: 100 loops, best of 3: 27.8 msec per loop 100 loops, best of 3: 26.9 msec per loop 100 loops, best of 3: 27.7 msec per loop With patch: 100 loops, best of 3: 6.26 msec per loop 100 loops, best of 3: 6.28 msec per loop 100 loops, best of 3: 6.03 msec per loop -- Added file: http://bugs.python.org/file31726/merge3.diff ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue18962 ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue18962] Add special case for single iterator in heapq.merge function
Wouter Bolsterlee added the comment: An additional speedup would be to add a if len(h) == 1 check inside the while loop, and just yield from the remaining iterator if a single iterable remains. This would also speed up merges with multiple inputs, as it doesn't do the whole heapreplace() loop for the last remaining iterable. Example: merge([], [], [], range(10). This would involve some more refactoring inside the function though, since the current implementation only stores the .next() function for each iterable. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue18962 ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue18962] Add special case for single iterator in heapq.merge function
Raymond Hettinger added the comment: Try this patch. -- Added file: http://bugs.python.org/file31671/merge.diff ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue18962 ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue18962] Add special case for single iterator in heapq.merge function
Changes by Raymond Hettinger raymond.hettin...@gmail.com: Added file: http://bugs.python.org/file31681/merge2.diff ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue18962 ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue18962] Add special case for single iterator in heapq.merge function
Changes by Raymond Hettinger raymond.hettin...@gmail.com: Removed file: http://bugs.python.org/file31671/merge.diff ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue18962 ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue18962] Add special case for single iterator in heapq.merge function
New submission from Wouter Bolsterlee: The heapq.merge() function merges multiple sorted iterables into a single sorted output. The function uses a heap queue that is repeatedly looped over until it has generated all output. If only a single iterable is passed to heapq.merge(), the heap will have len(h) == 1, which means the looping and heap manipulation just degrades to a convoluted yield from iterables[0]. This negatively impacts performance. The attached patch adds a short-circuit for this single input case. The behaviour of the function remains unchanged: list(merge_before([1,3,5,7], [0,2,4,8], [5,10,15,20], [], [25])) [0, 1, 2, 3, 4, 5, 5, 7, 8, 10, 15, 20, 25] list(merge_after([1,3,5,7], [0,2,4,8], [5,10,15,20], [], [25])) [0, 1, 2, 3, 4, 5, 5, 7, 8, 10, 15, 20, 25] For multiple inputs, there is no measurable performance difference (measured using IPython's %timeit). Without patch: %timeit list(merge_before([1,3,5,7], [0,2,4,8], [5,10,15,20], [], [25])) 10 loops, best of 3: 13.7 µs per loop %timeit list(merge_before([1,3,5,7], [0,2,4,8], [5,10,15,20], [], [25])) 10 loops, best of 3: 13.4 µs per loop %timeit list(merge_before([1,3,5,7], [0,2,4,8], [5,10,15,20], [], [25])) 10 loops, best of 3: 13.5 µs per loop With patch: %timeit list(merge_after([1,3,5,7], [0,2,4,8], [5,10,15,20], [], [25])) 10 loops, best of 3: 13.7 µs per loop %timeit list(merge_after([1,3,5,7], [0,2,4,8], [5,10,15,20], [], [25])) 10 loops, best of 3: 13.6 µs per loop %timeit list(merge_after([1,3,5,7], [0,2,4,8], [5,10,15,20], [], [25])) 10 loops, best of 3: 13.6 µs per loop %timeit list(merge_after([1,3,5,7], [0,2,4,8], [5,10,15,20], [], [25])) 10 loops, best of 3: 13.8 µs per loop The real performance gain is of course when only a single iterable is passed. Without patch: %timeit for x in merge_before(range(1)): pass 100 loops, best of 3: 2.65 ms per loop %timeit for x in merge_before(range(1)): pass 100 loops, best of 3: 2.52 ms per loop %timeit for x in merge_before(range(1)): pass 100 loops, best of 3: 2.51 ms per loop With patch: %timeit for x in merge_after(range(1)): pass 1000 loops, best of 3: 604 µs per loop %timeit for x in merge_after(range(1)): pass 1000 loops, best of 3: 609 µs per loop %timeit for x in merge_after(range(1)): pass 1000 loops, best of 3: 603 µs per loop This is a ~4x speedup for an input consisting of 1 items. Compared to plain iteration: %timeit for x in range(1): pass 1000 loops, best of 3: 263 µs per loop %timeit for x in range(1): pass 1000 loops, best of 3: 268 µs per loop %timeit for x in range(1): pass 1000 loops, best of 3: 273 µs per loop Without the patch, heapq.merge() is ~10x as slow as plain iteration. With the patch, this is reduced to ~2.5x. (Note: all testing done using Python 3.3.2 on a 64bit Linux machine.) -- components: Library (Lib) files: heapq-merge-single-input-optimization.patch keywords: patch messages: 197174 nosy: wbolster priority: normal severity: normal status: open title: Add special case for single iterator in heapq.merge function type: performance versions: Python 3.3 Added file: http://bugs.python.org/file31649/heapq-merge-single-input-optimization.patch ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue18962 ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue18962] Add special case for single iterator in heapq.merge function
Changes by Benjamin Peterson benja...@python.org: -- nosy: +rhettinger ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue18962 ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue18962] Add special case for single iterator in heapq.merge function
Raymond Hettinger added the comment: At first glance, this looks reasonable. I'll give it a more thorough look shortly. -- assignee: - rhettinger versions: +Python 3.4 -Python 3.3 ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue18962 ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com