[issue18962] Add special case for single iterator in heapq.merge function

2013-09-11 Thread Roundup Robot

Roundup Robot added the comment:

New changeset 0e70bf1f32a3 by Raymond Hettinger in branch 'default':
Issue #18962:  Optimize the single iterator case for  heapq.merge()
http://hg.python.org/cpython/rev/0e70bf1f32a3

--
nosy: +python-dev

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue18962
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue18962] Add special case for single iterator in heapq.merge function

2013-09-11 Thread Raymond Hettinger

Changes by Raymond Hettinger raymond.hettin...@gmail.com:


--
resolution:  - fixed
status: open - closed

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue18962
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue18962] Add special case for single iterator in heapq.merge function

2013-09-11 Thread Wouter Bolsterlee

Wouter Bolsterlee added the comment:

Thanks for the quick response. 

Btw, do I understand correctly code cleanups are not welcome, even when 
touching the code anyway?

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue18962
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue18962] Add special case for single iterator in heapq.merge function

2013-09-11 Thread Wouter Bolsterlee

Wouter Bolsterlee added the comment:

(In case you missed it: my latest comment included a cleaned up version of an 
earlier patch.)

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue18962
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue18962] Add special case for single iterator in heapq.merge function

2013-09-10 Thread Wouter Bolsterlee

Wouter Bolsterlee added the comment:

Thanks Raymond, that is exactly what I had in mind (see my previous comment).
Here's a slightly cleaned up version of the patch (stylistic/PEP8 cleanups),
with some benchmarks included below.

In case the two longest iterators have about the same size, no performance
difference can be measured:

$ ./python -m timeit -s 'from heapq import merge' 'for x in merge([], [], [1], 
range(100), range(100)): pass'

Without patch:

1 loops, best of 3: 71.2 usec per loop
1 loops, best of 3: 71.9 usec per loop
1 loops, best of 3: 71.7 usec per loop

With patch:

1 loops, best of 3: 71.4 usec per loop
1 loops, best of 3: 76.7 usec per loop
1 loops, best of 3: 72.1 usec per loop

As expected, the performance gain is very significant in case one of the
iterators is much longer than the others:

$ python -m timeit -n 100 -s 'from heapq import merge' 'for x in merge([], [], 
[1], range(100), range(10)): pass'

Without path:

100 loops, best of 3: 27.8 msec per loop
100 loops, best of 3: 26.9 msec per loop
100 loops, best of 3: 27.7 msec per loop

With patch:

100 loops, best of 3: 6.26 msec per loop
100 loops, best of 3: 6.28 msec per loop
100 loops, best of 3: 6.03 msec per loop

--
Added file: http://bugs.python.org/file31726/merge3.diff

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue18962
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue18962] Add special case for single iterator in heapq.merge function

2013-09-08 Thread Wouter Bolsterlee

Wouter Bolsterlee added the comment:

An additional speedup would be to add a if len(h) == 1 check inside the while 
loop, and just yield from the remaining iterator if a single iterable remains. 
This would also speed up merges with multiple inputs, as it doesn't do the 
whole heapreplace() loop for the last remaining iterable. Example: merge([], 
[], [], range(10). This would involve some more refactoring inside the 
function though, since the current implementation only stores the .next() 
function for each iterable.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue18962
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue18962] Add special case for single iterator in heapq.merge function

2013-09-08 Thread Raymond Hettinger

Raymond Hettinger added the comment:

Try this patch.

--
Added file: http://bugs.python.org/file31671/merge.diff

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue18962
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue18962] Add special case for single iterator in heapq.merge function

2013-09-08 Thread Raymond Hettinger

Changes by Raymond Hettinger raymond.hettin...@gmail.com:


Added file: http://bugs.python.org/file31681/merge2.diff

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue18962
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue18962] Add special case for single iterator in heapq.merge function

2013-09-08 Thread Raymond Hettinger

Changes by Raymond Hettinger raymond.hettin...@gmail.com:


Removed file: http://bugs.python.org/file31671/merge.diff

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue18962
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue18962] Add special case for single iterator in heapq.merge function

2013-09-07 Thread Wouter Bolsterlee

New submission from Wouter Bolsterlee:

The heapq.merge() function merges multiple sorted iterables into a single 
sorted output.  The function uses a heap queue that is repeatedly looped over 
until it has generated all output.

If only a single iterable is passed to heapq.merge(), the heap will have len(h) 
== 1, which means the looping and heap manipulation just degrades to a 
convoluted yield from iterables[0]. This negatively impacts performance. The 
attached patch adds a short-circuit for this single input case.

The behaviour of the function remains unchanged:

 list(merge_before([1,3,5,7], [0,2,4,8], [5,10,15,20], [], [25]))
[0, 1, 2, 3, 4, 5, 5, 7, 8, 10, 15, 20, 25]
 list(merge_after([1,3,5,7], [0,2,4,8], [5,10,15,20], [], [25]))
[0, 1, 2, 3, 4, 5, 5, 7, 8, 10, 15, 20, 25]

For multiple inputs, there is no measurable performance difference (measured
using IPython's %timeit). Without patch:

 %timeit list(merge_before([1,3,5,7], [0,2,4,8], [5,10,15,20], [], [25]))
10 loops, best of 3: 13.7 µs per loop
 %timeit list(merge_before([1,3,5,7], [0,2,4,8], [5,10,15,20], [], [25]))
10 loops, best of 3: 13.4 µs per loop
 %timeit list(merge_before([1,3,5,7], [0,2,4,8], [5,10,15,20], [], [25]))
10 loops, best of 3: 13.5 µs per loop

With patch:

 %timeit list(merge_after([1,3,5,7], [0,2,4,8], [5,10,15,20], [], [25]))
10 loops, best of 3: 13.7 µs per loop
 %timeit list(merge_after([1,3,5,7], [0,2,4,8], [5,10,15,20], [], [25]))
10 loops, best of 3: 13.6 µs per loop
 %timeit list(merge_after([1,3,5,7], [0,2,4,8], [5,10,15,20], [], [25]))
10 loops, best of 3: 13.6 µs per loop
 %timeit list(merge_after([1,3,5,7], [0,2,4,8], [5,10,15,20], [], [25]))
10 loops, best of 3: 13.8 µs per loop

The real performance gain is of course when only a single iterable is passed. 
Without patch:

 %timeit for x in merge_before(range(1)): pass
100 loops, best of 3: 2.65 ms per loop
 %timeit for x in merge_before(range(1)): pass
100 loops, best of 3: 2.52 ms per loop
 %timeit for x in merge_before(range(1)): pass
100 loops, best of 3: 2.51 ms per loop

With patch:

 %timeit for x in merge_after(range(1)): pass
1000 loops, best of 3: 604 µs per loop
 %timeit for x in merge_after(range(1)): pass
1000 loops, best of 3: 609 µs per loop
 %timeit for x in merge_after(range(1)): pass
1000 loops, best of 3: 603 µs per loop

This is a ~4x speedup for an input consisting of 1 items. Compared to plain 
iteration:

 %timeit for x in range(1): pass
1000 loops, best of 3: 263 µs per loop
 %timeit for x in range(1): pass
1000 loops, best of 3: 268 µs per loop
 %timeit for x in range(1): pass
1000 loops, best of 3: 273 µs per loop

Without the patch, heapq.merge() is ~10x as slow as plain iteration. With the 
patch, this is reduced to ~2.5x.

(Note: all testing done using Python 3.3.2 on a 64bit Linux machine.)

--
components: Library (Lib)
files: heapq-merge-single-input-optimization.patch
keywords: patch
messages: 197174
nosy: wbolster
priority: normal
severity: normal
status: open
title: Add special case for single iterator in heapq.merge function
type: performance
versions: Python 3.3
Added file: 
http://bugs.python.org/file31649/heapq-merge-single-input-optimization.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue18962
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue18962] Add special case for single iterator in heapq.merge function

2013-09-07 Thread Benjamin Peterson

Changes by Benjamin Peterson benja...@python.org:


--
nosy: +rhettinger

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue18962
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue18962] Add special case for single iterator in heapq.merge function

2013-09-07 Thread Raymond Hettinger

Raymond Hettinger added the comment:

At first glance, this looks reasonable.  I'll give it a more thorough look 
shortly.

--
assignee:  - rhettinger
versions: +Python 3.4 -Python 3.3

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue18962
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com