New submission from Joshua Bronson <jabron...@gmail.com>:

>From http://docs.python.org/library/heapq.html:
> The latter two functions (nlargest and nsmallest) perform best for
> smaller values of n. For larger values, it is more efficient to use
> the sorted() function. Also, when n==1, it is more efficient to use
> the builtin min() and max() functions.

A quick usability win for the heapq module: Be more specific than "smaller" and 
"larger", e.g. "when n is O(...(len(iterable)))".

>From http://groups.google.com/group/comp.lang.python/msg/9556f56ae25ecf3b:
> I find it interesting that the heapq functions tell you in the
> documentation that they aren't suitable for use where n==1 or where
> n is near the total size of the sequence whereas random.sample()
> chooses what it thinks is the best algorithm based on the input. At
> the very least I would have thought the heapq functions could check
> for n==1 and call min/max when it is.

+1. It looks like the pure Python implementation of nsmallest is actually 
already 
choosing either an insort algorithm or a minheap algorithm based on whether n * 
10 <= 
len(iterable), but the C implementation of nsmallest unconditionally uses a 
minheap 
algorithm. Neither the pure Python nor the C implementation of nlargest chooses 
its 
algorithm conditionally. For the sake of consistency and usability, all 
implementations of nsmallest and nlargest should choose the most efficient 
algorithm 
from among min/max, insort, and minheap, and the docs should be updated to 
describe 
this behavior.

Also, in looking through the heapq.py that came with my Python 2.6.2 
distribution, I 
noticed that line 196 seems to have no reason to be there:
    _heappushpop = heappushpop                                      
This is the only occurrence of "_heappushpop" in the file.

I made a patch for heapq.py, but since I'm not a CPython hacker, I thought it 
might 
be better to leave the changes up to someone who could do both the pure Python 
and 
the C implementations in tandem. I'd be happy to contribute documentation when 
the 
time comes if that would help.

----------
messages: 91134
nosy: jab
severity: normal
status: open
title: heapq.nsmallest and nlargest should be smarter/more usable/more 
consistent
type: behavior
versions: Python 2.5, Python 2.6, Python 2.7, Python 3.0, Python 3.1

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue6614>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

Reply via email to