New submission from Joshua Bronson <jabron...@gmail.com>: >From http://docs.python.org/library/heapq.html: > The latter two functions (nlargest and nsmallest) perform best for > smaller values of n. For larger values, it is more efficient to use > the sorted() function. Also, when n==1, it is more efficient to use > the builtin min() and max() functions.
A quick usability win for the heapq module: Be more specific than "smaller" and "larger", e.g. "when n is O(...(len(iterable)))". >From http://groups.google.com/group/comp.lang.python/msg/9556f56ae25ecf3b: > I find it interesting that the heapq functions tell you in the > documentation that they aren't suitable for use where n==1 or where > n is near the total size of the sequence whereas random.sample() > chooses what it thinks is the best algorithm based on the input. At > the very least I would have thought the heapq functions could check > for n==1 and call min/max when it is. +1. It looks like the pure Python implementation of nsmallest is actually already choosing either an insort algorithm or a minheap algorithm based on whether n * 10 <= len(iterable), but the C implementation of nsmallest unconditionally uses a minheap algorithm. Neither the pure Python nor the C implementation of nlargest chooses its algorithm conditionally. For the sake of consistency and usability, all implementations of nsmallest and nlargest should choose the most efficient algorithm from among min/max, insort, and minheap, and the docs should be updated to describe this behavior. Also, in looking through the heapq.py that came with my Python 2.6.2 distribution, I noticed that line 196 seems to have no reason to be there: _heappushpop = heappushpop This is the only occurrence of "_heappushpop" in the file. I made a patch for heapq.py, but since I'm not a CPython hacker, I thought it might be better to leave the changes up to someone who could do both the pure Python and the C implementations in tandem. I'd be happy to contribute documentation when the time comes if that would help. ---------- messages: 91134 nosy: jab severity: normal status: open title: heapq.nsmallest and nlargest should be smarter/more usable/more consistent type: behavior versions: Python 2.5, Python 2.6, Python 2.7, Python 3.0, Python 3.1 _______________________________________ Python tracker <rep...@bugs.python.org> <http://bugs.python.org/issue6614> _______________________________________ _______________________________________________ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com