On 4/6/2011 2:54 PM, Terry Reedy wrote:
> I believe that at the time of that decision, the Python [heapq] code was only
> intended for humans, like the Python (near) equivalents in the itertools
> docs to C-coded itertools functions. Now that we are aiming to have
> stdlib Python code be a reference implementation for all interpreters,
> that decision should be revisited.
OK so far.
> Either the C code should be generalized to sequences or
> the Python code specialized to lists, making sure the doc matches
> either way.
After rereading the heapq doc and .py file and thinking some more, I
retract this statement for the following reasons.
1. The heapq doc clearly states that a list is required. It leaves the
behavior for other types undefined. Let it be so.
2. Both _heapq.c (or its actual name) and heapq.py meet (I presume) the
documented requirements and pass (or would pass) a complete test suite
based on using lists as heaps. In that regard, both are conformant and
should be considered 'equivalent'.
3. _heapq.c is clearly optimized for speed. It allows a list subclass as
input and will heapify such, but it ignores a custom __getitem__. My
informal test on the result of random.shuffle(list(range(9999999))) shows
that heapify is over 10x as fast as .sort(). Let it be so.
4. When I suggested changing heapq.py, I had forgotten that heapq.py
defined several functions rather than a wrapper class with methods. I
was thinking of putting a type check in .__init__, where it would be
applied once per heap (and possibly bypassed), and could easily be
removed. Instead every function would require a type check for every
call. This would be too obnoxious to me. I love duck typing and held my
nose a bit when suggesting a one-time type check.
5. Python already has an "extras allowed" principle. In other words, an
implementation does not have to bother to enforce documented
restrictions. For one example, Python 2 manuals restrict identifiers to
ASCII letters. CPython 2 (at least in recent versions) actually allows
extended ASCII letters, as in Latin-1. For another, namespaces (globals
and attribute namespaces), by their name, only need to map identifiers
to objects. However, CPython uses general dicts rather than specialized
string dicts with validity checks. People have exploited both loopholes.
But those who have should not complain to us if such code fails on a
different implementation that adheres to the doc.
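Point 3 can be checked with a small sketch. It assumes CPython with the C accelerator in use; CountingList, uses_c_heapq and is_min_heap are hypothetical names made up for this illustration, not part of heapq. On an implementation that runs the pure Python code, the counts and the deque result may well differ, which is exactly the kind of undefined behavior discussed above.

```python
import heapq
from collections import deque


class CountingList(list):
    """Hypothetical list subclass that counts calls to its __getitem__."""

    def __init__(self, *args):
        super().__init__(*args)
        self.getitem_calls = 0

    def __getitem__(self, index):
        self.getitem_calls += 1
        return super().__getitem__(index)


def uses_c_heapq():
    """Return True if heapq.heapify is the function from the _heapq module."""
    try:
        import _heapq
    except ImportError:
        return False
    return heapq.heapify is _heapq.heapify


def is_min_heap(items):
    """Check the heap invariant: every parent is <= each of its children."""
    return all(items[(i - 1) // 2] <= items[i] for i in range(1, len(items)))


data = CountingList([5, 3, 8, 1, 9, 2])
heapq.heapify(data)
calls_during_heapify = data.getitem_calls  # read before any further access
heap_ok = is_min_heap(list(data))

# A deque is a mutable random-access sequence, but it is not a list.
try:
    heapq.heapify(deque([3, 1, 2]))
    deque_accepted = True
except TypeError:
    deque_accepted = False

print("C accelerator in use:", uses_c_heapq())
print("custom __getitem__ calls during heapify:", calls_during_heapify)
print("result is a valid heap:", heap_ok)
print("deque accepted:", deque_accepted)
```

With the C functions, I would expect the list subclass to be heapified without its __getitem__ ever being called, and the deque to be rejected with TypeError.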
I think the Language and Library references should start with something
a bit more specific than at present:
"The Python x.y Language and Library References define the Python x.y
language, its builtin objects, and standard library. Code written to
these docs should run on any implementation that includes the features
used. Code that exploits or depends on any implementation-specific
feature or behavior may not be portable."
_x.c and x.py are separate implementations of module x. I think they
should be subject to the same disclaimer.
Therefore, I currently think that the only change needed for heapq
(assuming both versions pass complete tests as per the doc) is an
explanation at the top of heapq.py that goes something like this:
"Heapq.py is a reference implementation of the heapq module for both
humans and implementations that do not have an accelerated version. For
CPython, most of the functions are replaced by much faster C-coded versions.
Heapq is documented to require a Python list as input to the heap
functions. The C functions enforce this restriction. The Python versions
do not and should work with any mutable random-access sequence. Should
you wish to run the Python code with CPython, copy this file, give it a
new name, delete the following lines:
try:
    from _heapq import *
except ImportError:
    pass
make any other changes you wish, and do not expect the result to be
portable."
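For experimenting without copying and editing the file, here is a sketch of an alternative technique (not the procedure proposed in the text above): execute heapq.py under a new module name while the import of _heapq is temporarily blocked. It assumes heapq.__file__ points at a readable heapq.py and that the file guards its _heapq imports with "except ImportError"; load_pure_python_heapq and py_heapq are hypothetical names. The same non-portability warning applies.

```python
import heapq
import importlib.util
import sys
from collections import deque

_MISSING = object()


def load_pure_python_heapq(name="py_heapq"):
    """Execute heapq.py under a new module name with _heapq blocked."""
    spec = importlib.util.spec_from_file_location(name, heapq.__file__)
    module = importlib.util.module_from_spec(spec)
    saved = sys.modules.get("_heapq", _MISSING)
    # A None entry in sys.modules makes "import _heapq" raise ImportError,
    # so the try/except at the bottom of heapq.py keeps the Python versions.
    sys.modules["_heapq"] = None
    try:
        spec.loader.exec_module(module)
    finally:
        # Restore the previous state so the regular heapq is unaffected.
        if saved is _MISSING:
            del sys.modules["_heapq"]
        else:
            sys.modules["_heapq"] = saved
    return module


py_heapq = load_pure_python_heapq()

# The pure Python functions only index, append to, and pop from the heap,
# so a deque happens to work here even though the doc requires a list.
heap = deque([5, 3, 8, 1])
py_heapq.heapify(heap)
py_heapq.heappush(heap, 2)
drained = [py_heapq.heappop(heap) for _ in range(len(heap))]
print(drained)
```

The module returned this way is separate from the imported heapq, so other code keeps using whatever version the implementation normally provides.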
--
Terry Jan Reedy
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev