On 4/6/2011 2:54 PM, Terry Reedy wrote:

> I believe that at the time of that decision, the Python [heapq] code was only
> intended for humans, like the Python (near) equivalents in the itertools
> docs to C-coded itertool functions. Now that we are aiming to have
> stdlib Python code be a reference implementation for all interpreters,
> that decision should be revisited.

OK so far.

> Either the C code should be generalized to sequences or
> the Python code specialized to lists, making sure the doc matches either way.

After rereading the heapq doc and .py file and thinking some more, I retract this statement for the following reasons.

1. The heapq doc clearly states that a list is required. It leaves the behavior for other types undefined. Let it be so.

2. Both _heapq.c (or its actual name) and heapq.py meet (I presume) the documented requirements and pass (or would pass) a complete test suite based on using lists as heaps. In that regard, both are conformant and should be considered 'equivalent'.

3. _heapq.c is clearly optimized for speed. It allows a list subclass as input and will heapify such, but it ignores a custom __getitem__. My informal test on a list(range(9999999)) shuffled in place with random.shuffle shows that heapify is over 10x as fast as .sort(). Let it be so.

4. When I suggested changing heapq.py, I had forgotten that heapq.py defines several functions rather than a wrapper class with methods. I was thinking of putting a type check in .__init__, where it would be applied once per heap (and possibly bypassed), and could easily be removed. Instead, every function would require a type check on every call. That would be too obnoxious for me. I love duck typing and held my nose a bit when suggesting a one-time type check.

5. Python already has an "extras allowed" principle. In other words, an implementation does not have to bother to enforce documented restrictions. For one example, the Python 2 manuals restrict identifiers to ASCII letters. CPython 2 (at least in recent versions) actually allows extended ASCII letters, as in Latin-1. For another, namespaces (globals and attribute namespaces), by their name, only need to map identifiers to objects. However, CPython uses general dicts rather than specialized string dicts with validity checks. People have exploited both loopholes. But those who have should not complain to us if such code fails on a different implementation that adheres to the doc.
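
For anyone who wants to check point 3 on their own machine, here is a rough sketch. CountingList is a hypothetical helper invented for this illustration, and the list size is an arbitrary choice that is much smaller than the one in my informal test. The count it prints depends on which heapq implementation is active, and the timings depend on the machine, so I make no claim about the numbers you will see.

```python
import heapq
import random
import time


class CountingList(list):
    """Hypothetical list subclass that counts calls to its __getitem__."""

    def __init__(self, iterable=()):
        super().__init__(iterable)
        self.getitem_calls = 0

    def __getitem__(self, index):
        self.getitem_calls += 1
        return super().__getitem__(index)


# Part 1: does heapify go through the subclass's __getitem__?
data = CountingList([5, 3, 8, 1, 9, 2])
heapq.heapify(data)
# An accelerated version that reads list storage directly leaves the count
# at 0; the pure Python code indexes the object and so increments it.
print("custom __getitem__ calls during heapify:", data.getitem_calls)

# Part 2: informal timing of heapify against sort on the same shuffled data.
values = list(range(200_000))  # arbitrary size, assumed for this sketch
random.shuffle(values)         # shuffles in place and returns None
heap_copy = list(values)
sort_copy = list(values)

start = time.perf_counter()
heapq.heapify(heap_copy)
heapify_seconds = time.perf_counter() - start

start = time.perf_counter()
sort_copy.sort()
sort_seconds = time.perf_counter() - start

print("heapify: %.4f s, sort: %.4f s" % (heapify_seconds, sort_seconds))
```

A single timing like this is noisy; the timeit module would give steadier figures.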

I think the Language and Library references should start with something a bit more specific than at present:

"The Python x.y Language and Library References define the Python x.y language, its builtin objects, and standard library. Code written to these docs should run on any implementation that includes the features used. Code that exploits or depends on any implementation-specific feature or behavior may not be portable."

_x.c and x.py are separate implementations of module x. I think they should be subject to the same disclaimer.


Therefore, I currently think that the only change needed for heapq (assuming both versions pass complete tests as per the doc) is an explanation at the top of heapq.py that goes something like this:

"Heapq.py is a reference implementation of the heapq module for both humans and implementations that do not have an accelerated version. For CPython, most of the functions are replaced by much faster C-coded versions.

Heapq is documented to require a Python list as input to the heap functions. The C functions enforce this restriction. The Python versions do not and should work with any mutable random-access sequence. Should you wish to run the Python code with CPython, copy this file, give it a new name, and delete the following lines:

try:
    from _heapq import *
except ImportError:
    pass

make any other changes you wish, and do not expect the result to be portable."
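
To make the "any mutable random-access sequence" remark concrete, here is a minimal sketch. sketch_heappush and sketch_heappop are hypothetical names written for this message; they are not the code in heapq.py, just duck-typed functions in the same spirit that need only append, pop, len, and item get/set. The final try/except merely reports whether the heapq in use accepts a non-list; the answer depends on which implementation is active.

```python
import heapq
from array import array


def sketch_heappush(heap, item):
    """Push item onto heap, any mutable random-access sequence (sketch)."""
    heap.append(item)
    pos = len(heap) - 1
    # Move parents down until the new item's parent is not larger than it.
    while pos > 0:
        parent = (pos - 1) // 2
        if heap[parent] <= item:
            break
        heap[pos] = heap[parent]
        pos = parent
    heap[pos] = item


def sketch_heappop(heap):
    """Pop and return the smallest item from heap (sketch)."""
    last = heap.pop()
    if not heap:
        return last
    smallest = heap[0]
    pos, size = 0, len(heap)
    # Sift the former last item down from the root.
    while True:
        child = 2 * pos + 1
        if child >= size:
            break
        right = child + 1
        if right < size and heap[right] < heap[child]:
            child = right
        if last <= heap[child]:
            break
        heap[pos] = heap[child]
        pos = child
    heap[pos] = last
    return smallest


# An array of C ints is a mutable random-access sequence but not a list.
heap = array("i")
for value in [7, 2, 9, 4, 1]:
    sketch_heappush(heap, value)
drained = [sketch_heappop(heap) for _ in range(5)]
print(drained)

# Report how the installed heapq treats the same kind of object.
try:
    heapq.heappush(array("i"), 1)
    stdlib_accepts_array = True
except TypeError:
    stdlib_accepts_array = False
print("heapq.heappush accepts an array:", stdlib_accepts_array)
```

Code that relies on the second answer being True is exactly the kind that should not be expected to be portable.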

--
Terry Jan Reedy
