Raymond Hettinger wrote:
* It will assist pypy style projects and other python implementations
when they have to build equivalents to CPython.

* Will eliminate confusion about what functions were exactly intended to
do.

* Will confer benefits similar to test-driven development, where the
documentation and pure python version are developed first and doctests
gotten to pass, then the C version is created to match.

I haven't seen anyone comment about this assertion of "equivalence".
Doesn't it strike you as difficult to maintain *two* versions of every
function, and ensure they match *exactly*?

Glad you brought this up.  My idea is to present rough equivalence
in unoptimized python that is simple and clear.  The goal is to provide
better documentation where code is more precise than English prose.
That being said, some subset of the existing tests should be runnable
against the rough equivalent and the python code should incorporate doctests.
Running both sets of tests should suffice to maintain the rough equivalence.
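
A minimal sketch of what running shared tests against both versions could
look like. The names check_equivalent and rough_sum are hypothetical, and
the built-in sum() is used only as a stand-in example; this is not taken
from the existing test suite.

```python
def check_equivalent(reference, candidate, cases):
    """Return (args, expected, actual) for every case where the two disagree."""
    mismatches = []
    for args in cases:
        expected = reference(*args)
        actual = candidate(*args)
        if expected != actual:
            mismatches.append((args, expected, actual))
    return mismatches


def rough_sum(iterable, start=0):
    """Rough, unoptimized equivalent of the built-in sum() (illustration only)."""
    total = start
    for item in iterable:
        total = total + item
    return total


# A small shared subset of cases, run against both implementations.
cases = [
    ([1, 2, 3],),
    ([], 5),
    ([1.5, 2], 1),
    (range(10),),
]
mismatches = check_equivalent(sum, rough_sum, cases)
```

An empty mismatches list means the rough version agrees with the built-in
on these cases. Error-checking behavior is deliberately left out of the
comparison, in keeping with the goal of rough rather than exact equivalence.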

The notion of exact equivalence should be left to PyPy folks who can attest
that the code can get convoluted when you try to simulate exactly when
error checking is performed, read-only behavior for attributes, and making
the stacktraces look the same when there are errors.  In contrast, my
goal is an approximation that is executable but highly readable and expository.

My thought is to do this only with tools where it really does enhance the
documentation.  The exercise is worthwhile in and of itself.  For example,
I'm working on a pure python version of str.split() and quickly determined
that the docs are *still* in error even after many revisions over the years
(the whitespace version does not, in fact, start by stripping whitespace
from both ends).  Here's what I have so far:

def split(s, sep=None, maxsplit=-1):
   """split(S, [sep [,maxsplit]]) -> list of strings

   Return a list of the words in the string S, using sep as the
   delimiter string.  If maxsplit is given, at most maxsplit
   splits are done. If sep is not specified or is None, any
   whitespace string is a separator and empty strings are removed
   from the result.

   >>> from collections import namedtuple
   >>> from itertools import product
   >>> s = ' 11   2  333  4  '
   >>> split(s, None)
   ['11', '2', '333', '4']
   >>> Err = namedtuple('Err', 'str maxsplit result target')
   >>> n = 8
   >>> for chars in product('ab ', repeat=n):
   ...     s = ''.join(chars)
   ...     for maxsplit in range(-2, len(s)+2):
   ...         expected = s.split(None, maxsplit)
   ...         actual = split(s, None, maxsplit)
   ...         assert actual == expected, Err(repr(s), maxsplit, actual, expected)

   """
   result = []
   spmode = True
   start = 0
   if maxsplit != 0:
       for i, c in enumerate(s):
           if spmode:
               if not c.isspace():
                   start = i
                   spmode = False
           elif c.isspace():
               result.append(s[start:i])
               start = i
               spmode = True
               if len(result) == maxsplit:
                   break
   rest = s[start:].lstrip()
   return (result + [rest]) if rest else result
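
The remark above that the whitespace version does not start by stripping
whitespace from both ends can be checked directly against the built-in.
This is a small sketch: the difference only shows up when maxsplit cuts
the splitting short, and the sample string is an arbitrary choice.

```python
s = '  a  b  c  '

# What str.split actually does when the split limit is reached:
# leading whitespace is skipped, but the remainder keeps its trailing spaces.
actual = s.split(None, 1)

# What it would do if it really began by stripping both ends.
strip_first = s.strip().split(None, 1)

# Without a limit, the two approaches cannot be told apart.
unlimited_same = s.split() == s.strip().split()
```

Here actual is ['a', 'b  c  '] while strip_first is ['a', 'b  c'], so the
two descriptions only agree when no limit is hit.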

Once I have the cleanest possible, self-explanatory code that passes the tests, I'll improve the variable names and write a more sensible docstring with readable examples. Surprisingly, it hasn't been a trivial exercise to come up with an equivalent that corresponds more closely to the way we think than to the C code -- I want to show *what* it does more than *how* it does it.
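
One possible shape for a version that leans toward *what* over *how* is
sketched below. It is an illustration only, not the version described in
this message: the name split_whitespace is hypothetical, and it handles
only the sep=None case.

```python
def split_whitespace(s, maxsplit=-1):
    """Rough equivalent of s.split(None, maxsplit) (illustrative sketch)."""
    words = []
    rest = s.lstrip()                      # whitespace before a word is dropped
    while rest and len(words) != maxsplit:
        # A word runs up to the next whitespace character (or to the end).
        end = next((i for i, c in enumerate(rest) if c.isspace()), len(rest))
        words.append(rest[:end])
        rest = rest[end:].lstrip()
    if rest:                               # limit reached: keep the remainder as is
        words.append(rest)
    return words
```

Read top to bottom, it says: skip leading whitespace, take a word, repeat
until the string or the split limit runs out, then keep whatever is left,
trailing whitespace included.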


Raymond
