Raymond Hettinger wrote:
* It will assist pypy style projects and other python implementations
  when they have to build equivalents to CPython.
* Will eliminate confusion about what functions were exactly intended to
  do.
* Will confer benefits similar to test driven development where the
  documentation and pure python version are developed first and doctests
  gotten to pass, then the C version is created to match.

I haven't seen anyone comment about this assertion of "equivalence".
Doesn't it strike you as difficult to maintain *two* versions of every
function, and ensure they match *exactly*?

Glad you brought this up. My idea is to present a rough equivalent
in unoptimized Python that is simple and clear. The goal is to provide
better documentation in places where code is more precise than English prose.

That being said, some subset of the existing tests should be runnable
against the rough equivalent, and the Python code should incorporate doctests.
Running both sets of tests should suffice to maintain the rough equivalence.

The notion of exact equivalence should be left to the PyPy folks, who can
attest that the code gets convoluted when you try to simulate exactly when
error checking is performed, reproduce read-only behavior for attributes,
and make the stack traces look the same when there are errors. In contrast,
my goal is an approximation that is executable but highly readable and
expository.

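As an illustration of running one set of checks against both a rough
equivalent and the built-in, here is a minimal sketch. The names
find_mismatches and rough_words are hypothetical and are not part of any
existing test suite; the small alphabet and length limit are assumptions
chosen only to keep the exhaustive search cheap.

```python
import re
from itertools import product


def find_mismatches(candidate, reference, alphabet="ab ", max_len=4):
    """Return every string over `alphabet` (up to `max_len` characters)
    on which `candidate` and `reference` give different results."""
    mismatches = []
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            s = "".join(chars)
            if candidate(s) != reference(s):
                mismatches.append(s)
    return mismatches


def rough_words(s):
    # Hypothetical rough equivalent of s.split() with no arguments:
    # split on runs of whitespace and drop the empty strings.
    return [w for w in re.split(r"\s+", s) if w]


# An empty list means the two agree on every generated input.
print(find_mismatches(rough_words, str.split))
```

The same helper could be pointed at any pair of callables, so a rough
equivalent and the C version can share one set of inputs.
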
My thought is to do this only with tools where it really does enhance the
documentation. The exercise is worthwhile in and of itself. For example,
I'm working on a pure python version of str.split() and quickly determined
that the docs are *still* in error even after many revisions over the years
(the whitespace version does not, in fact, start by stripping whitespace
from both ends). Here's what I have so far:

def split(s, sep=None, maxsplit=-1):
    """split(S, [sep [,maxsplit]]) -> list of strings

    Return a list of the words in the string S, using sep as the
    delimiter string.  If maxsplit is given, at most maxsplit
    splits are done.  If sep is not specified or is None, any
    whitespace string is a separator and empty strings are removed
    from the result.

    >>> from collections import namedtuple
    >>> from itertools import product

    >>> s = ' 11 2 333 4 '
    >>> split(s, None)
    ['11', '2', '333', '4']

    >>> n = 8
    >>> for s in product('ab ', repeat=n):
    ...     for maxsplit in range(-2, len(s)+2):
    ...         s = ''.join(s)
    ...         assert s.split(None, maxsplit) == split(s, None, maxsplit), namedtuple(
    ...             'Err', 'str maxsplit result target')(
    ...             repr(s), maxsplit, split(s, None, maxsplit), s.split(None, maxsplit))

    """
    # Only the whitespace mode (sep=None) is implemented so far; sep is ignored.
    result = []
    spmode = True       # True while scanning a run of whitespace
    start = 0           # index where the current word (or the remainder) begins
    if maxsplit != 0:
        for i, c in enumerate(s):
            if spmode:
                if not c.isspace():
                    # First character of a new word.
                    start = i
                    spmode = False
            elif c.isspace():
                # End of a word: record it and go back to skipping whitespace.
                result.append(s[start:i])
                start = i
                spmode = True
                if len(result) == maxsplit:
                    break
    # Whatever is left keeps its trailing whitespace; only the front is stripped.
    rest = s[start:].lstrip()
    return (result + [rest]) if rest else result

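To make the documentation discrepancy mentioned above concrete, here is a
short check using only the built-in str.split(). It compares the actual
result with what a "strip both ends first, then split" reading would give;
the variable names are just for illustration.

```python
s = "   1   2   3   "

# Actual behaviour: leading whitespace is skipped, but once maxsplit is
# reached the remainder keeps its trailing whitespace.
actual = s.split(None, 1)

# What "strip whitespace from both ends, then split" would produce instead.
strip_first = s.strip().split(None, 1)

print(actual)       # ['1', '2   3   ']
print(strip_first)  # ['1', '2   3']
```

The two results differ only when maxsplit cuts the scan short, which is
probably why the wording survived so many revisions.
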
Once I have the cleanest possible, self-explanatory code that passes the
tests, I'll improve the variable names and write a more sensible docstring
with readable examples. Surprisingly, it hasn't been a trivial exercise to
come up with an equivalent that corresponds more closely to the way we think
rather than to the C code. I want to show *what* it does more than *how* it
does it.
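
One possible shape for a "what, not how" version is sketched below. This is
only an assumption about what such a rewrite might look like, not the
finished equivalent: the name split_whitespace is hypothetical, and it
covers only the sep=None case.

```python
def split_whitespace(s, maxsplit=-1):
    """Rough sketch of s.split(None, maxsplit): peel words off the front."""
    result = []
    s = s.lstrip()                      # leading whitespace never matters
    while s and len(result) != maxsplit:
        # Find where the first word ends.
        end = 0
        while end < len(s) and not s[end].isspace():
            end += 1
        result.append(s[:end])
        s = s[end:].lstrip()            # drop the separator run
    if s:
        # Splits exhausted: the remainder is kept, trailing whitespace and all.
        result.append(s)
    return result


print(split_whitespace(' 11 2 333 4 '))     # ['11', '2', '333', '4']
print(split_whitespace(' 11 2 333 4 ', 2))  # ['11', '2', '333 4 ']
```

It trades the state flag for repeated slicing, which is slower but reads
closer to the prose description.
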
Raymond