Re: [Python-Dev] PEP 0424: A method for exposing a length hint
Antoine Pitrou, 15.07.2012 17:06: > On Sun, 15 Jul 2012 16:33:23 +0200 > Christian Heimes wrote: >> Am 15.07.2012 16:22, schrieb Antoine Pitrou: >>> On Mon, 16 Jul 2012 00:08:41 +1000 >>> Nick Coghlan wrote: Right, I agree on the value in being able to return something to say "this cannot be converted to a concrete container". >>> >>> Who would be able to return that, apart from trivial cases like >>> itertools.cycle()? >> >> For example most numerical sequence iterators like Fibonacci generator, >> prime number sequence generator and even trivial cases like even natural >> number generator. > > First, you can't implement __length_hint__ for a generator, which is the > preferred (the most practical) way of writing iterators in pure Python. It can be implemented for generator expressions without a conditional, though, including the case of comprehensions. I wanted to do this in Cython for a while, but the protocol wasn't very well adapted to that use case. The "don't know" case was just too common and inefficient. For the other points, I agree with the already presented counterarguments. Being able to prevent some obvious traps is a good thing, even if you can't prevent all of them. Stefan ___ Python-Dev mailing list [email protected] http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP 0424: A method for exposing a length hint
Alex Gaynor, 15.07.2012 00:11: > CPython currently defines an ``__length_hint__`` method on several types, such > as various iterators. This method is then used by various other functions > (such > as ``map``) to presize lists based on the estimated returned by > ``__length_hint__``. Types can then define ``__length_hint__`` which are not > sized, and thus should not define ``__len__``, but can estimate or compute a > size (such as many iterators). > > Proposal > > > This PEP proposes formally documenting ``__length_hint__`` for other > interpreter and non-standard library Python to implement. > > ``__length_hint__`` must return an integer, and is not required to be > accurate. > It may return a value that is either larger or smaller than the actual size of > the container. It may raise a ``TypeError`` if a specific instance cannot have > its length estimated. It may not return a negative value. I'd like to more visibly repeat my suggestion to make this a slot method "tp_length_hint()" in extension types that returns a Py_ssize_t. That suggests that a negative return value would have a special meaning instead of relying on return values like NotImplemented. The Python wrapper of that slot method could still implement a mapping for this. Return values could be -1 for "don't know" and -2 for "infinite" at the C level, and NotImplemented for "don't know" at the Python level. Not sure about a good Python value for "infinite". Maybe return -1 for "infinite" at both levels and -2/NotImplemented for "don't know" in C/Python? That would suggest -3 to propagate exceptions at the C level. Stefan ___ Python-Dev mailing list [email protected] http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP 0424: A method for exposing a length hint
Proposing anything substantially more complicated than what is currently implemented in CPython will just get the idea rejected. The substantial immediate gain for PyPy is in skipping the memory resizing when building containers from itertools iterators, not anywhere else. -- Sent from my phone, thus the relative brevity :) ___ Python-Dev mailing list [email protected] http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
[Python-Dev] Do we need __length_hint__ at all? (Was PEP 0424: A method for exposing a length hint)
To quote from PEP 424: Rationale = Being able to pre-allocate lists based on the expected size, as estimated by ``__length_hint__``, can be a significant optimization. CPython has been observed to run some code faster than PyPy, purely because of this optimization being present. Why is it a significant optimisation? How much slower is it? Where is the data? *If* resizing list is so slow, then why not make it faster? To quote wikipedia (http://en.wikipedia.org/wiki/Dynamic_array) """ As n elements are inserted, the capacities form a geometric progression. Expanding the array by any constant proportion ensures that inserting n elements takes O(n) time overall, meaning that each insertion takes amortized constant time. The value of this proportion a leads to a time-space tradeoff: the average time per insertion operation is about a/(a-1), while the number of wasted cells is bounded above by (a-1)n. The choice of a depends on the library or application: some textbooks use a = 2, but Java's ArrayList implementation uses a = 3/2 and the C implementation of Python's list data structure uses a = 9/8. """ If resizing of lists is too slow, then we should reconsider the 9/8 factor and/or look to tweak the resizing code. Cheers, Mark. ___ Python-Dev mailing list [email protected] http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP 0424: A method for exposing a length hint
Nick Coghlan, 16.07.2012 10:26: > Proposing anything substantially more complicated than what is currently > implemented in CPython will just get the idea rejected. The substantial > immediate gain for PyPy is in skipping the memory resizing when building > containers from itertools iterators, not anywhere else. The same applies to Cython, where the extension types that implement generator expressions can benefit from propagating the length hint of the underlying iterator. A type slot would help in making this more efficient overall, also for CPython itself. Stefan ___ Python-Dev mailing list [email protected] http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Do we need __length_hint__ at all? (Was PEP 0424: A method for exposing a length hint)
Mark Shannon, 16.07.2012 10:37: > If resizing of lists is too slow, then we should reconsider the 9/8 factor > and/or look to tweak the resizing code. The thing is that the performance is platform specific. On systems with a fast memory allocator, especially on Linux, the difference is negligible. However, with a slow memory allocator, especially a slow realloc(), e.g. on Windows or Solaris, this can substantially hurt the performance, up to a quadratically increasing runtime in the worst case. The length hint was implemented specifically to work around this problem. Stefan ___ Python-Dev mailing list [email protected] http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Do we need __length_hint__ at all? (Was PEP 0424: A method for exposing a length hint)
On Jul 16, 2012, at 1:37 AM, Mark Shannon wrote: > To quote from PEP 424: > >> Rationale >> = >> Being able to pre-allocate lists based on the expected size, as estimated by >> ``__length_hint__``, can be a significant optimization. CPython has been >> observed to run some code faster than PyPy, purely because of this >> optimization >> being present. > > Why is it a significant optimisation? Unless pre-sized by with a length prediction, a growing list periodically needs to call realloc() which can move all the data to a new location in memory.Pre-sizing avoids that entirely. > If resizing of lists is too slow, then we should reconsider the 9/8 factor > and/or look to tweak the resizing code. A great deal of thought and care went into the current design. It has already been "tweaked". Raymond P.S. The dictionary code also uses presizing for copies, updates, set conversion, etc. It is a perfectly reasonable technique to pre-allocate the correct size container when the ultimate length is knowable in advance. ___ Python-Dev mailing list [email protected] http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Do we need __length_hint__ at all? (Was PEP 0424: A method for exposing a length hint)
On Mon, Jul 16, 2012 at 11:02 AM, Stefan Behnel wrote: > Mark Shannon, 16.07.2012 10:37: > > If resizing of lists is too slow, then we should reconsider the 9/8 > factor > > and/or look to tweak the resizing code. > > The thing is that the performance is platform specific. On systems with a > fast memory allocator, especially on Linux, the difference is negligible. > However, with a slow memory allocator, especially a slow realloc(), e.g. on > Windows or Solaris, this can substantially hurt the performance, up to a > quadratically increasing runtime in the worst case. > > The length hint was implemented specifically to work around this problem. > > Stefan > > It's not the actual allocation (typically), it's the copying that's the problem. As far as data goes - preallocation can make 4x difference (on PyPy, although the dominant operation is not different on CPython) on ''.join(some-iterable). It depends grossly on the sizes of the list, so you can't claim that there is a precise speedup of a constant factor, however, there are cases where it *can* be significant (as in the copying is by far the dominating operation), most notable giant templates with iterators written in C. Speaking of which - I find this bikeshed disgusting. The purpose of the PEP is to codify whatever is already written in code in CPython. If you guys don't want it, we'll just implement it anyway and try to follow the CPython current implementation from 2.7. Cheers, fijal ___ Python-Dev mailing list [email protected] http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Do we need __length_hint__ at all? (Was PEP 0424: A method for exposing a length hint)
> Speaking of which - I find this bikeshed disgusting. Disgusting? Irritating, perhaps, but why should a PEP -- even one whose purpose is to codify existing practice -- not result in discussions about its subject matter? The final P stands for Proposal, not for Pronouncement. TJG ___ Python-Dev mailing list [email protected] http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Do we need __length_hint__ at all? (Was PEP 0424: A method for exposing a length hint)
Maciej Fijalkowski, 16.07.2012 11:14: > On Mon, Jul 16, 2012 at 11:02 AM, Stefan Behnel wrote: >> Mark Shannon, 16.07.2012 10:37: >>> If resizing of lists is too slow, then we should reconsider the 9/8 >>> factor and/or look to tweak the resizing code. >> >> The thing is that the performance is platform specific. On systems with a >> fast memory allocator, especially on Linux, the difference is negligible. >> However, with a slow memory allocator, especially a slow realloc(), e.g. on >> Windows or Solaris, this can substantially hurt the performance, up to a >> quadratically increasing runtime in the worst case. >> >> The length hint was implemented specifically to work around this problem. > > It's not the actual allocation (typically), it's the copying that's the > problem. Note that a call to realloc() includes that part and can avoid copying if possible. A good realloc() implementation can make this use case run in amortised linear time, at least on a system with sufficient memory. A bad one can result in quadratic runtime, which is way more than a change by a constant factor. Thus my above comment that it's platform specific. > there are cases where it *can* be significant (as in the copying is by far > the dominating operation), most notable giant templates with iterators > written in C. Absolutely. This is particularly visible at the C level because C implemented iterators have a very low overhead overall. > Speaking of which - I find this bikeshed disgusting. The purpose of the PEP > is to codify whatever is already written in code in CPython. If you guys > don't want it, we'll just implement it anyway and try to follow the CPython > current implementation from 2.7. The idea behind this bikeshedding is that at the moment where we make this an official protocol rather than an implementation detail, it should be able to communicate the different states on the callee side of such a protocol. I.e. it needs a better design than the current one. Stefan ___ Python-Dev mailing list [email protected] http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP 0424: A method for exposing a length hint
On Jul 15, 2012, at 7:22 AM, Antoine Pitrou wrote:
> On Mon, 16 Jul 2012 00:08:41 +1000
> Nick Coghlan wrote:
>> Right, I agree on the value in being able to return something to say "this
>> cannot be converted to a concrete container".
>
> Who would be able to return that, apart from trivial cases like
> itertools.cycle()?
FWIW, here are the notes from the docstring in Lib/test/test_iterlen.py:
""" Test Iterator Length Transparency
Some functions or methods which accept general iterable arguments have
optional, more efficient code paths if they know how many items to expect.
For instance, map(func, iterable), will pre-allocate the exact amount of
space required whenever the iterable can report its length.
The desired invariant is: len(it)==len(list(it)).
A complication is that an iterable and iterator can be the same object. To
maintain the invariant, an iterator needs to dynamically update its length.
For instance, an iterable such as xrange(10) always reports its length as ten,
but it=iter(xrange(10)) starts at ten, and then goes to nine after it.next().
Having this capability means that map() can ignore the distinction between
map(func, iterable) and map(func, iter(iterable)).
When the iterable is immutable, the implementation can straight-forwardly
report the original length minus the cumulative number of calls to next().
This is the case for tuples, xrange objects, and itertools.repeat().
Some containers become temporarily immutable during iteration. This includes
dicts, sets, and collections.deque. Their implementation is equally simple
though they need to permanently set their length to zero whenever there is
an attempt to iterate after a length mutation.
The situation slightly more involved whenever an object allows length mutation
during iteration. Lists and sequence iterators are dynamically updatable.
So, if a list is extended during iteration, the iterator will continue through
the new items. If it shrinks to a point before the most recent iteration,
then no further items are available and the length is reported at zero.
Reversed objects can also be wrapped around mutable objects; however, any
appends after the current position are ignored. Any other approach leads
to confusion and possibly returning the same item more than once.
The iterators not listed above, such as enumerate and the other itertools,
are not length transparent because they have no way to distinguish between
iterables that report static length and iterators whose length changes with
each call (i.e. the difference between enumerate('abc') and
enumerate(iter('abc')).
"""
Raymond___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe:
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP 0424: A method for exposing a length hint
On 16.07.12 10:36, Stefan Behnel wrote: Return values could be -1 for "don't know" and -2 for "infinite" at the C level, and NotImplemented for "don't know" at the Python level. PY_SSIZE_T_MAX is better value for "infinite". In any case no difference for consumer between PY_SSIZE_T_MAX and a real infinity. ___ Python-Dev mailing list [email protected] http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
[Python-Dev] early startup error reporting failure
Hi there.
I've been busy taking the current beta candidate and merging it into the
stackless repo.
As expected, things don't just go smoothly and there are the usual startup
errors, this being a rather intrusive patch and all that.
However, I found that early startup errors were not being reported correctly,
so I had do make some changes to fix that. I'm not sure these are the correct
fixes, so I'd like to start this here and see if anyone feels responsible.
Right: The initial error occurs here:
if (PyImport_ImportFrozenModule("_frozen_importlib") <= 0) {
Py_FatalError("Py_Initialize: can't import _frozen_importlib");
My problem was that the actual exception was not being reported along with the
FatalError message.
Digging around a bit, I found the cause here:
fileobject.c, PyFile_WriteString()
}
else if (!PyErr_Occurred()) {
That is, this function declines to write anything if there is an exception
present.
My quick and dirty fix was to remove this test and just print even with a
present exception. That fixes the issue.
But perhaps the _correct_ way is to suppress the exception higher up in the
callchain, which is this:
> python33_d.dll!PyFile_WriteString(const char * s, _object * f) Line 179 C
python33_d.dll!PyTraceBack_Print(_object * v, _object * f) Line 415 + 0x11
bytes C
python33_d.dll!print_exception(_object * f, _object * value) Line 1748 +
0x12 bytes C
python33_d.dll!print_exception_recursive(_object * f, _object * value,
_object * seen) Line 1889 C
python33_d.dll!PyErr_Display(_object * exception, _object * value, _object *
tb) Line 1913 C
python33_d.dll!sys_excepthook(_object * self, _object * args) Line 197 C
python33_d.dll!PyCFunction_Call(_object * func, _object * arg, _object * kw)
Line 99 + 0x46 bytes C
python33_d.dll!PyObject_Call(_object * func, _object * arg, _object * kw)
Line 2149 + 0x48 bytes C
python33_d.dll!PyEval_CallObjectWithKeywords(_object * func, _object * arg,
_object * kw) Line 4584 C
python33_d.dll!PyErr_PrintEx(int set_sys_last_vars) Line 1686 + 0x12 bytes C
python33_d.dll!Py_FatalError(const char * msg) Line 2358 C
Perhaps error should be fetched and restored in PyTraceback_Print(), since it
already does some exception juggling, obviously assuming that an exception
state can be present that it is worthwhile to preserve.
Ok, then I came to the second issue.
When printing the tracebacks, this early in the process, I hit upon this code,
in
traceback.c, tb_displayline(), I made this change (line 344):
-return _Py_DisplaySourceLine(f, filename, lineno, 4);
+ /* ignore IO errors here, IO may not be ready yet */
+ if ( _Py_DisplaySourceLine(f, filename, lineno, 4) )
+ PyErr_Clear();
+ return err;
This early in the process, IO cannot be imported so it is impossible to output
source line. The source line output is a "bonus" feature anyway and we
shouldn't, IMHO, fail outputting tracebacks if we cannot read the code.
The actual failure was importing the IO library. Perhaps an alternative fix,
then, is to fix the _Py_DisplaySourceLine() so that it deals with failure to
import IO in the same way as failure to read the file, i.e. just returns a
"success" value of 0.
With these changes, I was able to successfully output the error. Hopefully I
will be able to debug it too :)
Any thoughts?
___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe:
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP 0424: A method for exposing a length hint
On Jul 16, 2012, at 12:36 AM, Stefan Behnel wrote: > I'd like to more visibly repeat my suggestion to make this a slot method > "tp_length_hint()" in extension types that returns a Py_ssize_t. That is merely an implementation detail, but it would be a nice improvement. Raymond ___ Python-Dev mailing list [email protected] http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] early startup error reporting failure
Looking better at the code, the fileobject change isn't necessary. A simpler fix is to just ignore and clear errors from _Py_DisplaySourceLine. I´ll prepare a defect/patch K Frá: [email protected] [[email protected]] fyrir hönd Kristján Valur Jónsson [[email protected]] Sent: 16. júlí 2012 09:42 To: [email protected] Efni: [Python-Dev] early startup error reporting failure Hi there. I've been busy taking the current beta candidate and merging it into the stackless repo. As expected, things don't just go smoothly and there are the usual startup errors, this being a rather intrusive patch and all that. However, I found that early startup errors were not being reported correctly, so I had do make some changes to fix that. I'm not sure these are the correct fixes, so I'd like to start this here and see if anyone feels responsible. Right: The initial error occurs here: if (PyImport_ImportFrozenModule("_frozen_importlib") <= 0) { Py_FatalError("Py_Initialize: can't import _frozen_importlib"); My problem was that the actual exception was not being reported along with the FatalError message. Digging around a bit, I found the cause here: fileobject.c, PyFile_WriteString() } else if (!PyErr_Occurred()) { That is, this function declines to write anything if there is an exception present. My quick and dirty fix was to remove this test and just print even with a present exception. That fixes the issue. But perhaps the _correct_ way is to suppress the exception higher up in the callchain, which is this: > python33_d.dll!PyFile_WriteString(const char * s, _object * f) Line 179 C python33_d.dll!PyTraceBack_Print(_object * v, _object * f) Line 415 + 0x11 bytes C python33_d.dll!print_exception(_object * f, _object * value) Line 1748 + 0x12 bytes C python33_d.dll!print_exception_recursive(_object * f, _object * value, _object * seen) Line 1889 C python33_d.dll!PyErr_Display(_object * exception, _object * value, _object * tb) Line 1913 C python33_d.dll!sys_excepthook(_object * self, _object * args) Line 197 C python33_d.dll!PyCFunction_Call(_object * func, _object * arg, _object * kw) Line 99 + 0x46 bytes C python33_d.dll!PyObject_Call(_object * func, _object * arg, _object * kw) Line 2149 + 0x48 bytes C python33_d.dll!PyEval_CallObjectWithKeywords(_object * func, _object * arg, _object * kw) Line 4584 C python33_d.dll!PyErr_PrintEx(int set_sys_last_vars) Line 1686 + 0x12 bytes C python33_d.dll!Py_FatalError(const char * msg) Line 2358 C Perhaps error should be fetched and restored in PyTraceback_Print(), since it already does some exception juggling, obviously assuming that an exception state can be present that it is worthwhile to preserve. Ok, then I came to the second issue. When printing the tracebacks, this early in the process, I hit upon this code, in traceback.c, tb_displayline(), I made this change (line 344): -return _Py_DisplaySourceLine(f, filename, lineno, 4); + /* ignore IO errors here, IO may not be ready yet */ + if ( _Py_DisplaySourceLine(f, filename, lineno, 4) ) + PyErr_Clear(); + return err; This early in the process, IO cannot be imported so it is impossible to output source line. The source line output is a "bonus" feature anyway and we shouldn't, IMHO, fail outputting tracebacks if we cannot read the code. The actual failure was importing the IO library. Perhaps an alternative fix, then, is to fix the _Py_DisplaySourceLine() so that it deals with failure to import IO in the same way as failure to read the file, i.e. just returns a "success" value of 0. With these changes, I was able to successfully output the error. Hopefully I will be able to debug it too :) Any thoughts? ___ Python-Dev mailing list [email protected] http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Do we need __length_hint__ at all? (Was PEP 0424: A method for exposing a length hint)
> *If* resizing list is so slow, then why not make it faster? A simple solution to speed up such problem is to change the overallocation factor, but it may waste memory. Python tries to be fast and to not waste too much memory. > Why is it a significant optimisation? > How much slower is it? > Where is the data? I worked recently on optimizing str%args and str.format(args). Handle correctly the memory allocation is just critical for performances, especially for str with the PEP 393, because we have to shrink the buffer to the exact string length with the formatting function is done. I tried various overallocation factors and I chose 1.25 (5/4) because it was the fastest. See for example this issue for benchmark numbers: http://bugs.python.org/issue14687 The PyUnicodeWriter internal object uses various options to choose how many bytes should be allocated: * an overallocation flag to disable overallocation when we know that we are writing the last character/string into be buffer * a "minimal length" used for the first allocation * an hardcoded overallocation factor (1.25) PyUnicodeWriter is a little bit different than the __length_hint__ issue because PyUnicodeWriter has to shrink the buffer when it is done, but I can say that overallocation is very useful for speed. Victor ___ Python-Dev mailing list [email protected] http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP 0424: A method for exposing a length hint
On Mon, Jul 16, 2012 at 3:36 AM, Stefan Behnel wrote: > Alex Gaynor, 15.07.2012 00:11: > > CPython currently defines an ``__length_hint__`` method on several > types, such > > as various iterators. This method is then used by various other > functions (such > > as ``map``) to presize lists based on the estimated returned by > > ``__length_hint__``. Types can then define ``__length_hint__`` which are > not > > sized, and thus should not define ``__len__``, but can estimate or > compute a > > size (such as many iterators). > > > > Proposal > > > > > > This PEP proposes formally documenting ``__length_hint__`` for other > > interpreter and non-standard library Python to implement. > > > > ``__length_hint__`` must return an integer, and is not required to be > accurate. > > It may return a value that is either larger or smaller than the actual > size of > > the container. It may raise a ``TypeError`` if a specific instance > cannot have > > its length estimated. It may not return a negative value. > > I'd like to more visibly repeat my suggestion to make this a slot method > "tp_length_hint()" in extension types that returns a Py_ssize_t. > > That suggests that a negative return value would have a special meaning > instead of relying on return values like NotImplemented. The Python wrapper > of that slot method could still implement a mapping for this. > > Return values could be -1 for "don't know" and -2 for "infinite" at the C > level, and NotImplemented for "don't know" at the Python level. Not sure > about a good Python value for "infinite". > Gods no. Making the return value different in C vs. Python code is just asking for trouble in terms of having to remember that specific difference while coding. Plus asking for people to check for an explicit negative values instead of just >= 0 would be problematic and prone to error. > > Maybe return -1 for "infinite" at both levels and -2/NotImplemented for > "don't know" in C/Python? That would suggest -3 to propagate exceptions at > the C level. > See above. This is another reason why I don't think the infinite iterator concept is worth expressin. It's just mucking things up for no good reason. "infinite" == "I don't know" in the case of pre-allocation of a list. Just raise an exception or return None and be done with it. Nice and simple. And my vote is for an exception as EAFP. -Brett ___ Python-Dev mailing list [email protected] http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Do we need __length_hint__ at all? (Was PEP 0424: A method for exposing a length hint)
On Mon, Jul 16, 2012 at 7:21 PM, Tim Golden wrote:
>> Speaking of which - I find this bikeshed disgusting.
>
> Disgusting? Irritating, perhaps, but why should a PEP -- even one whose
> purpose is to codify existing practice -- not result in discussions
> about its subject matter?
>
> The final P stands for Proposal, not for Pronouncement.
Indeed - I'd be worried if any PEP sailed through python-dev review
without a thorough kicking of the tires. Yes, it can be annoying
having to bring people up to speed on issues that they aren't familiar
with, but that's generally a sign that there is relevant background
information *missing from the PEP*.
PEP's aren't supposed to be written just for people that are already
intimately familiar with a problem - they're supposed to provide
enough background that they stand on their own.
In this case, the key points that I think need to be added:
- more background on why the __length_hint__ API exists in CPython in
the first place: to minimise potentially expensive data copies (due to
memory reallocation) when creating a concrete container from an
iterator. This includes when creating them from another concrete
container via an intermediate iterator. This is why at least the
following produce objects that define __length_hint__ in CPython:
reversed
itertools.repeat
iter(dict())
iter(list())
iter(tuple())
iter(str())
iter(bytes())
iter(bytearray())
iter(set())
iter(frozenset())
dict.values()
dict.keys()
As well as any user defined sequence that relies on the default
sequence iterator:
>>> class MySeq():
... def __getitem__(self, idx):
... return idx
... def __len__(self):
... return 10
...
>>> hasattr(iter(MySeq()), "__length_hint__")
True
- clarification on the implications of it only being a "hint":
specifically, as it may be an over- or underestimate, you *cannot*
rely on the hint to decide whether or not to iterate over the object
when a valid length is returned (as a value of zero may be an
underestimate). However, it does allow you to presize your container
more appropriately than just starting at zero as usual, thus achieving
the aim of reducing the risk of unnecessary memory copies.
That's the basic proposal. Separate from that, there are a few
suggestions for *enhancement* beyond what CPython currently uses (and
has demonstrated a clear need for):
- adding operator.length_hint(). This makes sense to me, as it makes
it much easier to use the API when implementing a container type in
Python. It's also a pretty simple addition.
- adding a C level type slot. I'm personally -1 on this one in the
context of the PEP. I don't think the current PEP (which is really
aimed at standardisation for PyPy's benefit) should be weighed down
with this CPython specific implementation detail. As a separate
proposal, independent of this PEP, from someone that cares
specifically about micro-optimising this API for CPython, and
(preferably) can produce benchmark numbers to show the additional
implementation complexity is worthwhile, then I wouldn't object. I
just don't want this orthogonal concern to derail the standardisation
effort.
- distinguishing between different reasons for saying "don't
preallocate any space" (i.e. returning zero). I still haven't heard a
convincing rationale for this one - it seems to be based on the notion
that the case of skipping the iteration step for a genuinely empty
iterable is worth optimising. This idea just doesn't make sense when
any legitimate length value that is returned can't be trusted to be
completely accurate - you have to iterate to confirm the actual
number.
- making it possible to fail *fast* when a known infinite iterator
(like itertools.cycle or itertools.count) is passed to a concrete
container. I think this is best covered in the PEP by explicitly
stating that some types may implement __length_hint__ to always raise
TypeError rather than defining a magic return value that means "I'm
infinite".
- making it possible for iterators like enumerate, map and filter to
delegate __length_hint__ to their underlying iterator. This seems
potentially worth doing, but requires resolving the problem that
Raymond noted with handling the difference in internal behaviour
between enumerate("abc") and enumerate(iter("abc")). Again, it would
be unfortunate to see the PEP held up over this.
- making it possible to define __length_hint__ for generator-iterator
objects. While this is a nice idea, again, I don't think it's
something that this particular PEP should be burdened with.
My main point is that the current __length_hint__ behaviour has
already proven its value in the real world. The PyPy team would like
that behaviour codified, so they can be reasonably sure both
implementations are doing the same thing. Many of the enhancements I
have listed above may be suitable material for future enhancement
proposals, but I'm not seeing any requeste
Re: [Python-Dev] Do we need __length_hint__ at all? (Was PEP 0424: A method for exposing a length hint)
On Mon, 16 Jul 2012 23:23:18 +1000 Nick Coghlan wrote: > > - distinguishing between different reasons for saying "don't > preallocate any space" (i.e. returning zero). I still haven't heard a > convincing rationale for this one The point is that zero is a valid value for a length hint. By making it a special value for "don't know", you are making the protocol potentially confusing, and you are also departing from the current semantics. (and, yes, I think distinguishing between zero and "don't know" is useful: imagine a container that would preallocate 256 entries by default when the answer is "don't know") > The PEP itself already has this general tone, but I think that it > should be even more emphatic that it's about codifying the status quo, > *not* about modifying it with ideas haven't been proven useful through > past experience. Then the PEP shouldn't address infinite iterators at all. Regards Antoine. -- Software development and contracting: http://pro.pitrou.net ___ Python-Dev mailing list [email protected] http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Do we need __length_hint__ at all? (Was PEP 0424: A method for exposing a length hint)
On Tue, Jul 17, 2012 at 12:01 AM, Antoine Pitrou wrote: > On Mon, 16 Jul 2012 23:23:18 +1000 > Nick Coghlan wrote: >> >> - distinguishing between different reasons for saying "don't >> preallocate any space" (i.e. returning zero). I still haven't heard a >> convincing rationale for this one > > The point is that zero is a valid value for a length hint. By making it > a special value for "don't know", you are making the protocol > potentially confusing, and you are also departing from the current > semantics. No, it just means the default estimate is always zero. If you don't do that, then *every* client of __length_hint__ has to check for the magic value. It's making the API harder to use for no good reason. > (and, yes, I think distinguishing between zero and "don't know" is > useful: imagine a container that would preallocate 256 entries by > default when the answer is "don't know") Such a container has to already deal with the case when it overestimates severely. The only cost of using zero as a default estimate is that such hypothetical containers will overallocate when they technically didn't need to, which will already happen for any empty iterator that doesn't provide __length_hint__ at all. Given that all standard library containers default to assuming a size of zero (absent information indicating otherwise), and a special value would need to be special cased by *every* client of the API (and almost always defaulted to zero), it's simply not a good trade-off. >> The PEP itself already has this general tone, but I think that it >> should be even more emphatic that it's about codifying the status quo, >> *not* about modifying it with ideas haven't been proven useful through >> past experience. > > Then the PEP shouldn't address infinite iterators at all. Noting that infinite iterators are free to define __length_hint__ to always throw an exception *is* the status quo. We just haven't done it for the stdlib ones. Cheers, Nick. -- Nick Coghlan | [email protected] | Brisbane, Australia ___ Python-Dev mailing list [email protected] http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Do we need __length_hint__ at all? (Was PEP 0424: A method for exposing a length hint)
On Tue, 17 Jul 2012 00:18:55 +1000 Nick Coghlan wrote: > > Given that all standard library containers default to assuming a size > of zero (absent information indicating otherwise), and a special value > would need to be special cased by *every* client of the API (and > almost always defaulted to zero), it's simply not a good trade-off. Actually, dict and set default to a non-zero internal size, but I agree making containers harder to implement is not a good thing. > >> The PEP itself already has this general tone, but I think that it > >> should be even more emphatic that it's about codifying the status quo, > >> *not* about modifying it with ideas haven't been proven useful through > >> past experience. > > > > Then the PEP shouldn't address infinite iterators at all. > > Noting that infinite iterators are free to define __length_hint__ to > always throw an exception *is* the status quo. Being "free" to do unexpected or unconventional things is not the same thing as codifying those behaviours in a PEP, especially when noone is actually doing them. __length_hint__ is supposed to be informative, it shouldn't error out on you. So still -1 from me. Regards Antoine. ___ Python-Dev mailing list [email protected] http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP 0424: A method for exposing a length hint
On 7/16/2012 9:54 AM, Stefan Behnel wrote: Mark Shannon, 15.07.2012 16:14: Alex Gaynor wrote: CPython currently defines an ``__length_hint__`` method on several types, such as various iterators. This method is then used by various other functions (such as ``map``) to presize lists based on the estimated returned by Don't use "map" as an example. map returns an iterator so it doesn't need __length_hint__ Right. It's a good example for something else, though. As I mentioned before, iterators should be able to propagate the length hint of an underlying iterator, e.g. in generator expressions or map(). I consider that an important feature that the protocol must support. Stefan ___ Python-Dev mailing list [email protected] http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/mstefanro%40gmail.com map() is quite problematic in this matter, and may actually benefit from the existence of __length_hint__. It is very easy to create an infinite loop currently by doing stuff like x=[1]; x+=map(str,x) [61081 refs] >>> x=[1]; x+=map(str,x) Traceback (most recent call last): ... MemoryError [120959834 refs] >>> len(x) 120898752 Obviously, this won't cause an infinite loop in Python2 where map is non-lazy. Also, this won't work for all mutable containers, because not all of them permit adding elements while iterating: >>> s=set([1]); s.update(map(str,s)) Traceback (most recent call last): ... RuntimeError: Set changed size during iteration [61101 refs] >>> s {1, '1'} [61101 refs] >>> del s [61099 refs] If map objects were to disallow changing the size of the container while iterating (I can't really think of an use-case in which such a limitation would be harmful), it might as well be with __length_hint__. Also, what would iter([1,2,3]).__length_hint__() return? 3 or unknown? If 3, then the semantics of l=[1,2,3]; l += iter(l) will change (infinite loop without __length_hint__ vs. list of 6 elements with __length_hint__). If unknown, then it doesn't seem like there are very many places where __length_hint__ can return anything but unknown. Regards, Stefan M ___ Python-Dev mailing list [email protected] http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP 0424: A method for exposing a length hint
M Stefan wrote: Also, what would iter([1,2,3]).__length_hint__() return? 3 or unknown? If 3, then the semantics of l=[1,2,3]; l += iter(l) will change (infinite loop without __length_hint__ vs. list of 6 elements with __length_hint__). What __length_hint__ returns is irrelevant -- it's only a hint. Python will have to loop over all the items. So you would still get an infinite loop with the above code. ~Ethan~ ___ Python-Dev mailing list [email protected] http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP 0424: A method for exposing a length hint
I've updated the PEP to reflect the discussion. There are two major changes: 1) NotImplemented may be used by __length_hint__ to indicate that there is no finite length hint available. 2) callers of operator.length_hint() must provide their own default value, this is also required by the current C _PyObject_LengthHint implementation. There are no provisions for infinite iterators, that is not within the scope of this proposal. Alex ___ Python-Dev mailing list [email protected] http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP 0424: A method for exposing a length hint
On Tue, Jul 17, 2012 at 1:03 PM, Alex Gaynor wrote: > I've updated the PEP to reflect the discussion. There are two major changes: > > 1) NotImplemented may be used by __length_hint__ to indicate that there is no >finite length hint available. I've been thinking about this a bit more, and I think it does provide good scope for eventually adding __length_hint__ to more iterators (including map, filter and enumerate). > 2) callers of operator.length_hint() must provide their own default value, > this >is also required by the current C _PyObject_LengthHint implementation. And this makes it explicit that API users need to deal with the AttributeError/NotImplemented case, whilst making it easy to do so. Good call. > There are no provisions for infinite iterators, that is not within the scope > of > this proposal. I'll repeat my observation that remaining silent on this point is effectively identical to blessing the practice of raising an exception in __length_hint__ to force fast failure of attempts to convert an infinite iterator to a concrete container. Rather than leaving people to figure this out on their own, we may as well make it official that TypeError can be raised in __length_hint__ to block conversion to concrete containers that use a preallocation strategy. Cheers, Nick. -- Nick Coghlan | [email protected] | Brisbane, Australia ___ Python-Dev mailing list [email protected] http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
[Python-Dev] io.BytesIO slower than monkey-patching io.RawIOBase
While working on #1767933, Serhiy came up with an observation that "monkey-patching" one of the base classes of io is faster than using BytesIO when in need of a file-like object for writing into. I've distilled it into this standalone test: import io data = [b'a'*10, b'bb'*5, b'ccc'*5] * 1 def withbytesio(): bio = io.BytesIO() for i in data: bio.write(i) return bio.getvalue() def monkeypatching(): mydata = [] file = io.RawIOBase() file.writable = lambda: True file.write = mydata.append for i in data: file.write(i) return b''.join(mydata) The second approach is consistently 10-20% faster than the first one (depending on input) for trunk Python 3.3 Is there any reason for this to be so? What does BytesIO give us that the second approach does not (I tried adding more methods to the patched RawIOBase to make it more functional, like seekable() and tell(), and it doesn't affect performance)? This also raises a "moral" question - should I be using the second approach deep inside the stdlib (ET.tostring) just because it's faster? Eli ___ Python-Dev mailing list [email protected] http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] io.BytesIO slower than monkey-patching io.RawIOBase
> > The second approach is consistently 10-20% faster than the first one > (depending on input) for trunk Python 3.3 > I think the difference is that StringIO spends extra time reallocating memory during the write loop as it grows, whereas bytes.join computes the allocation size first since it already knows the final length. ___ Python-Dev mailing list [email protected] http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] io.BytesIO slower than monkey-patching io.RawIOBase
On Tue, Jul 17, 2012 at 2:57 PM, John O'Connor wrote: >> >> The second approach is consistently 10-20% faster than the first one >> (depending on input) for trunk Python 3.3 >> > > I think the difference is that StringIO spends extra time reallocating > memory during the write loop as it grows, whereas bytes.join computes > the allocation size first since it already knows the final length. BytesIO is actually missing an optimisation that is already used in StringIO: the StringIO C implementation uses a fragment accumulator internally, and collapses that into a single string object when getvalue() is called. BytesIO is still using the old "resize-the-buffer-as-you-go" strategy, and thus ends up repeatedly reallocating the buffer as the data sequence grows incrementally. It should be optimised to work the same way StringIO does (which is effectively the same way that the monkeypatched version works) Cheers, Nick. -- Nick Coghlan | [email protected] | Brisbane, Australia ___ Python-Dev mailing list [email protected] http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
