Re: [Python-Dev] PEP 0424: A method for exposing a length hint
On Aug 1, 2012, at 1:46 AM, Mark Shannon wrote:

''' Being able to pre-allocate lists based on the expected size, as estimated by __length_hint__, can be a significant optimization. PyPy has been observed to run some code slower than CPython, purely because this optimization is absent. ''' Which is a PyPy bug report, not a rationale for a PEP ;)

Alex's rationale is correct and well expressed. Your proposed revision reflects fuzzy thinking about why __length_hint__ is useful. Regardless of resizing growth factors, it is *always* helpful to know how much memory to allocate. Calls to the allocators (especially for large blocks) and possibly the recopying of data should be avoided when possible.

Raymond
___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP 0424: A method for exposing a length hint
While the idea behind PEP 424 is sound, the text of the PEP is rather vague and missing a lot of details. There was extended discussion on the details, but none of that has appeared in the PEP yet. So Alex, how about adding those details?

Also the rationale is rather poor. Given that CPython is the reference implementation, PyPy should be compared to CPython, not vice-versa. Reversing PyPy and CPython in the rationale gives:

''' Being able to pre-allocate lists based on the expected size, as estimated by __length_hint__, can be a significant optimization. PyPy has been observed to run some code slower than CPython, purely because this optimization is absent. '''

Which is a PyPy bug report, not a rationale for a PEP ;)

Perhaps a better rationale would be something along the lines of:

''' Adding a __length_hint__ method to the iterator protocol allows sequences, notably lists, to be initialised from iterators with only a single resize operation. This allows sequences to be initialised quickly, yet have a small growth factor, reducing memory use. '''

Cheers, Mark.
Re: [Python-Dev] PEP 0424: A method for exposing a length hint
On Wed, Aug 1, 2012 at 10:46 AM, Mark Shannon m...@hotpy.org wrote:

While the idea behind PEP 424 is sound, the text of the PEP is rather vague and missing a lot of details. There was extended discussion on the details, but none of that has appeared in the PEP yet. So Alex, how about adding those details?

Also the rationale is rather poor. Given that CPython is the reference implementation, PyPy should be compared to CPython, not vice-versa. Reversing PyPy and CPython in the rationale gives:

''' Being able to pre-allocate lists based on the expected size, as estimated by __length_hint__, can be a significant optimization. PyPy has been observed to run some code slower than CPython, purely because this optimization is absent. '''

Which is a PyPy bug report, not a rationale for a PEP ;)

Perhaps a better rationale would be something along the lines of:

''' Adding a __length_hint__ method to the iterator protocol allows sequences, notably lists, to be initialised from iterators with only a single resize operation. This allows sequences to be initialised quickly, yet have a small growth factor, reducing memory use. '''

Hi Mark. It's not about saving memory. It really is about speed. No one bothered measuring CPython with the length hint disabled to compare; however, we did that for PyPy, hence the rationale contains it. It's merely there to state that this seems like an important optimization. Since the C-level code involved is rather similar (it's mostly runtime anyway), it seems reasonable to conclude that removing the length hint from CPython would cause a slowdown.

Cheers, fijal
Re: [Python-Dev] PEP 0424: A method for exposing a length hint
Maciej Fijalkowski wrote: On Wed, Aug 1, 2012 at 10:46 AM, Mark Shannon m...@hotpy.org wrote: While the idea behind PEP 424 is sound, the text of the PEP is rather vague and missing a lot of details. There was extended discussion on the details, but none of that has appeared in the PEP yet. So Alex, how about adding those details? Also the rationale is rather poor. Given that CPython is the reference implementation, PyPy should be compared to CPython, not vice-versa. Reversing PyPy and CPython in the rationale gives: ''' Being able to pre-allocate lists based on the expected size, as estimated by __length_hint__, can be a significant optimization. PyPy has been observed to run some code slower than CPython, purely because this optimization is absent. ''' Which is a PyPy bug report, not a rationale for a PEP ;) Perhaps a better rationale would something along the lines of: ''' Adding a __length_hint__ method to the iterator protocol allows sequences, notably lists, to be initialised from iterators with only a single resize operation. This allows sequences to be intialised quickly, yet have a small growth factor, reducing memory use. ''' Hi Mark. It's not about saving memory. It really is about speed. Noone bothered measuring cpython with length hint disabled to compare, however we did that for pypy hence the rationale contains it. It's merely to state this seems like an important optimization. Since the C-level code involved is rather similar (it's mostly runtime anyway), it seems reasonable to draw a conclusion that removing length hint from cpython would cause slowdown. It is not about making it faster *or* saving memory, but *both*. Without __length_hint__ there is a trade off between speed and memory use. You can have speed at the cost of memory by increasing the resize factor. With __length_hint__ you can get both speed and good memory use. 
Cheers, Mark
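The trade-off argued over in this exchange can be made concrete with a small simulation. This is only a sketch under stated assumptions: `simulate_fill`, its parameters and its growth rule are hypothetical and do not reproduce the real over-allocation strategy of CPython or PyPy lists. It counts how many reallocations and element copies are needed to append a fixed number of items with a small growth factor, with a large growth factor, and with an exact hint.

```python
import math


def simulate_fill(n_items, growth=1.125, hint=None):
    """Simulate appending n_items to an over-allocating array.

    Returns (reallocations, elements_copied, final_capacity).
    Hypothetical model only, not CPython's real list growth rule.
    """
    capacity = hint if hint is not None else 0
    reallocations = 1 if capacity else 0  # the single presized allocation
    copied = 0
    size = 0
    for _ in range(n_items):
        if size == capacity:
            # Grow by the factor, and always by at least one slot.
            new_capacity = max(capacity + 1, math.ceil(capacity * growth))
            copied += size  # existing elements have to be moved
            capacity = new_capacity
            reallocations += 1
        size += 1
    return reallocations, copied, capacity


small = simulate_fill(10_000, growth=1.125)   # frugal, many resizes
large = simulate_fill(10_000, growth=2.0)     # fewer resizes, more slack
hinted = simulate_fill(10_000, growth=1.125, hint=10_000)

for label, result in [("small", small), ("large", large), ("hinted", hinted)]:
    print(label, result)
```

In this model a larger growth factor reduces the number of resizes but still copies data and leaves unused capacity, while a correct hint needs one allocation and no copying, which is consistent with both sides of the argument above.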
Re: [Python-Dev] PEP 0424: A method for exposing a length hint
On Wed, Aug 1, 2012 at 12:06 PM, Mark Shannon m...@hotpy.org wrote: Maciej Fijalkowski wrote: On Wed, Aug 1, 2012 at 10:46 AM, Mark Shannon m...@hotpy.org wrote: While the idea behind PEP 424 is sound, the text of the PEP is rather vague and missing a lot of details. There was extended discussion on the details, but none of that has appeared in the PEP yet. So Alex, how about adding those details? Also the rationale is rather poor. Given that CPython is the reference implementation, PyPy should be compared to CPython, not vice-versa. Reversing PyPy and CPython in the rationale gives: ''' Being able to pre-allocate lists based on the expected size, as estimated by __length_hint__, can be a significant optimization. PyPy has been observed to run some code slower than CPython, purely because this optimization is absent. ''' Which is a PyPy bug report, not a rationale for a PEP ;) Perhaps a better rationale would something along the lines of: ''' Adding a __length_hint__ method to the iterator protocol allows sequences, notably lists, to be initialised from iterators with only a single resize operation. This allows sequences to be intialised quickly, yet have a small growth factor, reducing memory use. ''' Hi Mark. It's not about saving memory. It really is about speed. Noone bothered measuring cpython with length hint disabled to compare, however we did that for pypy hence the rationale contains it. It's merely to state this seems like an important optimization. Since the C-level code involved is rather similar (it's mostly runtime anyway), it seems reasonable to draw a conclusion that removing length hint from cpython would cause slowdown. It is not about making it faster *or* saving memory, but *both*. Without __length_hint__ there is a trade off between speed and memory use. You can have speed at the cost of memory by increasing the resize factor. No, you cannot. 
If you allocate a huge region, you're not going to gain much speed, because in the end you need to copy the data anyway. Besides, large allocations are slow. With a length hint that is correct (which is sometimes possible), you have a zero-copy scenario.
Re: [Python-Dev] PEP 0424: A method for exposing a length hint
On Sun, Jul 15, 2012 at 1:36 PM, Antoine Pitrou solip...@pitrou.net wrote:
On Sun, 15 Jul 2012 18:47:38 +1000 Nick Coghlan ncogh...@gmail.com wrote:

I'm not seeing the value in returning None over 0 for the "don't know" case - it just makes the API harder to use.

The point is that 0 is a legitimate value for a length hint. Simple implementations of __length_hint__ will start returning 0 as a legitimate value and you will wrongly interpret that as "don't know", which kind of defeats the purpose of __length_hint__ ;)

I agree with this: giving special meaning to what's already a valid length value seems wrong.

Mark
Re: [Python-Dev] PEP 0424: A method for exposing a length hint
On Tue, 17 Jul 2012 13:19:55 +1000 Nick Coghlan ncogh...@gmail.com wrote:

There are no provisions for infinite iterators, that is not within the scope of this proposal.

I'll repeat my observation that remaining silent on this point is effectively identical to blessing the practice of raising an exception in __length_hint__ to force fast failure of attempts to convert an infinite iterator to a concrete container.

And I'll repeat that it is false ;) Being silent is certainly not the same thing as blessing a non-existent practice.

Regards Antoine.
-- Software development and contracting: http://pro.pitrou.net
Re: [Python-Dev] PEP 0424: A method for exposing a length hint
On Sun, Jul 15, 2012 at 1:28 AM, Alexandre Zani alexandre.z...@gmail.com wrote:

I'm +1 on not having a public API for this. Ultimately the contract for a length hint will depend heavily upon what you need it for. Some applications would require a length hint to be an "at least", others an "at most", and others something else entirely. Given that the contract here appears to be >= 0, I don't think the length hint is particularly useful to the public at large.

Other possible related uses could be to get an approximate number of results for a query without having to actually go through the whole query, useful for databases and search engines. But then you *do* want __len__ as well, so that also doesn't fit with the current PEP. But maybe that's a completely different use case, even though it seems related to me?

//Lennart
Re: [Python-Dev] PEP 0424: A method for exposing a length hint
Mark Shannon, 15.07.2012 16:14:
Alex Gaynor wrote:

CPython currently defines an ``__length_hint__`` method on several types, such as various iterators. This method is then used by various other functions (such as ``map``) to presize lists based on the estimate returned by

Don't use map as an example. map returns an iterator so it doesn't need __length_hint__

Right. It's a good example for something else, though. As I mentioned before, iterators should be able to propagate the length hint of an underlying iterator, e.g. in generator expressions or map(). I consider that an important feature that the protocol must support.

Stefan
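A rough sketch of what propagating the hint of an underlying iterator might look like in pure Python. It assumes a Python version (3.4 or later) that provides `operator.length_hint`, which postdates this thread; `MappedIterator` is a hypothetical stand-in for a map-like wrapper, not how `map()` or Cython's generator expressions are implemented. The wrapper forwards a known hint and returns NotImplemented when the source has none.

```python
import operator


class MappedIterator:
    """Hypothetical map-like iterator that forwards its source's hint."""

    def __init__(self, func, iterable):
        self._func = func
        self._it = iter(iterable)

    def __iter__(self):
        return self

    def __next__(self):
        return self._func(next(self._it))

    def __length_hint__(self):
        # -1 is only a local "no hint available" sentinel for the default.
        hint = operator.length_hint(self._it, -1)
        return NotImplemented if hint < 0 else hint


# A list iterator knows how many items remain, so the hint propagates.
known = MappedIterator(str, [1, 2, 3])
# A generator has no hint, so the wrapper reports that it does not know.
unknown = MappedIterator(str, (n for n in [1, 2, 3]))

print(operator.length_hint(known, 99))
print(operator.length_hint(unknown, 99))
```

Forwarding is only valid here because the wrapper is one-to-one; a filtering wrapper could at best treat the source's hint as an upper bound.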
Re: [Python-Dev] PEP 0424: A method for exposing a length hint
Antoine Pitrou, 15.07.2012 17:06:
On Sun, 15 Jul 2012 16:33:23 +0200 Christian Heimes wrote:
On 15.07.2012 16:22, Antoine Pitrou wrote:
On Mon, 16 Jul 2012 00:08:41 +1000 Nick Coghlan wrote:

Right, I agree on the value in being able to return something to say "this cannot be converted to a concrete container".

Who would be able to return that, apart from trivial cases like itertools.cycle()?

For example most numerical sequence iterators like Fibonacci generator, prime number sequence generator and even trivial cases like even natural number generator.

First, you can't implement __length_hint__ for a generator, which is the preferred (the most practical) way of writing iterators in pure Python.

It can be implemented for generator expressions without a conditional, though, including the case of comprehensions. I wanted to do this in Cython for a while, but the protocol wasn't very well adapted to that use case. The "don't know" case was just too common and inefficient.

For the other points, I agree with the already presented counterarguments. Being able to prevent some obvious traps is a good thing, even if you can't prevent all of them.

Stefan
Re: [Python-Dev] PEP 0424: A method for exposing a length hint
Alex Gaynor, 15.07.2012 00:11:

CPython currently defines an ``__length_hint__`` method on several types, such as various iterators. This method is then used by various other functions (such as ``map``) to presize lists based on the estimate returned by ``__length_hint__``. Types can then define ``__length_hint__`` which are not sized, and thus should not define ``__len__``, but can estimate or compute a size (such as many iterators).

Proposal

This PEP proposes formally documenting ``__length_hint__`` for other interpreters and non-standard-library Python modules to implement. ``__length_hint__`` must return an integer, and is not required to be accurate. It may return a value that is either larger or smaller than the actual size of the container. It may raise a ``TypeError`` if a specific instance cannot have its length estimated. It may not return a negative value.

I'd like to more visibly repeat my suggestion to make this a slot method tp_length_hint() in extension types that returns a Py_ssize_t. That suggests that a negative return value would have a special meaning instead of relying on return values like NotImplemented. The Python wrapper of that slot method could still implement a mapping for this.

Return values could be -1 for "don't know" and -2 for "infinite" at the C level, and NotImplemented for "don't know" at the Python level. Not sure about a good Python value for "infinite".

Maybe return -1 for "infinite" at both levels and -2/NotImplemented for "don't know" in C/Python? That would suggest -3 to propagate exceptions at the C level.

Stefan
Re: [Python-Dev] PEP 0424: A method for exposing a length hint
Proposing anything substantially more complicated than what is currently implemented in CPython will just get the idea rejected. The substantial immediate gain for PyPy is in skipping the memory resizing when building containers from itertools iterators, not anywhere else.

-- Sent from my phone, thus the relative brevity :)
Re: [Python-Dev] PEP 0424: A method for exposing a length hint
Nick Coghlan, 16.07.2012 10:26:

Proposing anything substantially more complicated than what is currently implemented in CPython will just get the idea rejected. The substantial immediate gain for PyPy is in skipping the memory resizing when building containers from itertools iterators, not anywhere else.

The same applies to Cython, where the extension types that implement generator expressions can benefit from propagating the length hint of the underlying iterator. A type slot would help in making this more efficient overall, also for CPython itself.

Stefan
Re: [Python-Dev] PEP 0424: A method for exposing a length hint
On Jul 15, 2012, at 7:22 AM, Antoine Pitrou wrote: On Mon, 16 Jul 2012 00:08:41 +1000 Nick Coghlan ncogh...@gmail.com wrote: Right, I agree on the value in being able to return something to say this cannot be converted to a concrete container. Who would be able to return that, apart from trivial cases like itertools.cycle()? FWIW, here are the notes from the docstring in Lib/test/test_iterlen.py: Test Iterator Length Transparency Some functions or methods which accept general iterable arguments have optional, more efficient code paths if they know how many items to expect. For instance, map(func, iterable), will pre-allocate the exact amount of space required whenever the iterable can report its length. The desired invariant is: len(it)==len(list(it)). A complication is that an iterable and iterator can be the same object. To maintain the invariant, an iterator needs to dynamically update its length. For instance, an iterable such as xrange(10) always reports its length as ten, but it=iter(xrange(10)) starts at ten, and then goes to nine after it.next(). Having this capability means that map() can ignore the distinction between map(func, iterable) and map(func, iter(iterable)). When the iterable is immutable, the implementation can straight-forwardly report the original length minus the cumulative number of calls to next(). This is the case for tuples, xrange objects, and itertools.repeat(). Some containers become temporarily immutable during iteration. This includes dicts, sets, and collections.deque. Their implementation is equally simple though they need to permanently set their length to zero whenever there is an attempt to iterate after a length mutation. The situation slightly more involved whenever an object allows length mutation during iteration. Lists and sequence iterators are dynamically updatable. So, if a list is extended during iteration, the iterator will continue through the new items. 
If it shrinks to a point before the most recent iteration, then no further items are available and the length is reported at zero. Reversed objects can also be wrapped around mutable objects; however, any appends after the current position are ignored. Any other approach leads to confusion and possibly returning the same item more than once. The iterators not listed above, such as enumerate and the other itertools, are not length transparent because they have no way to distinguish between iterables that report static length and iterators whose length changes with each call (i.e. the difference between enumerate('abc') and enumerate(iter('abc')).

Raymond
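The length transparency described in the docstring can be observed from Python code. The snippet below is a small illustration, assuming a CPython version (3.4 or later) that provides `operator.length_hint`, and using `range` where the Python 2 docstring says `xrange`. Other implementations are free to report different hints.

```python
import operator

# An iterator over an immutable source reports original length minus
# the number of items already consumed.
it = iter(range(10))
before = operator.length_hint(it)
next(it)
after = operator.length_hint(it)

# A list iterator tracks the current size of the underlying list.
items = [1, 2, 3]
list_it = iter(items)
next(list_it)
hint_start = operator.length_hint(list_it)
items.extend([4, 5])          # the list grows during iteration
hint_grown = operator.length_hint(list_it)
del items[:]                  # the list shrinks below the current position
hint_shrunk = operator.length_hint(list_it)

print(before, after, hint_start, hint_grown, hint_shrunk)
```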
Re: [Python-Dev] PEP 0424: A method for exposing a length hint
On 16.07.12 10:36, Stefan Behnel wrote:

Return values could be -1 for "don't know" and -2 for "infinite" at the C level, and NotImplemented for "don't know" at the Python level.

PY_SSIZE_T_MAX is a better value for "infinite". In any case, there is no difference for the consumer between PY_SSIZE_T_MAX and a real infinity.
Re: [Python-Dev] PEP 0424: A method for exposing a length hint
On Jul 16, 2012, at 12:36 AM, Stefan Behnel wrote:

I'd like to more visibly repeat my suggestion to make this a slot method tp_length_hint() in extension types that returns a Py_ssize_t.

That is merely an implementation detail, but it would be a nice improvement.

Raymond
Re: [Python-Dev] PEP 0424: A method for exposing a length hint
On Mon, Jul 16, 2012 at 3:36 AM, Stefan Behnel stefan...@behnel.de wrote:
Alex Gaynor, 15.07.2012 00:11:

CPython currently defines an ``__length_hint__`` method on several types, such as various iterators. This method is then used by various other functions (such as ``map``) to presize lists based on the estimate returned by ``__length_hint__``. Types can then define ``__length_hint__`` which are not sized, and thus should not define ``__len__``, but can estimate or compute a size (such as many iterators).

Proposal

This PEP proposes formally documenting ``__length_hint__`` for other interpreters and non-standard-library Python modules to implement. ``__length_hint__`` must return an integer, and is not required to be accurate. It may return a value that is either larger or smaller than the actual size of the container. It may raise a ``TypeError`` if a specific instance cannot have its length estimated. It may not return a negative value.

I'd like to more visibly repeat my suggestion to make this a slot method tp_length_hint() in extension types that returns a Py_ssize_t. That suggests that a negative return value would have a special meaning instead of relying on return values like NotImplemented. The Python wrapper of that slot method could still implement a mapping for this.

Return values could be -1 for "don't know" and -2 for "infinite" at the C level, and NotImplemented for "don't know" at the Python level. Not sure about a good Python value for "infinite".

Gods no. Making the return value different in C vs. Python code is just asking for trouble in terms of having to remember that specific difference while coding. Plus, asking people to check for explicit negative values instead of just >= 0 would be problematic and prone to error.

Maybe return -1 for "infinite" at both levels and -2/NotImplemented for "don't know" in C/Python? That would suggest -3 to propagate exceptions at the C level.

See above.
This is another reason why I don't think the infinite iterator concept is worth expressing. It's just mucking things up for no good reason. "infinite" == "I don't know" in the case of pre-allocation of a list. Just raise an exception or return None and be done with it. Nice and simple. And my vote is for an exception as EAFP.

-Brett
Re: [Python-Dev] PEP 0424: A method for exposing a length hint
On 7/16/2012 9:54 AM, Stefan Behnel wrote:
Mark Shannon, 15.07.2012 16:14:
Alex Gaynor wrote:

CPython currently defines an ``__length_hint__`` method on several types, such as various iterators. This method is then used by various other functions (such as ``map``) to presize lists based on the estimate returned by

Don't use map as an example. map returns an iterator so it doesn't need __length_hint__

Right. It's a good example for something else, though. As I mentioned before, iterators should be able to propagate the length hint of an underlying iterator, e.g. in generator expressions or map(). I consider that an important feature that the protocol must support.

Stefan

map() is quite problematic in this matter, and may actually benefit from the existence of __length_hint__. It is very easy to create an infinite loop currently by doing stuff like x=[1]; x+=map(str,x)

    [61081 refs]
    >>> x=[1]; x+=map(str,x)
    Traceback (most recent call last):
      ...
    MemoryError
    [120959834 refs]
    >>> len(x)
    120898752

Obviously, this won't cause an infinite loop in Python 2, where map is non-lazy. Also, this won't work for all mutable containers, because not all of them permit adding elements while iterating:

    >>> s=set([1]); s.update(map(str,s))
    Traceback (most recent call last):
      ...
    RuntimeError: Set changed size during iteration
    [61101 refs]
    >>> s
    {1, '1'}
    [61101 refs]
    >>> del s
    [61099 refs]

If map objects were to disallow changing the size of the container while iterating (I can't really think of a use case in which such a limitation would be harmful), it might as well be with __length_hint__.

Also, what would iter([1,2,3]).__length_hint__() return? 3 or unknown? If 3, then the semantics of l=[1,2,3]; l += iter(l) will change (infinite loop without __length_hint__ vs. list of 6 elements with __length_hint__).
If unknown, then it doesn't seem like there are very many places where __length_hint__ can return anything but unknown.

Regards, Stefan M
Re: [Python-Dev] PEP 0424: A method for exposing a length hint
M Stefan wrote:

Also, what would iter([1,2,3]).__length_hint__() return? 3 or unknown? If 3, then the semantics of l=[1,2,3]; l += iter(l) will change (infinite loop without __length_hint__ vs. list of 6 elements with __length_hint__).

What __length_hint__ returns is irrelevant -- it's only a hint. Python will have to loop over all the items. So you would still get an infinite loop with the above code.

~Ethan~
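The point that the hint does not bound the iteration can be checked without running an infinite loop. The sketch below caps the iteration with `itertools.islice`, which is an addition for safety and not part of the code discussed above. Even though a fresh list iterator over three items reports a hint of 3, the list keeps feeding its own iterator as it grows. The behaviour shown is what CPython does and is assumed rather than guaranteed for other implementations.

```python
import itertools

items = [1, 2, 3]
# Same idea as `items += iter(items)`, but capped at 10 steps so that
# the example terminates instead of exhausting memory.
items.extend(itertools.islice(iter(items), 10))

# The iterator yielded 10 items, not 3: it saw the elements appended
# while the extension was still in progress.
print(len(items), items)
```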
Re: [Python-Dev] PEP 0424: A method for exposing a length hint
I've updated the PEP to reflect the discussion. There are two major changes:

1) NotImplemented may be used by __length_hint__ to indicate that there is no finite length hint available.
2) Callers of operator.length_hint() must provide their own default value; this is also required by the current C _PyObject_LengthHint implementation.

There are no provisions for infinite iterators, that is not within the scope of this proposal.

Alex
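The two changes can be illustrated with a short sketch, assuming a Python version (3.4 or later) in which `operator.length_hint(obj, default)` is available; in that released function the default is optional and falls back to 0. `CachedStream` is a hypothetical class modelled on the "lazy stream that cached itself" example raised in this thread. It returns NotImplemented until it has been evaluated, after which 0 is a genuine estimate rather than a "don't know" marker.

```python
import operator


class CachedStream:
    """Hypothetical lazy stream that only knows its size once evaluated."""

    def __init__(self, source):
        self._source = iter(source)
        self._cache = None

    def evaluate(self):
        self._cache = list(self._source)

    def __iter__(self):
        if self._cache is None:
            self.evaluate()
        return iter(self._cache)

    def __length_hint__(self):
        if self._cache is None:
            return NotImplemented   # no hint available yet
        return len(self._cache)     # 0 here is a real estimate


stream = CachedStream(n for n in range(0))
unevaluated = operator.length_hint(stream, 16)  # falls back to the default
stream.evaluate()
evaluated = operator.length_hint(stream, 16)    # a legitimate hint of zero

print(unevaluated, evaluated)
```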
Re: [Python-Dev] PEP 0424: A method for exposing a length hint
On Tue, Jul 17, 2012 at 1:03 PM, Alex Gaynor alex.gay...@gmail.com wrote:

I've updated the PEP to reflect the discussion. There are two major changes: 1) NotImplemented may be used by __length_hint__ to indicate that there is no finite length hint available.

I've been thinking about this a bit more, and I think it does provide good scope for eventually adding __length_hint__ to more iterators (including map, filter and enumerate).

2) callers of operator.length_hint() must provide their own default value, this is also required by the current C _PyObject_LengthHint implementation.

And this makes it explicit that API users need to deal with the AttributeError/NotImplemented case, whilst making it easy to do so. Good call.

There are no provisions for infinite iterators, that is not within the scope of this proposal.

I'll repeat my observation that remaining silent on this point is effectively identical to blessing the practice of raising an exception in __length_hint__ to force fast failure of attempts to convert an infinite iterator to a concrete container. Rather than leaving people to figure this out on their own, we may as well make it official that TypeError can be raised in __length_hint__ to block conversion to concrete containers that use a preallocation strategy.

Cheers, Nick.
-- Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
Re: [Python-Dev] PEP 0424: A method for exposing a length hint
Nick Coghlan wrote:
On Sun, Jul 15, 2012 at 9:18 AM, Benjamin Peterson benja...@python.org wrote:

Open questions
==============

There are two open questions for this PEP:

* Should ``list`` expose a kwarg in its constructor for supplying a length hint.
* Should a function be added either to ``builtins`` or some other module which calls ``__length_hint__``, like ``builtins.len`` calls ``__len__``.

Let's try to keep this as limited as possible for a public API.

Length hints are very useful for *any* container implementation, whether those containers are in the standard library or not. Just as we exposed operator.index when __index__ was added, we should expose an operator.length_hint function with the following semantics: [...]

As given, length_hint gives no way of distinguishing between iterables and non-iterables:

    py> length_hint([])
    0
    py> length_hint(42)
    0

nor does it give iterable objects a way to indicate that either they don't know their length, or that they are infinite.

I suggest:

* object (and hence all other types that don't explicitly override it) should have a __length_hint__ that raises TypeError;
* __length_hint__ should be allowed to return None to indicate "don't know" or -1 to indicate "infinite".

Presumably anything that wishes to create a list or other sequence from an object with a hint of -1 could then raise an exception immediately.

-- Steven
Re: [Python-Dev] PEP 0424: A method for exposing a length hint
On Sun, Jul 15, 2012 at 6:21 PM, Steven D'Aprano st...@pearwood.info wrote:

I suggest: * object (and hence all other types that don't explicitly override it) should have a __length_hint__ that raises TypeError;

We can keep it simpler than that just by changing the order of the checks.

* __length_hint__ should be allowed to return None to indicate "don't know" or -1 to indicate "infinite". Presumably anything that wishes to create a list or other sequence from an object with a hint of -1 could then raise an exception immediately.

I'm not seeing the value in returning None over 0 for the "don't know" case - it just makes the API harder to use. Declaring negative results as meaning "I'm infinite" sounds reasonable, though:

    def length_hint(obj):
        """Return an estimate of the number of items in obj.

        This is useful for presizing containers when building from an
        iterable.

        If the object supports len(), the result will be exact.
        Otherwise, it may over or underestimate by an arbitrary amount.
        """
        try:
            get_hint = obj.__length_hint__
        except AttributeError:
            # No hint method: fall back to an exact length.
            return len(obj)
        hint = get_hint()
        if not isinstance(hint, int):
            msg = "Length hint must be an integer, not %r"
            raise TypeError(msg % type(hint))
        if hint < 0:
            # Negative hints are reserved for infinite iterators.
            raise ValueError("%r is an infinite iterator" % (obj,))
        return hint

Cheers, Nick.
-- Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
Re: [Python-Dev] PEP 0424: A method for exposing a length hint
Alex Gaynor, 15.07.2012 07:20:
> there's no way for the __length_hint__ to specify that that particular instance can't have a length hint computed. e.g. imagine some sort of lazy stream that cached itself, and only wanted to offer a length hint if it had already been evaluated. Without an exception to raise, it has to return whatever the magic value for length_hint is (in your impl it appears to be 0, the current _PyObject_LengthHint method in CPython has a required `default` parameter). The PEP proposes using TypeError for that.

Yes, that's a major issue. I've been planning to add a length hint to Cython's generator expressions for a while, but the problem is really that in most cases it is only known at runtime if the underlying iterable has a length hint, so propagating it needs a way to say "sorry, I thought I might know, but I don't". It would be even better if this way was efficient.

Since we're at a point of making this an official protocol, why not change the current behaviour and return -1 (or even just 0) to explicitly state that "we don't know"?

The problem with an exception here is that it might have been raised accidentally inside of the __length_hint__() implementation that is being asked. Swallowing it just because it happened to be a TypeError rather than something else may end up covering bugs. We had a similar issue with hasattr() in the past.

Also, it would be nice if this became a type slot rather than requiring a dict lookup and Python function call.

Stefan
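The concern about swallowed exceptions can be illustrated with a small sketch. Both helper functions and the `BuggyStream` class below are hypothetical; they only contrast "TypeError means no hint" with "a sentinel value of -1 means no hint", as suggested in the message above.

```python
def hint_via_exception(obj, default=0):
    """Draft-PEP style: any TypeError is taken to mean 'no hint available'."""
    try:
        return obj.__length_hint__()
    except TypeError:
        return default


def hint_via_sentinel(obj, default=0):
    """Sentinel style: only an explicit -1 means 'no hint available'."""
    hint = obj.__length_hint__()
    return default if hint == -1 else hint


class BuggyStream:
    """Hypothetical stream whose hint method contains a programming error."""

    def __init__(self, items):
        self._items = items

    def __length_hint__(self):
        # Bug: adding a str to an int raises TypeError by accident.
        return len(self._items) + "1"


stream = BuggyStream([1, 2, 3])

# The bug is silently converted into "don't know".
swallowed = hint_via_exception(stream)

# With a sentinel, the accidental TypeError reaches the caller.
try:
    hint_via_sentinel(stream)
    surfaced = False
except TypeError:
    surfaced = True

print(swallowed, surfaced)
```

In the exception-based version the broken `__length_hint__` is indistinguishable from an object that legitimately has no estimate, which is the hasattr()-style problem described above.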
Re: [Python-Dev] PEP 0424: A method for exposing a length hint
On Sun, 15 Jul 2012 18:47:38 +1000 Nick Coghlan ncogh...@gmail.com wrote:
>> * __length_hint__ should be allowed to return None to indicate "don't know" or -1 to indicate "infinite".
>>
>> Presumably anything that wishes to create a list or other sequence from an object with a hint of -1 could then raise an exception immediately.
>
> I'm not seeing the value in returning None over 0 for the "don't know" case - it just makes the API harder to use.

The point is that 0 is a legitimate value for a length hint. Simple implementations of __length_hint__ will start returning 0 as a legitimate value and you will wrongly interpret that as "don't know", which kind of defeats the purpose of __length_hint__ ;)

That said, I don't think a special value for "is infinite" is useful. Just make -1 mean "I don't know".

Regards

Antoine.
Re: [Python-Dev] PEP 0424: A method for exposing a length hint
Antoine Pitrou wrote:
> The point is that 0 is a legitimate value for a length hint. Simple implementations of __length_hint__ will start returning 0 as a legitimate value and you will wrongly interpret that as "don't know", which kind of defeats the purpose of __length_hint__ ;)
>
> That said, I don't think a special value for "is infinite" is useful. Just make -1 mean "I don't know".

You've obviously never accidentally called list on an infinite iterator *wink*

It's not the (eventual) MemoryError that is the problem. On some systems, this can cause the PC to become unresponsive as the OS tries to free an ever-increasing amount of memory. Been there, done that, on a production system. I had to do a hard reboot to fix it.

I think having a hint that says "there's no way this can succeed, fail immediately" is more useful than caring about the difference between a hint of 0 and a hint of 1.

-- Steven
Re: [Python-Dev] PEP 0424: A method for exposing a length hint
Right, I agree on the value in being able to return something to say "this cannot be converted to a concrete container".

I still haven't seen a use case where the appropriate response to "I don't know" differs from the appropriate response to a hint of zero - that is, you don't preallocate, you just start iterating.

Cheers, Nick.

-- Sent from my phone, thus the relative brevity :)
Re: [Python-Dev] PEP 0424: A method for exposing a length hint
On Mon, 16 Jul 2012 00:08:41 +1000 Nick Coghlan ncogh...@gmail.com wrote:
> Right, I agree on the value in being able to return something to say "this cannot be converted to a concrete container".

Who would be able to return that, apart from trivial cases like itertools.cycle()?

Regards

Antoine.

-- Software development and contracting: http://pro.pitrou.net
Re: [Python-Dev] PEP 0424: A method for exposing a length hint
On 15.07.2012 16:22, Antoine Pitrou wrote:
> On Mon, 16 Jul 2012 00:08:41 +1000 Nick Coghlan ncogh...@gmail.com wrote:
>> Right, I agree on the value in being able to return something to say "this cannot be converted to a concrete container".
>
> Who would be able to return that, apart from trivial cases like itertools.cycle()?

For example most numerical sequence iterators like a Fibonacci generator, a prime number sequence generator and even trivial cases like an even natural number generator. IMO it's a good idea to have a notation for infinite iterators that can't be materialized as finite containers.

+1

Christian
Re: [Python-Dev] PEP 0424: A method for exposing a length hint
Nick Coghlan wrote:
> Right, I agree on the value in being able to return something to say "this cannot be converted to a concrete container".
>
> I still haven't seen a use case where the appropriate response to "I don't know" differs from the appropriate response to a hint of zero - that is, you don't preallocate, you just start iterating.

There seem to be 5 possible classes of values of __length_hint__ that an iterator object can provide:

1. Don't implement it at all.

2. Implement __length_hint__() but don't want to return any value.
   Either raise an exception (TypeError) -- As suggested in the PEP.
   or return NotImplemented -- my preferred option.

3. Return a "don't know" value:
   Returning 0 would be fine for this, but the VM might want to respond differently to "don't know" and 0.
       __length_hint__() == 0          container should be minimum size.
       __length_hint__() == "unknown"  container starts at default size.

4. Infinite iterator:
   Could return float('inf'), but given this is a "hint" then returning sys.maxsize or sys.maxsize + 1 might be OK.
   Alternatively raise an OverflowError

5. A meaningful length. No problem :)

Also, what are the allowable return types?

1. int only
2. Any number (ie any type with a __int__() method)?
3. Or any integer-like object (ie a type with a __index__() method)?

My suggestion:

a) Don't want to return any value or "don't know": return NotImplemented
b) For infinite iterators: raise an OverflowError
c) All other cases: return an int or a type with a __index__() method.

Cheers, Mark.
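The three-part suggestion at the end of the message above could look roughly like this on the consuming side. It is a sketch under stated assumptions: the function name `length_hint`, its `default` parameter, and the three example classes are illustrative, and the sketch deliberately ignores `len()` so that only rules (a) to (c) are shown.

```python
import operator


def length_hint(obj, default=0):
    """Apply rules (a)-(c): NotImplemented, OverflowError, integer-like values."""
    try:
        get_hint = type(obj).__length_hint__
    except AttributeError:
        return default                 # method not implemented at all
    hint = get_hint(obj)               # (b) OverflowError simply propagates
    if hint is NotImplemented:
        return default                 # (a) no value offered / don't know
    return operator.index(hint)        # (c) int or any type with __index__()


class Unknown:
    def __length_hint__(self):
        return NotImplemented


class Endless:
    def __length_hint__(self):
        raise OverflowError("infinite iterator")


class Five:
    """Integer-like object that is not an int but defines __index__()."""

    def __index__(self):
        return 5


class Known:
    def __length_hint__(self):
        return Five()


unknown_hint = length_hint(Unknown(), default=8)
known_hint = length_hint(Known())
try:
    length_hint(Endless())
    overflowed = False
except OverflowError:
    overflowed = True

print(unknown_hint, known_hint, overflowed)
```

Using `operator.index()` rather than `int()` rejects floats such as `float('inf')`, which matches the preference for integer-like return types in option 3 of the list.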
Re: [Python-Dev] PEP 0424: A method for exposing a length hint
Alex Gaynor wrote:
> Hi all,
>
> I've just submitted a PEP proposing making __length_hint__ a public API for users to define and other VMs to implement:

This seems back-to-front. __length_hint__ is *used* by the VM, not provided by it. It should be part of the object model, rather than the API.

> PEP: 424
> Title: A method for exposing a length hint
> Version: $Revision$
> Last-Modified: $Date
> Author: Alex Gaynor alex.gay...@gmail.com
> Status: Draft
> Type: Standards Track
> Content-Type: text/x-rst
> Created: 14-July-2012
> Python-Version: 3.4
>
> Abstract
> ========
>
> CPython currently defines an ``__length_hint__`` method on several types, such as various iterators. This method is then used by various other functions (such as ``map``) to presize lists based on the estimate returned by

Don't use "map" as an example. map returns an iterator so it doesn't need __length_hint__

> ``__length_hint__``. Types can then define ``__length_hint__`` which are not sized, and thus should not define ``__len__``, but can estimate or compute a size (such as many iterators).
>
> Proposal
> ========
>
> This PEP proposes formally documenting ``__length_hint__`` for other interpreters and non-standard library Python to implement.
>
> ``__length_hint__`` must return an integer, and is not required to be accurate. It may return a value that is either larger or smaller than the actual size of the container. It may raise a ``TypeError`` if a specific instance cannot have its length estimated. It may not return a negative value.

Rather than raising a TypeError, why not return NotImplemented?

> Rationale
> =========
>
> Being able to pre-allocate lists based on the expected size, as estimated by ``__length_hint__``, can be a significant optimization. CPython has been observed to run some code faster than PyPy, purely because of this optimization being present.
>
> Open questions
> ==============
>
> There are two open questions for this PEP:
>
> * Should ``list`` expose a kwarg in its constructor for supplying a length hint.
> * Should a function be added either to ``builtins`` or some other module which calls ``__length_hint__``, like ``builtins.len`` calls ``__len__``.
>
> Copyright
> =========
>
> This document has been placed into the public domain.
>
> ..
>    Local Variables:
>    mode: indented-text
>    indent-tabs-mode: nil
>    sentence-end-double-space: t
>    fill-column: 70
>    coding: utf-8
>
> Alex
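For readers who want to see the protocol described in the draft in action, here is a minimal sketch. It relies on `operator.length_hint(obj, default=0)`, which is available in the standard library from Python 3.4 onwards; the `Countdown` class is a hypothetical iterator written for this illustration, not something taken from the PEP.

```python
import operator


class Countdown:
    """Hypothetical iterator that is not sized but can estimate what remains."""

    def __init__(self, start):
        self._remaining = start

    def __iter__(self):
        return self

    def __next__(self):
        if self._remaining <= 0:
            raise StopIteration
        self._remaining -= 1
        return self._remaining

    def __length_hint__(self):
        # An estimate only; consumers must not rely on it being exact.
        return self._remaining


countdown = Countdown(5)
hint_before = operator.length_hint(countdown)
next(countdown)
hint_after = operator.length_hint(countdown)

# A generator defines neither __len__ nor __length_hint__,
# so the caller-supplied default is returned instead.
generator_hint = operator.length_hint((x for x in range(3)), 10)

remaining_items = list(countdown)
print(hint_before, hint_after, generator_hint, remaining_items)
```

The iterator does not define `__len__`, which is the situation the Abstract describes: an object that is not sized but can still estimate how many items it will produce.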
Re: [Python-Dev] PEP 0424: A method for exposing a length hint
On Sun, Jul 15, 2012 at 10:39 AM, Mark Shannon m...@hotpy.org wrote:
> Nick Coghlan wrote:
>> Right, I agree on the value in being able to return something to say "this cannot be converted to a concrete container".
>>
>> I still haven't seen a use case where the appropriate response to "I don't know" differs from the appropriate response to a hint of zero - that is, you don't preallocate, you just start iterating.
>
> There seem to be 5 possible classes of values of __length_hint__ that an iterator object can provide:
>
> 1. Don't implement it at all.
>
> 2. Implement __length_hint__() but don't want to return any value.
>    Either raise an exception (TypeError) -- As suggested in the PEP.
>    or return NotImplemented -- my preferred option.
>
> 3. Return a "don't know" value:
>    Returning 0 would be fine for this, but the VM might want to respond differently to "don't know" and 0.
>        __length_hint__() == 0          container should be minimum size.
>        __length_hint__() == "unknown"  container starts at default size.
>
> 4. Infinite iterator:
>    Could return float('inf'), but given this is a "hint" then returning sys.maxsize or sys.maxsize + 1 might be OK.
>    Alternatively raise an OverflowError

I am really having a hard time differentiating infinity from "I don't know" since they are both accurate from the point of view of __length_hint__ and its typical purpose of allocation. You have no clue how many values will be grabbed from an infinite iterator, so it's the same as just not knowing upfront how long the iterator will be, infinite or not, and thus not worth distinguishing.

> 5. A meaningful length. No problem :)
>
> Also, what are the allowable return types?
>
> 1. int only
> 2. Any number (ie any type with a __int__() method)?
> 3. Or any integer-like object (ie a type with a __index__() method)?
>
> My suggestion:
>
> a) Don't want to return any value or "don't know": return NotImplemented
> b) For infinite iterators: raise an OverflowError
> c) All other cases: return an int or a type with a __index__() method.

I'm fine with (a), drop (b), and for (c) use what we allow for __len__() since, as Nick's operator.length_hint pseudo-code suggests, people will call this as a fallback if __len__ isn't defined.

-Brett

> Cheers, Mark.
Re: [Python-Dev] PEP 0424: A method for exposing a length hint
On 15.07.2012 16:39, Mark Shannon wrote:
> 1. Don't implement it at all.
>
> 2. Implement __length_hint__() but don't want to return any value.
>    Either raise an exception (TypeError) -- As suggested in the PEP.
>    or return NotImplemented -- my preferred option.

How is this different from "don't know"? What's the use case for knowing that the object doesn't want to say anything or doesn't know its possible length?

> 3. Return a "don't know" value:
>    Returning 0 would be fine for this, but the VM might want to respond differently to "don't know" and 0.

How about None? It's the logical choice, simple and easy to test for in Python and C code. 0 is a valid number for "I know that I'll return nothing".

> 4. Infinite iterator:
>    Could return float('inf'), but given this is a "hint" then returning sys.maxsize or sys.maxsize + 1 might be OK.
>    Alternatively raise an OverflowError

Too complex, hard to remember and even harder to check for. Since a length is always positive or zero, -1 is a good return value for infinite.

> a) Don't want to return any value or "don't know": return NotImplemented

+1

> b) For infinite iterators: raise an OverflowError

-1, I'm for -1. ;) I'm not a fan of using exceptions for valid and correct return values.

> c) All other cases: return an int or a type with a __index__() method.

+1

Christian
Re: [Python-Dev] PEP 0424: A method for exposing a length hint
Brett Cannon wrote:
> On Sun, Jul 15, 2012 at 10:39 AM, Mark Shannon m...@hotpy.org wrote:
>> Nick Coghlan wrote:
>>> Right, I agree on the value in being able to return something to say "this cannot be converted to a concrete container".
>>>
>>> I still haven't seen a use case where the appropriate response to "I don't know" differs from the appropriate response to a hint of zero - that is, you don't preallocate, you just start iterating.
>>
>> There seem to be 5 possible classes of values of __length_hint__ that an iterator object can provide:
>>
>> 1. Don't implement it at all.
>>
>> 2. Implement __length_hint__() but don't want to return any value.
>>    Either raise an exception (TypeError) -- As suggested in the PEP.
>>    or return NotImplemented -- my preferred option.
>>
>> 3. Return a "don't know" value:
>>    Returning 0 would be fine for this, but the VM might want to respond differently to "don't know" and 0.
>>        __length_hint__() == 0          container should be minimum size.
>>        __length_hint__() == "unknown"  container starts at default size.
>>
>> 4. Infinite iterator:
>>    Could return float('inf'), but given this is a "hint" then returning sys.maxsize or sys.maxsize + 1 might be OK.
>>    Alternatively raise an OverflowError
>
> I am really having a hard time differentiating infinity from "I don't know" since they are both accurate from the point of view of __length_hint__ and its typical purpose of allocation. You have no clue how many values will be grabbed from an infinite iterator, so it's the same as just not knowing upfront how long the iterator will be, infinite or not, and thus not worth distinguishing.
>
>> 5. A meaningful length. No problem :)
>>
>> Also, what are the allowable return types?
>>
>> 1. int only
>> 2. Any number (ie any type with a __int__() method)?
>> 3. Or any integer-like object (ie a type with a __index__() method)?
>>
>> My suggestion:
>>
>> a) Don't want to return any value or "don't know": return NotImplemented
>> b) For infinite iterators: raise an OverflowError
>> c) All other cases: return an int or a type with a __index__() method.
>
> I'm fine with (a), drop (b), and for (c) use what we allow for __len__() since, as Nick's operator.length_hint pseudo-code suggests, people will call this as a fallback if __len__ isn't defined.

So how does an iterator express infinite length?

What should happen if I am silly enough to do this:

    list(itertools.count())

This will fail; it should fail quickly.

Cheers, Mark.
Re: [Python-Dev] PEP 0424: A method for exposing a length hint
On Sun, 15 Jul 2012 16:33:23 +0200 Christian Heimes li...@cheimes.de wrote:
> On 15.07.2012 16:22, Antoine Pitrou wrote:
>> On Mon, 16 Jul 2012 00:08:41 +1000 Nick Coghlan ncogh...@gmail.com wrote:
>>> Right, I agree on the value in being able to return something to say "this cannot be converted to a concrete container".
>>
>> Who would be able to return that, apart from trivial cases like itertools.cycle()?
>
> For example most numerical sequence iterators like a Fibonacci generator, a prime number sequence generator and even trivial cases like an even natural number generator.

First, you can't implement __length_hint__ for a generator, which is the preferred (the most practical) way of writing iterators in pure Python.

Second, not all iterators will implement __length_hint__ (because it's optional and, really, of rather little use). So, as a user, you cannot hope that `list(some_iterator)` will always raise instead of filling your memory with an infinite stream of values: you have to be careful anyway.

Even if __length_hint__ is implemented, its result may be wrong. That's the whole point: it's a *hint*; an iterator might tell you it's finite while it's infinite, or the reverse.

My conclusion is that an infinite iterator is a documentation issue. Just tell the user that it doesn't stop, and let them shoot themselves in the foot if they want to.

Regards

Antoine.

-- Software development and contracting: http://pro.pitrou.net
Re: [Python-Dev] PEP 0424: A method for exposing a length hint
On Sun, 15 Jul 2012 16:08:00 +0100 Mark Shannon m...@hotpy.org wrote:
> What should happen if I am silly enough to do this:
>
>     list(itertools.count())
>
> This will fail; it should fail quickly.

Why should it? AFAIK it's not a common complaint. You said it yourself: it's a silly thing to do.

Regards

Antoine.

-- Software development and contracting: http://pro.pitrou.net
Re: [Python-Dev] PEP 0424: A method for exposing a length hint
On Sun, Jul 15, 2012 at 8:08 AM, Mark Shannon m...@hotpy.org wrote:
> Brett Cannon wrote:
>> On Sun, Jul 15, 2012 at 10:39 AM, Mark Shannon m...@hotpy.org wrote:
>>> Nick Coghlan wrote:
>>>> Right, I agree on the value in being able to return something to say "this cannot be converted to a concrete container".
>>>>
>>>> I still haven't seen a use case where the appropriate response to "I don't know" differs from the appropriate response to a hint of zero - that is, you don't preallocate, you just start iterating.
>>>
>>> There seem to be 5 possible classes of values of __length_hint__ that an iterator object can provide:
>>>
>>> 1. Don't implement it at all.
>>>
>>> 2. Implement __length_hint__() but don't want to return any value.
>>>    Either raise an exception (TypeError) -- As suggested in the PEP.
>>>    or return NotImplemented -- my preferred option.
>>>
>>> 3. Return a "don't know" value:
>>>    Returning 0 would be fine for this, but the VM might want to respond differently to "don't know" and 0.
>>>        __length_hint__() == 0          container should be minimum size.
>>>        __length_hint__() == "unknown"  container starts at default size.
>>>
>>> 4. Infinite iterator:
>>>    Could return float('inf'), but given this is a "hint" then returning sys.maxsize or sys.maxsize + 1 might be OK.
>>>    Alternatively raise an OverflowError
>>
>> I am really having a hard time differentiating infinity from "I don't know" since they are both accurate from the point of view of __length_hint__ and its typical purpose of allocation. You have no clue how many values will be grabbed from an infinite iterator, so it's the same as just not knowing upfront how long the iterator will be, infinite or not, and thus not worth distinguishing.
>>
>>> 5. A meaningful length. No problem :)
>>>
>>> Also, what are the allowable return types?
>>>
>>> 1. int only
>>> 2. Any number (ie any type with a __int__() method)?
>>> 3. Or any integer-like object (ie a type with a __index__() method)?
>>>
>>> My suggestion:
>>>
>>> a) Don't want to return any value or "don't know": return NotImplemented
>>> b) For infinite iterators: raise an OverflowError
>>> c) All other cases: return an int or a type with a __index__() method.
>>
>> I'm fine with (a), drop (b), and for (c) use what we allow for __len__() since, as Nick's operator.length_hint pseudo-code suggests, people will call this as a fallback if __len__ isn't defined.
>
> So how does an iterator express infinite length?
>
> What should happen if I am silly enough to do this:
>
>     list(itertools.count())
>
> This will fail; it should fail quickly.
>
> Cheers, Mark.

The PEP so far says: "It may raise a ``TypeError`` if a specific instance cannot have its length estimated." In many ways, "I don't know" is the same as "this specific instance cannot have its length estimated". Why not just raise a TypeError?

Also, regarding the code Nick posted above, I'm a little concerned about calling len as the first thing to try. That means that if I implement both __len__ and __length_hint__ (perhaps because __len__ is very expensive) __length_hint__ will never be used. It's relatively easy to say:

    try:
        hint = length_hint(l)
    except TypeError:
        hint = len(l)
Re: [Python-Dev] PEP 0424: A method for exposing a length hint
Mark Shannon wrote:
> So how does an iterator express infinite length?

The suggestion was it should return -1.

> What should happen if I am silly enough to do this:
>
>     list(itertools.count())
>
> This will fail; it should fail quickly.

That depends on your OS. I've just tested it now on Linux Mint, and the Python process was terminated within seconds.

I've also inadvertently done it on a Fedora system, which became completely unresponsive to user-input (including ctrl-alt-delete) within a few minutes. I let it run overnight (16 hours) before literally pulling the plug.

(I expect the difference in behaviour is due to the default ulimit under Debian/Mint and RedHat/Fedora systems.)

Ignoring OS-specific features, the promise[1] of the language is that list will try to allocate enough space for every item yielded by the iterator, or fail with a MemoryError. No promise is made as to how long that will take: it could take hours, or days, depending on how badly memory allocation performance drops when faced with unreasonably large requests. You can't expect it to fail either quickly or with an exception.

With a length hint, we could strengthen that promise:

"if __length_hint__ returns a negative number, list, tuple and set will fail immediately with MemoryError"

which I think is a good safety feature for some things which cannot possibly succeed, but risk DOSing your system. Does it prevent every possible failure mode? No, of course not. But just because you can't prevent *every* problem doesn't mean you should prevent the ones which you can.

[1] I think. I'm sure I read this somewhere in the docs, but I can't find it now.

-- Steven
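The strengthened promise described above can be approximated in pure Python. This is a sketch of the proposal as discussed in this message, not of how any interpreter actually behaves; `checked_list` and `Evens` are hypothetical names, and the -1 convention is the one suggested in the thread.

```python
def checked_list(iterable):
    """Build a list, but refuse up front if the hint says 'infinite' (negative)."""
    get_hint = getattr(type(iterable), "__length_hint__", None)
    if get_hint is not None:
        hint = get_hint(iterable)
        if isinstance(hint, int) and hint < 0:
            # Fail immediately instead of exhausting memory first.
            raise MemoryError("refusing to materialise an infinite iterator")
    return list(iterable)


class Evens:
    """Hypothetical infinite iterator over the even natural numbers."""

    def __init__(self):
        self._n = 0

    def __iter__(self):
        return self

    def __next__(self):
        value = self._n
        self._n += 2
        return value

    def __length_hint__(self):
        return -1  # thread convention: negative means infinite


try:
    checked_list(Evens())
    failed_fast = False
except MemoryError:
    failed_fast = True

finite = checked_list(iter([1, 2]))
print(failed_fast, finite)
```

The check costs one method call before iteration starts, and iterators that offer no hint are handled exactly as before.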
Re: [Python-Dev] PEP 0424: A method for exposing a length hint
On Mon, Jul 16, 2012 at 1:55 AM, Steven D'Aprano st...@pearwood.info wrote:
> (I expect the difference in behaviour is due to the default ulimit under Debian/Mint and RedHat/Fedora systems.)

Possibly also virtual memory settings. Allocating gobs of memory with a huge page file slows everything down without raising an error.

And since it's possible to have non-infinite but ridiculous-sized iterators, I'd not bother putting too much effort into protecting infinite iterators - although the "huge but not infinite" case is, admittedly, rather rarer than either "reasonable-sized" or "actually infinite".

ChrisA
Re: [Python-Dev] PEP 0424: A method for exposing a length hint
Antoine Pitrou wrote:
> First, you can't implement __length_hint__ for a generator, which is the preferred (the most practical) way of writing iterators in pure Python.

Limitations of generators are no reason for not improving iterators which are not generators. __length_hint__ already exists; this proposal simply proposes making it documented and officially supported.

py> iter([]).__length_hint__
<built-in method __length_hint__ of list_iterator object at 0xb7bcf98c>

> Even if __length_hint__ is implemented, its result may be wrong. That's the whole point: it's a *hint*; an iterator might tell you it's finite while it's infinite, or the reverse.

If it claims to be infinite, I see no reason to disbelieve it on the off-chance that it is actually both finite and small enough to fit into memory without crashing my system.

If it claims to be finite, but is actually infinite, well that's not much of a hint, is it? There's an implied promise that the hint will be close to the real value, not infinitely distant.

> My conclusion is that an infinite iterator is a documentation issue. Just tell the user that it doesn't stop, and let them shoot themselves in the foot if they want to.

Buffer overflows are a documentation issue. Just tell the user not to overwrite memory they don't mean to, and let them shoot themselves in the foot if they want.

*wink*

-- Steven
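Both points made above, that built-in iterators already carry a hint and that generators do not, can be checked directly. The snippet assumes CPython; other implementations may differ in which built-in iterators expose the method.

```python
# A list iterator reports how many items it still expects to yield.
it = iter([10, 20, 30])
hint_at_start = it.__length_hint__()
next(it)
hint_after_one = it.__length_hint__()

# A generator object offers no such method, so consumers get no estimate.
gen = (x * x for x in range(3))
generator_has_hint = hasattr(gen, "__length_hint__")

print(hint_at_start, hint_after_one, generator_has_hint)
```

The hint tracks the remaining items rather than the original length, which is what a consumer presizing a container needs.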
Re: [Python-Dev] PEP 0424: A method for exposing a length hint
On Mon, 16 Jul 2012 02:00:58 +1000 Chris Angelico ros...@gmail.com wrote:
> On Mon, Jul 16, 2012 at 1:55 AM, Steven D'Aprano st...@pearwood.info wrote:
>> (I expect the difference in behaviour is due to the default ulimit under Debian/Mint and RedHat/Fedora systems.)
>
> Possibly also virtual memory settings. Allocating gobs of memory with a huge page file slows everything down without raising an error.
>
> And since it's possible to have non-infinite but ridiculous-sized iterators, I'd not bother putting too much effort into protecting infinite iterators - although the "huge but not infinite" case is, admittedly, rather rarer than either "reasonable-sized" or "actually infinite".

In the real world, I'm sure "huge but not infinite" is much more frequent than "actually infinite". Trying to list() an infinite iterator is a programming error, so it shouldn't end up in production code. However, data that grows bigger than expected (or that gets disposed of too late) is quite a common thing.

<hint> When hg.python.org died of OOM two weeks ago, it wasn't because of an infinite iterator: http://mail.python.org/pipermail/python-committers/2012-July/002084.html </hint>

Regards

Antoine.

-- Software development and contracting: http://pro.pitrou.net
Re: [Python-Dev] PEP 0424: A method for exposing a length hint
On Mon, 16 Jul 2012 02:21:20 +1000 Steven D'Aprano st...@pearwood.info wrote:
>> My conclusion is that an infinite iterator is a documentation issue. Just tell the user that it doesn't stop, and let them shoot themselves in the foot if they want to.
>
> Buffer overflows are a documentation issue. Just tell the user not to overwrite memory they don't mean to, and let them shoot themselves in the foot if they want.

No, buffer overflows are bugs and they get fixed.

Regards

Antoine.

-- Software development and contracting: http://pro.pitrou.net
Re: [Python-Dev] PEP 0424: A method for exposing a length hint
Steven D'Aprano wrote:
> With a length hint, we could strengthen that promise:
>
> "if __length_hint__ returns a negative number, list, tuple and set will fail immediately with MemoryError"
>
> which I think is a good safety feature for some things which cannot possibly succeed, but risk DOSing your system. Does it prevent every possible failure mode? No, of course not. But just because you can't prevent *every* problem doesn't mean you should prevent the ones which you can.

Gah, I messed that last sentence up. It should read:

just because you can't prevent *every* problem doesn't mean you SHOULDN'T prevent the ones which you can.

-- Steven
Re: [Python-Dev] PEP 0424: A method for exposing a length hint
Nick Coghlan writes:
> Right, I agree on the value in being able to return something to say "this cannot be converted to a concrete container".
>
> I still haven't seen a use case where the appropriate response to "I don't know" differs from the appropriate response to a hint of zero - that is, you don't preallocate, you just start iterating.

Why wouldn't one just believe the hint and jump past the iteration?

What about an alternative API such as

    length_hint(iter, bound)

returning 'cannot say' (if no hint is available), 'small' (if the estimated length is less than bound), and 'large' (if it's greater than the bound or infinite)? (Or None, True, False which would give the boolean interpretation "do I know I'm small enough to be converted to a concrete container?")

The point is that I don't really see the value in returning a precise estimate that cannot be relied on to be accurate. OK, Python is a "consenting adults" language, but returning an integer here seems like an invitation to abuse.
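The alternative bounded API floated above might be sketched as follows, using the None/True/False variant. Everything here is an assumption made for illustration: the name `small_enough`, the treatment of a negative hint as "infinite", and the choice to report an estimate equal to the bound as large.

```python
def small_enough(iterable, bound):
    """Return None (cannot say), True (estimate below bound) or False (large/infinite)."""
    try:
        estimate = len(iterable)
    except TypeError:
        get_hint = getattr(type(iterable), "__length_hint__", None)
        if get_hint is None:
            return None                # no hint available at all
        estimate = get_hint(iterable)
        if estimate is NotImplemented:
            return None                # hint method declined to answer
    if estimate < 0:
        return False                   # thread convention: negative means infinite
    return estimate < bound


list_answer = small_enough([1, 2, 3], 10)
range_iter_answer = small_enough(iter(range(100)), 10)
generator_answer = small_enough((x for x in range(3)), 10)
print(list_answer, range_iter_answer, generator_answer)
```

Callers get a three-way answer and never see the raw estimate, which is the point of the suggestion: the integer cannot be mistaken for an exact length.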
Re: [Python-Dev] PEP 0424: A method for exposing a length hint
On Jul 16, 2012 1:52 PM, Stephen J. Turnbull step...@xemacs.org wrote:

> The point is that I don't really see the value in returning a precise estimate that cannot be relied on to be accurate. OK, Python is a consenting adults language, but returning an integer here seems like invitation to abuse.

Because preallocating memory is ridiculously faster than doing multiple resizes. That's all this API is for: how many objects should a container constructor preallocate space for when building from an iterable. It's an important optimisation in CPython when using itertools, and PyPy is planning to adopt it as well. Alex is doing the right thing in attempting to standardise it rather than risk the two implementations using subtly incompatible definitions.

Skipping the iteration in the zero case is a pointless micro-optimisation that just makes the API more complex for no good reason. Allowing a negative hint to mean infinite, on the other hand, avoids certain categories of errors without making the API any harder to use (since negatives have to be rejected anyway).

-- Sent from my phone, thus the relative brevity :)
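To make the "preallocate instead of resizing" point concrete, here is a minimal pure-Python sketch of a constructor that sizes its buffer from the hint and copes with a hint that is too large or too small. The function name `build_list` is made up for illustration, and `operator.length_hint` is the helper that shipped later in Python 3.4; CPython's real list constructor does this in C and differs in detail.

```python
import operator


def build_list(iterable):
    """Build a list, preallocating from the length hint (illustrative only)."""
    hint = operator.length_hint(iterable)  # 0 when nothing is known
    buf = [None] * hint                    # a single allocation up front
    n = 0
    for item in iterable:
        if n < len(buf):
            buf[n] = item                  # fill a preallocated slot
        else:
            buf.append(item)               # hint was too small: grow as usual
        n += 1
    del buf[n:]                            # hint was too large: trim unused tail
    return buf


class Overestimate:
    """Claims ten items but only yields two."""

    def __iter__(self):
        return iter([1, 2])

    def __length_hint__(self):
        return 10


class Underestimate:
    """Claims one item but yields three."""

    def __iter__(self):
        return iter([1, 2, 3])

    def __length_hint__(self):
        return 1
```

Because the hint is only advisory, the result is correct either way; an inaccurate hint costs some wasted space or a few extra resizes, nothing more.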
Re: [Python-Dev] PEP 0424: A method for exposing a length hint
Stephen J. Turnbull wrote:

> The point is that I don't really see the value in returning a precise estimate that cannot be relied on to be accurate. OK, Python is a consenting adults language, but returning an integer here seems like invitation to abuse.

Since __length_hint__ already exists and is already used, we should probably hear from somebody who knows how it is used and what problems and/or benefits it leads to.

-- Steven
[Python-Dev] PEP 0424: A method for exposing a length hint
Hi all,

I've just submitted a PEP proposing making __length_hint__ a public API for users to define and other VMs to implement:

PEP: 424
Title: A method for exposing a length hint
Version: $Revision$
Last-Modified: $Date$
Author: Alex Gaynor alex.gay...@gmail.com
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 14-July-2012
Python-Version: 3.4

Abstract
========

CPython currently defines an ``__length_hint__`` method on several types, such as various iterators. This method is then used by various other functions (such as ``map``) to presize lists based on the estimate returned by ``__length_hint__``. Types can then define ``__length_hint__`` which are not sized, and thus should not define ``__len__``, but can estimate or compute a size (such as many iterators).

Proposal
========

This PEP proposes formally documenting ``__length_hint__`` for other interpreter and non-standard library Python to implement.

``__length_hint__`` must return an integer, and is not required to be accurate. It may return a value that is either larger or smaller than the actual size of the container. It may raise a ``TypeError`` if a specific instance cannot have its length estimated. It may not return a negative value.

Rationale
=========

Being able to pre-allocate lists based on the expected size, as estimated by ``__length_hint__``, can be a significant optimization. CPython has been observed to run some code faster than PyPy, purely because of this optimization being present.

Open questions
==============

There are two open questions for this PEP:

* Should ``list`` expose a kwarg in its constructor for supplying a length hint.
* Should a function be added either to ``builtins`` or some other module which calls ``__length_hint__``, like ``builtins.len`` calls ``__len__``.

Copyright
=========

This document has been placed into the public domain.

..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8

Alex
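For readers unfamiliar with the method, here is a small sketch of the kind of type the draft has in mind: an iterator that is not sized (so it defines no `__len__`) but can still say how many items remain. The `Countdown` class is invented for this illustration, and the lookup uses `operator.length_hint`, which only became available in Python 3.4.

```python
import operator


class Countdown:
    """Iterator yielding n, n-1, ..., 1. It is not sized, so it has no __len__."""

    def __init__(self, n):
        self.remaining = n

    def __iter__(self):
        return self

    def __next__(self):
        if self.remaining <= 0:
            raise StopIteration
        value = self.remaining
        self.remaining -= 1
        return value

    def __length_hint__(self):
        # Here the estimate happens to be exact, but it need not be.
        return self.remaining


counter = Countdown(3)
hint_at_start = operator.length_hint(counter)   # three items still to come
first = next(counter)
hint_after_one = operator.length_hint(counter)  # the hint tracks progress
rest = list(counter)
```

The hint shrinks as the iterator is consumed, which is why it makes sense on iterators but is a poor fit for `__len__`.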
Re: [Python-Dev] PEP 0424: A method for exposing a length hint
2012/7/14 Alex Gaynor alex.gay...@gmail.com:

> Proposal
> ========
>
> This PEP proposes formally documenting ``__length_hint__`` for other interpreter and non-standard library Python to implement.
>
> ``__length_hint__`` must return an integer, and is not required to be accurate. It may return a value that is either larger or smaller than the actual size of the container. It may raise a ``TypeError`` if a specific instance cannot have its length estimated. It may not return a negative value.

And what happens if you return a negative value?

> Rationale
> =========
>
> Being able to pre-allocate lists based on the expected size, as estimated by ``__length_hint__``, can be a significant optimization. CPython has been observed to run some code faster than PyPy, purely because of this optimization being present.
>
> Open questions
> ==============
>
> There are two open questions for this PEP:
>
> * Should ``list`` expose a kwarg in its constructor for supplying a length hint.
> * Should a function be added either to ``builtins`` or some other module which calls ``__length_hint__``, like ``builtins.len`` calls ``__len__``.

Let's try to keep this as limited as possible for a public API.

-- Regards, Benjamin
Re: [Python-Dev] PEP 0424: A method for exposing a length hint
On Sat, Jul 14, 2012 at 4:18 PM, Benjamin Peterson benja...@python.org wrote:

> 2012/7/14 Alex Gaynor alex.gay...@gmail.com:
>> Proposal
>> ========
>>
>> This PEP proposes formally documenting ``__length_hint__`` for other interpreter and non-standard library Python to implement.
>>
>> ``__length_hint__`` must return an integer, and is not required to be accurate. It may return a value that is either larger or smaller than the actual size of the container. It may raise a ``TypeError`` if a specific instance cannot have its length estimated. It may not return a negative value.
>
> And what happens if you return a negative value?

ValueError, the same as with len.

>> Rationale
>> =========
>>
>> Being able to pre-allocate lists based on the expected size, as estimated by ``__length_hint__``, can be a significant optimization. CPython has been observed to run some code faster than PyPy, purely because of this optimization being present.
>>
>> Open questions
>> ==============
>>
>> There are two open questions for this PEP:
>>
>> * Should ``list`` expose a kwarg in its constructor for supplying a length hint.
>> * Should a function be added either to ``builtins`` or some other module which calls ``__length_hint__``, like ``builtins.len`` calls ``__len__``.
>
> Let's try to keep this as limited as possible for a public API.

Sounds reasonable to me! Should we just go ahead and strip those out now?

> -- Regards, Benjamin

Alex

-- I disapprove of what you say, but I will defend to the death your right to say it. -- Evelyn Beatrice Hall (summarizing Voltaire)
The people's good is the highest law. -- Cicero
Re: [Python-Dev] PEP 0424: A method for exposing a length hint
On Sat, Jul 14, 2012 at 4:21 PM, Alex Gaynor alex.gay...@gmail.com wrote:

> On Sat, Jul 14, 2012 at 4:18 PM, Benjamin Peterson benja...@python.org wrote:
>> 2012/7/14 Alex Gaynor alex.gay...@gmail.com:
>>> Proposal
>>> ========
>>>
>>> This PEP proposes formally documenting ``__length_hint__`` for other interpreter and non-standard library Python to implement.
>>>
>>> ``__length_hint__`` must return an integer, and is not required to be accurate. It may return a value that is either larger or smaller than the actual size of the container. It may raise a ``TypeError`` if a specific instance cannot have its length estimated. It may not return a negative value.
>>
>> And what happens if you return a negative value?
>
> ValueError, the same as with len.
>
>>> Rationale
>>> =========
>>>
>>> Being able to pre-allocate lists based on the expected size, as estimated by ``__length_hint__``, can be a significant optimization. CPython has been observed to run some code faster than PyPy, purely because of this optimization being present.
>>>
>>> Open questions
>>> ==============
>>>
>>> There are two open questions for this PEP:
>>>
>>> * Should ``list`` expose a kwarg in its constructor for supplying a length hint.
>>> * Should a function be added either to ``builtins`` or some other module which calls ``__length_hint__``, like ``builtins.len`` calls ``__len__``.
>>
>> Let's try to keep this as limited as possible for a public API.
>
> Sounds reasonable to me! Should we just go ahead and strip those out now?

I'm +1 on not having a public API for this. Ultimately the contract for a length hint will depend heavily upon what you need it for. Some applications would require a length hint to be an "at least", others an "at most", and others something else entirely. Given that the contract here appears to be >= 0, I don't think the length hint is particularly useful to the public at large.
Re: [Python-Dev] PEP 0424: A method for exposing a length hint
2012/7/14 Alex Gaynor alex.gay...@gmail.com:

> On Sat, Jul 14, 2012 at 4:18 PM, Benjamin Peterson benja...@python.org wrote:
>> 2012/7/14 Alex Gaynor alex.gay...@gmail.com:
>>> Proposal
>>> ========
>>>
>>> This PEP proposes formally documenting ``__length_hint__`` for other interpreter and non-standard library Python to implement.
>>>
>>> ``__length_hint__`` must return an integer, and is not required to be accurate. It may return a value that is either larger or smaller than the actual size of the container. It may raise a ``TypeError`` if a specific instance cannot have its length estimated. It may not return a negative value.
>>
>> And what happens if you return a negative value?
>
> ValueError, the same as with len.

CPython will probably have to be updated to not ignore it if you return melons.

>>> Rationale
>>> =========
>>>
>>> Being able to pre-allocate lists based on the expected size, as estimated by ``__length_hint__``, can be a significant optimization. CPython has been observed to run some code faster than PyPy, purely because of this optimization being present.
>>>
>>> Open questions
>>> ==============
>>>
>>> There are two open questions for this PEP:
>>>
>>> * Should ``list`` expose a kwarg in its constructor for supplying a length hint.
>>> * Should a function be added either to ``builtins`` or some other module which calls ``__length_hint__``, like ``builtins.len`` calls ``__len__``.
>>
>> Let's try to keep this as limited as possible for a public API.
>
> Sounds reasonable to me! Should we just go ahead and strip those out now?

Certainly.

-- Regards, Benjamin
Re: [Python-Dev] PEP 0424: A method for exposing a length hint
On 7/14/2012 6:11 PM, Alex Gaynor wrote: ...

Various thoughts:

"This method is then used by various other functions (such as ``map``) to presize lists" -- map no longer produces lists. This only makes sense in 3.x if you mean that map can pass along the value of its inputs.

"Types can then define ``__length_hint__`` which are not sized, and thus should not define ``__len__``," is awkwardly phrased. I think you mean "Types that are not sized and should not define __len__ can then define __length_hint__."

What do 'sized' and 'should' mean? Some iterators know exactly how many items they have yet to yield. The main implication of having a __len__ versus a __length_hint__ method seems to be its bool() value when empty.

If lists were to get a new keyword arg, so should the other classes based on one internal array. I see this has been removed.

Generator functions are the nicest way to define iterators in Python. Generator instances returned from generator functions cannot be given a length hint. They are not directly helped. However ...

Not addressed in the PEP: do consumers of __length_hint__ look for it (and __len__) before or after calling iter(input), or both? If before, then the following should work.

    class gwlh:  # generator with length hint
        def __init__(self, gen, len):
            self.gen = gen
            self.len = len
        def __iter__(self):
            return self.gen
        def __length_hint__(self):
            return self.len

Do transformation iterators pass through hints from inputs? Does map(f, iterable) look for len or hint on iterable? Ditto for some itertools, like chain (add lengths). Any guidelines in the PEP?

-- Terry Jan Reedy
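The "before or after calling iter()" question can be illustrated with a small experiment. This sketch assumes Python 3.4 or later for `operator.length_hint`; the `HintedGenerator` and `squares` names are invented here and simply mirror the wrapper idea in the message above.

```python
import operator


class HintedGenerator:
    """Wraps a generator and carries a caller-supplied length hint."""

    def __init__(self, gen, hint):
        self.gen = gen
        self.hint = hint

    def __iter__(self):
        # Hands back the bare generator, which knows nothing about the hint.
        return self.gen

    def __length_hint__(self):
        return self.hint


def squares(n):
    for i in range(n):
        yield i * i


wrapped = HintedGenerator(squares(4), 4)

# Asking the wrapper itself finds the hint...
hint_before_iter = operator.length_hint(wrapped)
# ...but asking the iterator it returns does not: generators define no
# __length_hint__, so the default of 0 comes back.
hint_after_iter = operator.length_hint(iter(wrapped))

items = list(wrapped)
```

So the wrapper only helps a consumer that queries the hint on the object it was handed, before calling `iter()` on it. A wrapper whose `__iter__` returned an iterator object with its own `__length_hint__` would work in both orders.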
Re: [Python-Dev] PEP 0424: A method for exposing a length hint
On Sun, Jul 15, 2012 at 9:18 AM, Benjamin Peterson benja...@python.org wrote:

>> Open questions
>> ==============
>>
>> There are two open questions for this PEP:
>>
>> * Should ``list`` expose a kwarg in its constructor for supplying a length hint.
>> * Should a function be added either to ``builtins`` or some other module which calls ``__length_hint__``, like ``builtins.len`` calls ``__len__``.
>
> Let's try to keep this as limited as possible for a public API.

Length hints are very useful for *any* container implementation, whether those containers are in the standard library or not. Just as we exposed operator.index when __index__ was added, we should expose an "operator.length_hint" function with the following semantics:

    def length_hint(obj):
        """Return an estimate of the number of items in obj.

        This is useful for presizing containers when building from an iterable.

        If the object supports len(), the result will be exact. Otherwise, it
        may over or underestimate by an arbitrary amount. The result will be
        an integer >= 0.
        """
        try:
            return len(obj)
        except TypeError:
            try:
                get_hint = obj.__length_hint__
            except AttributeError:
                return 0
            hint = get_hint()
            if not isinstance(hint, int):
                raise TypeError("Length hint must be an integer, not %r" % type(hint))
            if hint < 0:
                raise ValueError("Length hint (%r) must be >= 0" % hint)
            return hint

There's no reason to make pure Python container implementations reimplement all that for themselves.

Cheers, Nick.

-- Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
Re: [Python-Dev] PEP 0424: A method for exposing a length hint
On Sat, Jul 14, 2012 at 10:16 PM, Nick Coghlan ncogh...@gmail.com wrote:

> On Sun, Jul 15, 2012 at 9:18 AM, Benjamin Peterson benja...@python.org wrote:
>>> Open questions
>>> ==============
>>>
>>> There are two open questions for this PEP:
>>>
>>> * Should ``list`` expose a kwarg in its constructor for supplying a length hint.
>>> * Should a function be added either to ``builtins`` or some other module which calls ``__length_hint__``, like ``builtins.len`` calls ``__len__``.
>>
>> Let's try to keep this as limited as possible for a public API.
>
> Length hints are very useful for *any* container implementation, whether those containers are in the standard library or not. Just as we exposed operator.index when __index__ was added, we should expose an "operator.length_hint" function with the following semantics:
>
>     def length_hint(obj):
>         """Return an estimate of the number of items in obj.
>
>         This is useful for presizing containers when building from an iterable.
>
>         If the object supports len(), the result will be exact. Otherwise, it
>         may over or underestimate by an arbitrary amount. The result will be
>         an integer >= 0.
>         """
>         try:
>             return len(obj)
>         except TypeError:
>             try:
>                 get_hint = obj.__length_hint__
>             except AttributeError:
>                 return 0
>             hint = get_hint()
>             if not isinstance(hint, int):
>                 raise TypeError("Length hint must be an integer, not %r" % type(hint))
>             if hint < 0:
>                 raise ValueError("Length hint (%r) must be >= 0" % hint)
>             return hint
>
> There's no reason to make pure Python container implementations reimplement all that for themselves.
>
> Cheers, Nick.
>
> -- Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia

Sounds reasonable to me, the only issue with your pseudocode (err... I mean Python ;)), is that there's no way for the __length_hint__ to specify that a particular instance can't have a length hint computed. e.g. imagine some sort of lazy stream that cached itself, and only wanted to offer a length hint if it had already been evaluated. Without an exception to raise, it has to return whatever the magic value for length_hint is (in your impl it appears to be 0, the current _PyObject_LengthHint method in CPython has a required `default` parameter). The PEP proposes using TypeError for that.

Anyways that code looks good, do you want to add it to the PEP?

Alex
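The lazy-stream scenario, and the idea of a caller-supplied default, might look something like the sketch below. It follows the draft as discussed in this thread (a `TypeError` raised by `__length_hint__` means "no estimate available"); the names `length_hint_with_default` and `LazyStream` are invented for the illustration and are not part of any proposal.

```python
def length_hint_with_default(obj, default=0):
    """Variant of the helper above that returns `default` when no hint exists."""
    try:
        return len(obj)
    except TypeError:
        pass
    try:
        get_hint = obj.__length_hint__
    except AttributeError:
        return default
    try:
        hint = get_hint()
    except TypeError:
        # Per the draft: this particular instance cannot estimate its length.
        return default
    if not isinstance(hint, int):
        raise TypeError("Length hint must be an integer, not %r" % type(hint))
    if hint < 0:
        raise ValueError("Length hint (%r) must be >= 0" % hint)
    return hint


class LazyStream:
    """Caches its items on first use and only offers a hint once cached."""

    def __init__(self, source):
        self._source = source
        self._cache = None

    def force(self):
        if self._cache is None:
            self._cache = list(self._source)
        return self._cache

    def __iter__(self):
        return iter(self.force())

    def __length_hint__(self):
        if self._cache is None:
            raise TypeError("length unknown until the stream is evaluated")
        return len(self._cache)


stream = LazyStream(x * 2 for x in range(3))
hint_unevaluated = length_hint_with_default(stream, 7)  # falls back to default
stream.force()
hint_evaluated = length_hint_with_default(stream, 7)    # real size now known
```

One drawback of signalling with `TypeError` is that a genuine bug inside `__length_hint__` that happens to raise `TypeError` is silently treated as "no hint", which is worth weighing when choosing the signalling mechanism.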