[issue26828] Implement __length_hint__() on map() and filter() to optimize list(map) and list(filter)

2019-06-27 Thread STINNER Victor


STINNER Victor  added the comment:

The main issue with using length hint is that calling a method is Python is not 
free. Calling the method may add more overhead than the speedup provided by 
smarter memory allocations.

The worst case for this optimization should be measured on very small data set, 
of less than 10 items. We don't want to make Python smaller for operations on 5 
items for example.

The idea which remains open is the idea of adding a "slot" for length hint in 
PyTypeObject. But I would suggest to open a separated issue to explore this 
idea.

Slots allows to use C functions at raw speed and so reduce the cost of getting 
the length hint.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue26828] Implement __length_hint__() on map() and filter() to optimize list(map) and list(filter)

2019-06-27 Thread Nicholas Musolino


Nicholas Musolino  added the comment:

Before seeing this issue and its closed status, I created PR 14432, which adds 
`__length_hint__()` to the iterator returned by builtin `map`.

This PR differs from the original 2017 PR by MSeifert in that the code can 
distinguish between the cases where a length hint (or length) function is not 
available, versus a case in which the `__length_hint__()` method of the 
user-supplied iterable raises an exception.

That is, the new PR propagates exceptions (other than TypeError, per PEP 424) 
raised in the `__length_hint__()` functions of the user-supplied iterators.

I've read the comments in this issue, and the notes in `test_iterlen.py`:

> [Other] iterators [...], such as enumerate and the other itertools,
> are not length transparent because they have no way to distinguish between
> iterables that report static length and iterators whose length changes with
> each call (i.e. the difference between enumerate('abc') and
> enumerate(iter('abc')).

Can anyone provide a concrete example that illustrates the inherent 
difficulties of computing a length hint for `map`?  

In ad-hoc testing, the current PR handles situations listed there, and I 
haven't come across some fundamental show-stopper problem.

There are two limitations to the current PR:

1.  The case where a single iterator is supplied as multiple arguments to 
`map`, as pointed out by MSeifert:  
it = iter([1,2,3,4])
map_it = map(f, it, it)  
2.  When a user-supplied `__length_hint__()` method returns a non-integer, it 
results in a TypeError in an *inner* evaluation of `PyObject_LengthHint()` (per 
PEP 424).  When this exception reaches an *outer* evaluation of 
`PyObject_LengthHint()`, it is cleared (per PEP 424), and leads to 
operator.length_hint's default value being returned.

If we consider issue 2 to be incorrect,  I think I could fix it by raising 
RuntimeError from the original TypeError, or by somewhat invasive refactoring 
of `PyObject_LengthHint()` (which I do not recommend).

--
nosy: +nmusolino

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue26828] Implement __length_hint__() on map() and filter() to optimize list(map) and list(filter)

2019-06-27 Thread Nicholas Musolino


Change by Nicholas Musolino :


--
pull_requests: +14249
pull_request: https://github.com/python/cpython/pull/14432

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue26828] Implement __length_hint__() on map() and filter() to optimize list(map) and list(filter)

2017-04-26 Thread Raymond Hettinger

Changes by Raymond Hettinger :


--
resolution:  -> rejected
stage:  -> resolved
status: open -> closed

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue26828] Implement __length_hint__() on map() and filter() to optimize list(map) and list(filter)

2017-04-26 Thread Michael Seifert

Michael Seifert added the comment:

> zip.__length_hint__() must return NotImplemented or raise TypeError if any of 
> iterators don't implement __length_hint__ or its __length_hint__() returns 
> NotImplemented or raises TypeError.

> And what should return zip(range(3), range(2**1000)).__length_hint__()? I 
> expect 3, not OverflowError.

That's actually non-trivial because PyObject_LengthHint just returns a 
Py_ssize_t. To recover NotImplemented will be complicated and there's no way to 
discriminate if the OverflowError happened in PyObject_LengthHint or in the 
called __length_hint__. 

But TypeError is correctly re-raised in the tests I made.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue26828] Implement __length_hint__() on map() and filter() to optimize list(map) and list(filter)

2017-04-26 Thread Michael Seifert

Michael Seifert added the comment:

> I would like to mark this tracker item as closed.  IMO it is a dead-end. 

Yes, even some (not very uncommon cases) give incorrect results:

it = iter([1,2,3])
zip(it, it, it)

has length_hint 3 but will only yield one item.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue26828] Implement __length_hint__() on map() and filter() to optimize list(map) and list(filter)

2017-04-26 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

zip.__length_hint__() must return NotImplemented or raise TypeError if any of 
iterators don't implement __length_hint__ or its __length_hint__() returns 
NotImplemented or raises TypeError.

And what should return zip(range(3), range(2**1000)).__length_hint__()? I 
expect 3, not OverflowError.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue26828] Implement __length_hint__() on map() and filter() to optimize list(map) and list(filter)

2017-04-25 Thread Raymond Hettinger

Raymond Hettinger added the comment:

Also, the ifilter() suggestion is a complete non-starter.  There is no way to 
know in advance whether filter will return all the elements of the input or 
none of them.  In the absence of other knowledge, the best strategy is what 
list.append() already does (a strategy of mild over-allocations, not exceeding 
12.5% for large lists).

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue26828] Implement __length_hint__() on map() and filter() to optimize list(map) and list(filter)

2017-04-25 Thread Raymond Hettinger

Raymond Hettinger added the comment:

> But isn't that the point of the length_hint?

Where it could be used reliably, it was used and used broadly.  However, there 
are circumstances (iterator vs iterable) where it can't know how to make an 
estimate and is perhaps wildly off.

I would like to mark this tracker item as closed.  IMO it is a dead-end. 

Like you, I share enthusiasm for length_hint and its potential (that I why I 
created the concept many years ago and worked very hard to apply it everywhere 
it would fit).  But trust me, it doesn't fit everywhere.  I left it out of 
imap() for a reason (it just didn't make sense).

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue26828] Implement __length_hint__() on map() and filter() to optimize list(map) and list(filter)

2017-04-25 Thread Michael Seifert

Michael Seifert added the comment:

> I explored that notion of iterator length transparency years ago.  While I 
> don't remember all the details, I did record some notes at the top of 
> Lib/test/test_iterlen.py.

But isn't that the point of the length_hint? To provide an *estimate* for the 
length? So generally I would expect the length_hint to be accurate (at least 
accurate enough to avoid reallocations) for well-behaved iterators. And I think 
that's possible for "zip" and "map" (but also for several itertools: product, 
permutations, combinations, islice, accumulate, starmap, zip_longest). At least 
in theory.

Also it would allow to prohibit infinite iterators to be passed to 
"length_hint"-aware functions. For example `list(count())` could raise an 
exception when `count.length_hint` would return math.inf or sys.maxsize + 1.

The pull request was just because I was curious how it performed and wanted to 
share the code. Feel free to reject it. The performance improvement wasn't 
amazing in the micro-benchmarks.

--
nosy: +MSeifert

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue26828] Implement __length_hint__() on map() and filter() to optimize list(map) and list(filter)

2017-04-16 Thread Raymond Hettinger

Raymond Hettinger added the comment:

[STINNER Victor]
> Oh, there is no slot for __length_hint__(). 
> Maybe we should also try to add a new slot for it?

+1

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue26828] Implement __length_hint__() on map() and filter() to optimize list(map) and list(filter)

2017-04-10 Thread Raymond Hettinger

Raymond Hettinger added the comment:

Length hints were intentionally omitted from the Python 3 map() and filter() 
which used to be in the itertools module.   I explored that notion of iterator 
length transparency years ago.  While I don't remember all the details, I did 
record some notes at the top of Lib/test/test_iterlen.py.

BTW, I also don't think the list(map(...)) or list(filter(...)) case is worth 
optimizing.  For big inputs, manifesting the whole input into a list is usually 
(but not always) the wrong thing to do if you care about performance (throwing 
away the iterator's superb L1 and L2 cache performance and throwing away the 
memory efficiency).

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue26828] Implement __length_hint__() on map() and filter() to optimize list(map) and list(filter)

2017-04-10 Thread Raymond Hettinger

Changes by Raymond Hettinger :


--
assignee:  -> rhettinger

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue26828] Implement __length_hint__() on map() and filter() to optimize list(map) and list(filter)

2017-04-10 Thread Michael Seifert

Changes by Michael Seifert :


--
pull_requests: +1220

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue26828] Implement __length_hint__() on map() and filter() to optimize list(map) and list(filter)

2016-04-23 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

map and zip should use the minimum, zip_longest should use the maximum, and 
filter shouldn't have __length_hint__().

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue26828] Implement __length_hint__() on map() and filter() to optimize list(map) and list(filter)

2016-04-23 Thread STINNER Victor

STINNER Victor added the comment:

I checked Python 2 for map(): it gets the length hint of all iterables and
use the maximum, with a default of 8. I'm not sure of what you mean by
errors, the function to get the length hint already catchs and ignores
errors no?

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue26828] Implement __length_hint__() on map() and filter() to optimize list(map) and list(filter)

2016-04-22 Thread Terry J. Reedy

Terry J. Reedy added the comment:

Victor, are you suggesting the following?  "If map has a single iterable, call 
and return as hint, otherwise give up."  Or something else?

--
nosy: +terry.reedy

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue26828] Implement __length_hint__() on map() and filter() to optimize list(map) and list(filter)

2016-04-22 Thread STINNER Victor

STINNER Victor added the comment:

> But note that map() and zip() take several iterables, and we should call 
> __length_hint__() for every of them (unless found a one with not implemented 
> __length_hint__()). This can slow down the execution for short sequences.

Oh, there is no slot for __length_hint__(). Maybe we should also try to add a 
new slot for it?

Maybe we can use a fast-path for the most common cases like list(map(func, 
list))?


> It is impossible to implement reasonable filter.__length_hint__(), because 
> the length of resulting sequence can be from 0 to the length of the input 
> sequence, and returning the maximal value would be not correct.

What I see is that Python 2 is much faster and Python 2 reallocates N items if 
the input sequence contains N items. But yeah, we have to benchmark such change 
on Python 3 ;-)

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue26828] Implement __length_hint__() on map() and filter() to optimize list(map) and list(filter)

2016-04-22 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

See also issue14126.

It makes sense to implement map.__length_hint__() and  zip.__length_hint__(). 
But note that map() and zip() take several iterables, and we should call 
__length_hint__() for every of them (unless found a one with not implemented 
__length_hint__()). This can slow down the execution for short sequences.

It is impossible to implement reasonable filter.__length_hint__(), because the 
length of resulting sequence can be from 0 to the length of the input sequence, 
and returning the maximal value would be not correct.

--
nosy: +rhettinger

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue26828] Implement __length_hint__() on map() and filter() to optimize list(map) and list(filter)

2016-04-22 Thread STINNER Victor

New submission from STINNER Victor:

When I compared the performance of filter() and map() between Python 2.7 and 
3.4, I noticed a huge performance drop in Python 3!
http://bugs.python.org/issue26814#msg264003

I didn't analyze yet exactly why Python 3 is so much slower (almost 100% slower 
for the case of fitler!). Maybe it's because filter() returns a list on Python 
2, whereas filter() returns an iterator on Python 3.

In Python 2, filter() and map() use _PyObject_LengthHint(seq, 8) to create the 
result list. Why don't we do the same in Python 3?

filter.__length_hint__() and map.__length_hint__() would return 
seq.__length_hint__() of the input sequence, or return 8. It's a weak 
estimation, but it can help a lot of reduce the number of realloc() when the 
list is slowly growing.

See also the PEP 424 -- A method for exposing a length hint.

Note: the issue #26814 (fastcall) does make filter() and map() faster on Python 
3.6 compared to Python 2.7, but it's not directly related to this issue. IMHO 
using length hint would make list(filter) and list(map) even faster.

--
messages: 264007
nosy: alex, haypo, serhiy.storchaka
priority: normal
severity: normal
status: open
title: Implement __length_hint__() on map() and filter() to optimize list(map) 
and list(filter)
type: performance
versions: Python 3.6

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com