On Mon, Nov 26, 2018 at 01:29:21PM -0800, Kale Kundert wrote:
> I just ran into the following behavior, and found it surprising:
> 
> >>> len(map(float, [1,2,3]))
> TypeError: object of type 'map' has no len()
> 
> I understand that map() could be given an infinite sequence and therefore
> might not always have a length.  But in this case, it seems like map()
> should've known that its length was 3.

This seems straightforward, but I think there are three problems: it is 
more complex than you might realise, there is a nasty surprise which I 
expect is going to annoy people no matter what decision we make, and it 
is probably less useful than you might think.

First, the usefulness: we still have to wrap the call to 
len() in a try...except block, even if we know we have a map object, 
because we won't know whether the underlying iterable supports len. So 
it won't reduce the amount of code we have to write. At best it will 
allow us to take a fast-path when len() returns a value, and a slow-path 
when it raises.
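
A minimal sketch of that fast-path/slow-path pattern follows. The helper
name count_items is made up for illustration, and note that the slow path
consumes the iterable:

```python
def count_items(iterable):
    """Return the number of items, using len() when it is available."""
    try:
        # Fast path: the object reports its own length.
        return len(iterable)
    except TypeError:
        # Slow path: no usable len(), so walk the iterable and count.
        # This exhausts iterators such as map objects.
        return sum(1 for _ in iterable)


print(count_items([1, 2, 3]))              # list: fast path
print(count_items(map(float, [1, 2, 3])))  # map: slow path
```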

Here's the definition of the Sized ABC:

https://docs.python.org/3/library/collections.abc.html#collections.abc.Sized

and the implementation simply checks for the existence of __len__. We 
(rightly) assume that if __len__ exists, the object has a known length, 
and that calling len() on it will succeed or at least not raise 
TypeError.
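
To illustrate, here is a small check of how Sized recognises objects in
current Python 3. The class HasLen is a made-up example, not anything from
the standard library:

```python
from collections.abc import Sized


class HasLen:
    """Made-up class: defining __len__ is all it takes to be Sized."""

    def __len__(self):
        return 3


assert isinstance([1, 2, 3], Sized)
assert isinstance(HasLen(), Sized)

# Iterators such as map objects and list iterators define no __len__,
# so today they are not Sized.
assert not isinstance(map(float, [1, 2, 3]), Sized)
assert not isinstance(iter([1, 2, 3]), Sized)
```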

Your proposal will break that expectation. map objects will be sized, 
but since sometimes the underlying iterator won't be, they may still 
raise TypeError.
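
As a rough sketch of the problem, assume a hypothetical map-like class
(DelegatingMap is invented here; it is not how map() works today) that
simply forwards len() to whatever it was given:

```python
from collections.abc import Sized


class DelegatingMap:
    """Hypothetical map-like iterator whose __len__ delegates to its source."""

    def __init__(self, func, iterable):
        self._func = func
        self._source = iterable
        self._it = iter(iterable)

    def __iter__(self):
        return self

    def __next__(self):
        return self._func(next(self._it))

    def __len__(self):
        # Raises TypeError when the source itself has no len().
        return len(self._source)


sized = DelegatingMap(float, [1, 2, 3])
unsized = DelegatingMap(float, iter([1, 2, 3]))

assert len(sized) == 3
assert isinstance(unsized, Sized)   # looks sized, because __len__ exists...

try:
    len(unsized)
except TypeError:
    raised = True                   # ...but len() still fails
else:
    raised = False
```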

Of course there are ways to work around this. We could just change our 
expectations: even Sized objects might not be *actually* sized. Or map() 
could catch the TypeError and raise a ValueError instead, or something. 
Or we could rethink the whole length concept (see below), which after 
all was invented back in Python 1 days and is looking a bit old.

As for the nasty surprise... do you agree that this ought to be an 
invariant for sized iterables?

count = len(it)
i = 0
for obj in it:
    i += 1
assert i == count


That's the invariant I expect, and breaking that will annoy me (and I 
expect many other people) greatly.

But that means that map() cannot just delegate its length to the 
underlying iterable. The implementation must be more complex, keeping 
track of how many items it has seen.

And consider this case:

it = map(lambda x: x, [1, 2, 3, 4, 5])
x = next(it)
x = next(it)

assert len(it) == 5  # underlying length of the iterable
assert len(list(it)) == 3  # but only three items left

assert len(it) == 5  # still 5
assert len(list(it)) == 0  # but nothing left


So the length of the iterable has to vary as you iterate over it, or you 
break the invariant shown above.
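
One way such a varying length could look, as a sketch only: TrackingMap is
a hypothetical class, and it assumes the underlying iterable supports
len() in the first place.

```python
class TrackingMap:
    """Hypothetical map-like iterator whose len() is the number of items left."""

    def __init__(self, func, sized_iterable):
        self._func = func
        self._remaining = len(sized_iterable)  # assumes the source is sized
        self._it = iter(sized_iterable)

    def __iter__(self):
        return self

    def __next__(self):
        item = next(self._it)   # StopIteration propagates when exhausted
        self._remaining -= 1
        return self._func(item)

    def __len__(self):
        return self._remaining


it = TrackingMap(lambda x: x, [1, 2, 3, 4, 5])
next(it)
next(it)

count = len(it)   # items remaining, not the original five
i = 0
for obj in it:
    i += 1
assert i == count
assert len(it) == 0
```

With this design the counting invariant holds at any point during
iteration, at the price of a length that changes as items are consumed.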

But that's going to annoy other people for another reason: we rightly 
expect that iterables shouldn't change their length just because you 
iterate over them! The length should only change if you *modify* them. 
So these two snippets should do the same thing:

# 1
n = len(it)
x = sum(it)

# 2
x = sum(it)
n = len(it)

but if map() updates its length as it goes, it will break that 
invariant.
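
A small demonstration of that breakage, using a hypothetical ShrinkingMap
class (invented for this example) whose len() counts the items not yet
consumed. The two helper functions are the two snippets above:

```python
class ShrinkingMap:
    """Hypothetical map-like iterator: len() counts items not yet consumed."""

    def __init__(self, func, seq):
        self._func = func
        self._seq = seq
        self._pos = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self._pos >= len(self._seq):
            raise StopIteration
        value = self._func(self._seq[self._pos])
        self._pos += 1
        return value

    def __len__(self):
        return len(self._seq) - self._pos


def len_then_sum(it):
    n = len(it)
    x = sum(it)
    return n, x


def sum_then_len(it):
    x = sum(it)
    n = len(it)
    return n, x


# A list gives the same answer in either order.
data = [1.0, 2.0, 3.0]
assert len_then_sum(data) == sum_then_len(data)

# The shrinking map does not: the length depends on when you ask.
first = len_then_sum(ShrinkingMap(float, [1, 2, 3]))
second = sum_then_len(ShrinkingMap(float, [1, 2, 3]))
assert first != second
```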

So *whichever* behaviour we choose, we're going to break *something*. 
Either the reported length isn't necessarily the same as the actual 
length you get from iterating over the items, which will be annoying and 
confusing, or it varies as you iterate, which will ALSO be annoying and 
confusing.

Either way, this apparently simple and obvious change will be annoying 
and confusing.



Rethinking object length
------------------------

len() was invented back in Python 1 days, or earlier, when we 
effectively had only one kind of iterable: sequences like lists, with a 
known length. Today, iterables can have:

1. a known, finite length;
2. a known infinite length;
3. an unknown length (and usually no way to estimate it).

And that's at the very least. The len() protocol is intentionally simple: 
it only supports the first case, with the expectation that iterables will 
simply not define __len__ in the other two cases.

Perhaps there is a case for updating the len() concept to explicitly 
handle cases 2 and 3, instead of simply not defining __len__. Perhaps it 
could return -1 for unknown and -2 for infinite. Or raise some other 
exception apart from TypeError.
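
For what it's worth, sentinel values like these are not something a class
can opt into today: the built-in len() rejects a negative result from
__len__ with a ValueError, so len() itself would have to change. A quick
check (UnknownLength is a made-up class):

```python
class UnknownLength:
    """Made-up class that tries to signal 'unknown' with a negative length."""

    def __len__(self):
        return -1


try:
    len(UnknownLength())
except ValueError:
    rejected = True    # current len() refuses negative lengths
else:
    rejected = False

print(rejected)
```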

(I know there have been times I've wanted to know if an iterable was 
infinite, before spending the rest of my life iterating over it...)

And perhaps we can come up with a concept of total length, versus length 
of items remaining.
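
The standard library already has something in the "items remaining"
direction: operator.length_hint(). It is only a hint, not a promise, and it
falls back to a caller-supplied default when the object offers nothing, so
treat this as an illustration of the idea rather than a solution:

```python
from operator import length_hint

it = iter([1, 2, 3, 4, 5])
assert length_hint(it) == 5   # nothing consumed yet

next(it)
next(it)
assert length_hint(it) == 3   # the hint tracks the items remaining

# A generator offers neither __len__ nor __length_hint__, so the
# default passed as the second argument is returned instead.
gen = (x for x in [1, 2, 3])
assert length_hint(gen, -1) == -1
```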

But these aren't simple issues with obvious solutions; any change would 
surely need a PEP. And the benefit isn't obvious either.


-- 
Steve