On Mon, Nov 26, 2018 at 01:29:21PM -0800, Kale Kundert wrote:

> I just ran into the following behavior, and found it surprising:
>
> >>> len(map(float, [1,2,3]))
> TypeError: object of type 'map' has no len()
>
> I understand that map() could be given an infinite sequence and
> therefore might not always have a length.  But in this case, it seems
> like map() should've known that its length was 3.

This seems straightforward, but I think there's more complexity than you
might realise, a nasty surprise which I expect is going to annoy people
no matter what decision we make, and the usefulness is probably less
than you might think.

First, the usefulness: we still have to wrap the call to len() in a
try...except block, even if we know we have a map object, because we
won't know whether the underlying iterable supports len(). So it won't
reduce the amount of code we have to write. At best it will allow us to
take a fast path when len() returns a value, and a slow path when it
raises.

Here's the definition of the Sized ABC:

https://docs.python.org/3/library/collections.abc.html#collections.abc.Sized

and the implementation simply checks for the existence of __len__. We
(rightly) assume that if __len__ exists, the object has a known length,
and that calling len() on it will succeed or at least not raise
TypeError.

Your proposal will break that expectation. map objects will be sized,
but since sometimes the underlying iterator won't be, they may still
raise TypeError.

Of course there are ways to work around this. We could just change our
expectations: even Sized objects might not be *actually* sized. Or map()
could catch the TypeError and raise a ValueError instead, or something.
Or we could rethink the whole length concept (see below), which after
all was invented back in Python 1 days and is looking a bit old.

As for the nasty surprise... do you agree that this ought to be an
invariant for sized iterables?

    count = len(it)
    i = 0
    for obj in it:
        i += 1
    assert i == count

That's the invariant I expect, and breaking that will annoy me (and I
expect many other people) greatly.

But that means that map() cannot just delegate its length to the
underlying iterable. The implementation must be more complex, keeping
track of how many items it has seen.

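To make that concrete, here is a minimal sketch of what such a
length-tracking map might look like. CountingMap is a hypothetical name
of my own, not a proposed implementation and not how the builtin map()
works; it assumes the length should be "items remaining", computed from
the underlying iterable's len() minus the number of items handed out.

```python
from collections.abc import Sized


class CountingMap:
    """Hypothetical map-like wrapper that reports how many items remain."""

    def __init__(self, func, iterable):
        self._func = func
        self._source = iterable    # kept so __len__ can ask it for a length
        self._it = iter(iterable)
        self._seen = 0             # number of items already consumed

    def __iter__(self):
        return self

    def __next__(self):
        item = next(self._it)      # StopIteration propagates when exhausted
        self._seen += 1
        return self._func(item)

    def __len__(self):
        # Raises TypeError if the underlying iterable does not support len().
        return len(self._source) - self._seen


# The loop invariant holds even after partial consumption.
m = CountingMap(float, [1, 2, 3, 4, 5])
next(m)
next(m)
count = len(m)
i = 0
for obj in m:
    i += 1
assert i == count

# Wrapping a generator: the object still passes the Sized check,
# yet calling len() on it raises TypeError.
g = CountingMap(float, (n for n in range(3)))
assert isinstance(g, Sized)
try:
    len(g)
except TypeError:
    pass
```

Note that the isinstance() check succeeds purely because __len__ exists
on the class, which is exactly the expectation described above.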
And consider this case:

    # hypothetical: assumes map() delegates len() to the underlying list
    it = map(lambda x: x, [1, 2, 3, 4, 5])
    x = next(it)
    x = next(it)
    assert len(it) == 5  # underlying length of the iterable
    assert len(list(it)) == 3  # but only three items left
    assert len(it) == 5  # still 5
    assert len(list(it)) == 0  # but nothing left

So the length of the iterable has to vary as you iterate over it, or you
break the invariant shown above.

But that's going to annoy other people for another reason: we rightly
expect that iterables shouldn't change their length just because you
iterate over them! The length should only change if you *modify* them.

So these two snippets should do the same:

    # 1
    n = len(it)
    x = sum(it)

    # 2
    x = sum(it)
    n = len(it)

but if map() updates its length as it goes, it will break that
invariant.

So *whichever* behaviour we choose, we're going to break *something*.
Either the reported length isn't necessarily the same as the actual
length you get from iterating over the items, which will be annoying and
confusing, or it varies as you iterate, which will ALSO be annoying and
confusing. Either way, this apparently simple and obvious change will be
annoying and confusing.


Rethinking object length
------------------------

len() was invented back in Python 1 days, or earlier, when we
effectively had only one kind of iterable: sequences like lists, with a
known length. Today, iterables can have:

1. a known, finite length;

2. a known infinite length;

3. an unknown length (and usually no way to estimate it).

At least.

The len() protocol is intentionally simple: it only supports the first
case, with the expectation that iterables will simply not define __len__
in the other two cases.

Perhaps there is a case for updating the len() concept to explicitly
handle cases 2 and 3, instead of simply not defining __len__. Perhaps it
could return -1 for unknown and -2 for infinite. Or raise some other
exception apart from TypeError.

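As a rough illustration of the three cases, here is a sketch of a helper
that distinguishes them without touching len() itself. LengthKind and
classify_length are hypothetical names made up for this example, not an
existing or proposed API, and the helper only recognises
itertools.count as infinite; a real protocol would need some way for
arbitrary iterables to declare which case they fall into.

```python
import itertools
from enum import Enum


class LengthKind(Enum):
    FINITE = "finite"
    INFINITE = "infinite"
    UNKNOWN = "unknown"


def classify_length(iterable):
    """Return (kind, length); length is None unless kind is FINITE."""
    # Assumption for illustration: only itertools.count is known to be
    # infinite. Other infinite iterables end up classified as UNKNOWN.
    if isinstance(iterable, itertools.count):
        return LengthKind.INFINITE, None
    try:
        return LengthKind.FINITE, len(iterable)
    except TypeError:
        # No __len__ (generators, map objects, files, ...).
        return LengthKind.UNKNOWN, None


finite = classify_length([1, 2, 3])
infinite = classify_length(itertools.count())
unknown = classify_length(map(float, [1, 2, 3]))
```

With today's map(), the last call falls into the "unknown" case even
though the underlying list has a length.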
(I know there have been times I've wanted to know if an iterable was
infinite, before spending the rest of my life iterating over it...)

And perhaps we can come up with a concept of total length, versus length
of items remaining.

But these aren't simple issues with obvious solutions; any such change
would surely need a PEP. And the benefit isn't obvious either.


-- 
Steve
_______________________________________________
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/