Re: [Python-ideas] Suggested MapView object (Re: __len__() for map())

2018-12-11 Thread Terry Reedy

On 12/11/2018 6:50 PM, Greg Ewing wrote:


I'm not necessarily saying this *should* be done, just pointing
out that it's a possible strategy for migrating map() from
an iterator to a view, if we want to do that.


Python has list and list_iterator, tuple and tuple_iterator, set and 
set_iterator, dict and dict_iterator, range and range_iterator.


In 3.0, we could have turned map into a finite sequence analogous to 
range, and added a new map_iterator.  To be completely lazy, such a map 
would have to restrict input to Sequences.  To be compatible with 2.x 
map, it would have to use list(iterable) to turn other finite iterables 
into concrete lists, making it only semi-lazy. Since I am too lazy to 
write the multi-iterable version, here is the one-iterable version to 
show the idea.


    # assumes: from collections.abc import Sequence
    def __init__(self, func, iterable):
        self.func = func
        self.seq = (iterable if isinstance(iterable, Sequence)
                    else list(iterable))
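
As a rough sketch of how the one-iterable version might be fleshed out: the class name SemiLazyMap, the use of the Sequence ABC as a base class, and the slicing behaviour are all assumptions made for illustration, not part of the original post.

```python
from collections.abc import Sequence


class SemiLazyMap(Sequence):
    """Hypothetical one-iterable map that behaves as a read-only sequence."""

    def __init__(self, func, iterable):
        self.func = func
        # Keep a reference to real sequences; copy other iterables into a list.
        self.seq = iterable if isinstance(iterable, Sequence) else list(iterable)

    def __len__(self):
        return len(self.seq)

    def __getitem__(self, index):
        if isinstance(index, slice):
            # Assumption: slicing returns another lazy view over the sliced input.
            return SemiLazyMap(self.func, self.seq[index])
        return self.func(self.seq[index])


squares = SemiLazyMap(lambda n: n * n, range(5))        # range is already a Sequence
from_gen = SemiLazyMap(str.upper, (c for c in "abc"))   # generator is copied to a list
```

Inheriting from Sequence supplies __iter__, __contains__, index() and count() on top of __len__ and __getitem__, which is why the sketch only defines those two.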


Given the apparently small need for the extra complication, and the 
possibility of keeping a reference to sequences and explicitly applying 
list otherwise, it was decided to rebind 'map' to the fully lazy and 
general itertools.imap.


--
Terry Jan Reedy

___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] Suggested MapView object (Re: __len__() for map())

2018-12-11 Thread Steven D'Aprano
On Wed, Dec 12, 2018 at 11:31:03AM +1300, Greg Ewing wrote:
> Steven D'Aprano wrote:
> >I suggest we provide a separate mapview() type that offers only the lazy 
> >sequence API, without trying to be an iterator at the same time.
> 
> Then we would be back to the bad old days of having two functions
> that do almost exactly the same thing.

They aren't "almost exactly the same thing". One is a sequence, which is 
a rich API that includes random access to items and a length; the other 
is an iterator, which is an intentionally simple API which fails to meet 
the needs of some users.


> My suggestion was made in
> the interests of moving the language in the direction of having
> fewer warts, rather than adding more or moving the existing ones
> around.
> 
> I acknowledge that the dual interface is itself a bit wartish,

It's a "bit wartish" in the same way that the sun is "a bit warmish".


> but it's purely for backwards compatibility

And it fails at that too.

x = map(str.upper, "abcd")
x is iter(x)


returns True with the current map, an actual iterator, and False with 
your hybrid.

Current map() is a proper, non-broken iterator; your hybrid is a broken 
iterator. (That's not me being derogatory: it's the official term for 
iterators which don't stay exhausted.)
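
A small check of both properties against the current built-in map() may help here. This is only an illustrative sketch; the variable names are made up for the example.

```python
x = map(str.upper, "abcd")
is_own_iterator = x is iter(x)      # an iterator returns itself from iter()

first_pass = list(x)                # consumes the iterator
second_pass = list(x)               # a proper iterator stays exhausted

letters = ["a", "b"]
list_is_iterator = letters is iter(letters)   # a sequence hands out a new iterator
```

The second list() call yields nothing, which is the "stays exhausted" property a hybrid object would have to preserve to count as a non-broken iterator.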

I'd be more charitable if I thought the flaws were mere bugs that could 
be fixed. But I don't think there is any way to combine two incompatible 
interfaces, the sequence and iterator APIs, into one object without 
these sorts of breakages.

Take the __next__ method out of your object, and it is a better version 
of what I proposed earlier. With the __next__ method, it's just broken.



-- 
Steve


Re: [Python-ideas] Suggested MapView object (Re: __len__() for map())

2018-12-11 Thread Greg Ewing

Steven D'Aprano wrote:

The iterator protocol is that iterators must:

- have a __next__ method;
- have an __iter__ method which returns self;

and the test for an iterator is:

obj is iter(obj)


By that test, it identifies as a sequence, as does testing it
for the presence of __len__:

>>> m is iter(m)
False
>>> hasattr(m, '__len__')
True

So, code that doesn't know whether it has a sequence or iterator
and tries to find out, will conclude that it has a sequence.
Presumably it will then proceed to treat it as a sequence, which
will work fine.


py> x = MapView(str.upper, "abcdef")  # An imposter.
py> next(x)
'A'
py> next(x)
'B'
py> next(iter(x))
'A'


That's a valid point, but it can be fixed:

    def __iter__(self):
        return self.iterator or map(self.func, *self.args)

Now it gives

>>> next(x)
'A'
>>> list(x)
[]

There is still one case that will behave differently from the
current map(), i.e. using list() first and then expecting it
to behave like an exhausted iterator. I'm finding it hard to
imagine real code that would depend on that behaviour, though.

whether operations succeed or not depends on the
order that you call them:

py> x = MapView(str.upper, "abcdef")
py> len(x)*next(x)  # Safe. But only ONCE.


But what sane code is going to do that? Remember, the iterator
interface is only there for backwards compatibility. That would
fail under both Python 2 and the current Python 3.


py> def innocent_looking_function(obj):
... next(obj)
...
py> x = MapView(str.upper, "abcdef")
py> len(x)
6
py> innocent_looking_function(x)
py> len(x)
TypeError: Mapping iterator has no len()


If you're using len(), you clearly expect to have a sequence,
not an iterator, so why are you calling a function that blindly
expects an iterator? Again, this cannot be and could never have
been working code.

I presume this is just an oversight, but indexing continues to work even 
when len() has been broken.


That could be fixed.

This MapView class offers a hybrid "sequence plus iterator, together at 
last!" double-headed API, and even its creator says that sane code 
shouldn't use that API.


No. I would document it like this: It provides a sequence API.
It also, *for backwards compatibility*, implements some parts
of the iterator API, but new code should not rely on that,
nor should any code expect to be able to use both interfaces
on the same object.

The backwards compatibility would not be perfect, but I think
it would work in the vast majority of cases.

I also envisage that the backwards compatibility provisions
would not be kept forever, and that it would eventually become
a pure sequence object.
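
One way such a transition might look is sketched below. This is purely an assumption for illustration: the warning text, the use of DeprecationWarning, and the attribute name _iterator are not from the thread, and the sketch folds in the __iter__ fix suggested above.

```python
import warnings


class MapView:
    """Hypothetical sequence view whose iterator API is kept only for compatibility."""

    def __init__(self, func, *args):
        self.func = func
        self.args = args
        self._iterator = None   # created lazily by the legacy __next__

    def __len__(self):
        return min(map(len, self.args))

    def __getitem__(self, i):
        return self.func(*(arg[i] for arg in self.args))

    def __iter__(self):
        # Continue a legacy iteration if one was started, else start afresh.
        if self._iterator is not None:
            return self._iterator
        return map(self.func, *self.args)

    def __next__(self):
        # Assumed deprecation path: warn, then behave like the old iterator.
        warnings.warn("next() on a MapView is deprecated; call iter() first",
                      DeprecationWarning, stacklevel=2)
        if self._iterator is None:
            self._iterator = map(self.func, *self.args)
        return next(self._iterator)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    view = MapView(str.upper, "abc")
    first = next(view)      # legacy usage, triggers the warning
    rest = list(view)       # continues from where next() left off
```

Removing __next__ and the _iterator attribute at the end of the deprecation period would leave a pure sequence object.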

I'm not necessarily saying this *should* be done, just pointing
out that it's a possible strategy for migrating map() from
an iterator to a view, if we want to do that.

--
Greg


Re: [Python-ideas] Suggested MapView object (Re: __len__() for map())

2018-12-11 Thread Chris Barker via Python-ideas
On Tue, Dec 11, 2018 at 11:10 AM Terry Reedy  wrote:

> > I _think_ someone may be advocating that map() could return an
> > iterable if it is passed an iterable,
>
> I believe you mean 'iterator' rather than 'iterable' here and below as a
> sequence is an iterable.
>

well, the iterator / iterable distinction is important in this thread in
many places, so I should have been more careful about that -- but not for
this reason. Yes, a sequence is an iterable, but what I meant was an
"iterable-that-is-not-a-sequence".

-CHB

-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

chris.bar...@noaa.gov


Re: [Python-ideas] Suggested MapView object (Re: __len__() for map())

2018-12-11 Thread Greg Ewing

Steven D'Aprano wrote:
I suggest we provide a separate mapview() type that offers only the lazy 
sequence API, without trying to be an iterator at the same time.


Then we would be back to the bad old days of having two functions
that do almost exactly the same thing. My suggestion was made in
the interests of moving the language in the direction of having
fewer warts, rather than adding more or moving the existing ones
around.

I acknowledge that the dual interface is itself a bit wartish,
but it's purely for backwards compatibility, so it could be
deprecated and eventually removed if desired.

--
Greg



Re: [Python-ideas] Suggested MapView object (Re: __len__() for map())

2018-12-11 Thread Terry Reedy

On 12/11/2018 12:01 PM, Chris Barker - NOAA Federal via Python-ideas wrote:

Perhaps I got confused by the early part of this discussion.

My point was that there is no “map-like” object at the Python level.
(That is no Map abc).

Py2’s map produced a sequence. Py3’s map produced an iterable.

So any API that was expecting a sequence could accept the result of a
py2 map, but not a py3 map. There is absolutely nothing special about
map here.

The example of range has been brought up, but I don’t think it’s
analogous — py2 range returns a list, py3 range returns an immutable
sequence. Because that’s as close as we can get to a sequence while
preserving the lazy evaluation that is wanted.

I _think_ someone may be advocating that map() could return an
iterable if it is passed an iterable,


I believe you mean 'iterator' rather than 'iterable' here and below as a 
sequence is an iterable.



and a sequence if it is passed a sequence.
Yes, it could, but that seems like a bad idea to me.

But folks are proposing a “map” that would produce a lazy-evaluated
sequence. Sure — as Paul said, put it up on pypi and see if folks find
it useful.

Personally, I’m still finding it hard to imagine a use case where you
need the sequence features, but also lazy evaluation is important.

Sure: range() has that, but it came at almost zero cost, and I’m not
sure the sequence features are used much.

Note: the one use-case I can think of for a lazy evaluated sequence
instead of an iterable is so that I can pick a random element with
random.choice(). (Try to pick a random item from a dict), but that
doesn’t apply here—pick a random item from the source sequence
instead.

But this is a specific example of a general use case: you need to access
only a subset of the mapped sequence (or access it out of order) so
using the iterable version won’t work, and it may be large enough that
making a new sequence is too resource intensive.

Seems rare to me, and in many cases, you could do the subsetting
before applying the function, so I think it’s a pretty rare use case.

But go ahead and make it — I’ve been wrong before :-)





Re: [Python-ideas] Suggested MapView object (Re: __len__() for map())

2018-12-11 Thread Terry Reedy

On 12/11/2018 6:48 AM, E. Madison Bray wrote:



The idea would be to now enhance the existing built-ins to restore at
least some previously lost assumptions, at least in the relevant
cases.  To give an analogy, Python 3.0 replaced range() with
(effectively) xrange().  This broke a lot of assumptions that the
object returned by range(N) would work much like a list,


A range represents an arithmetic sequence.  Any usage of range that 
could be replaced by xrange, which is nearly all uses, made no 
assumption broken by xrange.  The basic assumption was and is that a 
range/xrange could be repeatedly iterated.  That this assumption was met 
in the first case by returning a list was somewhat of an implementation 
detail.  In terms of mutability, a tuple would have been better, as 
range objects should not be mutable.  (If [2,4,6] is mutated to [2,3,7], 
it is no longer a range (arithmetic sequence).)



and Python 3.2 restored some of that list-like functionality


As I see it, xranges were unfinished as sequence objects and 3.2 
finished the job.  This included having the min() and max() builtins 
calculate the min and max efficiently, as a human would, as the first or 
last of the sequence, rather than uselessly iterating and comparing all 
the items in the sequence.
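
For reference, a few of the sequence operations that a current Python 3 range supports can be checked directly. This is an illustrative sketch only; the variable names are made up, and nothing here measures how efficiently any particular operation is implemented.

```python
r = range(0, 20, 2)

length = len(r)              # ranges know their length
last = r[-1]                 # negative indexing
middle = r[2:5]              # slicing returns another range
has_six = 6 in r             # membership test
position = r.index(8)        # sequence method index()
twice = (list(r), list(r))   # a range can be iterated repeatedly
```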


A proper analogy to range would be a re-iterable mapview (or 'mapseq') 
like what Steven D'Aprano proposes.



** I have a separate complaint that there's no great way, at the
Python level, to define a class that is explicitly a "sequence" as
opposed to a more general "mapping",

You mean like this?

>>> from collections.abc import Sequence as S
>>> isinstance((), S)
True
>>> isinstance([], S)
True
>>> isinstance(range(5), S)
True
>>> isinstance({}, S)
False
>>> isinstance(set(), S)
False
>>> class NItems(S):
...     def __init__(self, n, item):
...         self.len = n
...         self.item = item
...     def __getitem__(self, i):   # missing index check
...         return self.item
...     def __len__(self):
...         return self.len
...
>>> isinstance(NItems(2, 3), S)
True
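
A script version of the same class, with the index check filled in, might look like the sketch below. The bounds check is an addition for illustration: the mixin __iter__ provided by the Sequence ABC stops only when __getitem__ raises IndexError, so without it iteration would never end.

```python
from collections.abc import Sequence


class NItems(Sequence):
    """A sequence that reports the same item n times."""

    def __init__(self, n, item):
        self.len = n
        self.item = item

    def __getitem__(self, i):
        # Added bounds check (integer indexes only in this sketch).
        if not -self.len <= i < self.len:
            raise IndexError(i)
        return self.item

    def __len__(self):
        return self.len


threes = NItems(2, 3)
```

Defining only __getitem__ and __len__ is enough for the ABC to supply __iter__, __contains__, __reversed__, index() and count().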

--
Terry Jan Reedy



Re: [Python-ideas] __len__() for map()

2018-12-11 Thread Terry Reedy

On 12/1/2018 2:08 PM, Steven D'Aprano wrote:


This proof of concept wrapper class could have been written any time
since Python 1.5 or earlier:




class lazymap:
    def __init__(self, function, sequence):


One could now add at the top of the file
   from collections.abc import Sequence

and here

        if not isinstance(sequence, Sequence):
            raise TypeError(f'{sequence} is not a sequence')


        self.function = function
        self.wrapped = sequence
    def __len__(self):
        return len(self.wrapped)
    def __getitem__(self, item):
        return self.function(self.wrapped[item])


For 3.x, I would add

    def __iter__(self): return map(self.function, self.wrapped)

but your point that iteration is possible even without, with the old 
protocol, is well made.



It is fully iterable using the sequence protocol, even in Python 3:

py> x = lazymap(str.upper, 'aardvark')
py> list(x)
['A', 'A', 'R', 'D', 'V', 'A', 'R', 'K']


Mapped items are computed on demand, not up front. It doesn't make a
copy of the underlying sequence, it can be iterated over and over again,
it has a length and random access. And if you want an iterator, you can
just pass it to the iter() function.
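
Putting the quoted class and the two suggested additions together gives something like the following sketch. It is assembled here for illustration rather than posted in this form by either author; the only change is `!r` in the error message.

```python
from collections.abc import Sequence


class lazymap:
    def __init__(self, function, sequence):
        # Suggested addition: reject non-sequences up front.
        if not isinstance(sequence, Sequence):
            raise TypeError(f'{sequence!r} is not a sequence')
        self.function = function
        self.wrapped = sequence

    def __len__(self):
        return len(self.wrapped)

    def __getitem__(self, item):
        return self.function(self.wrapped[item])

    def __iter__(self):
        # Suggested addition: each call hands out a fresh map iterator.
        return map(self.function, self.wrapped)


x = lazymap(str.upper, 'aardvark')
```

Because __iter__ returns a new iterator each time, the object itself is never exhausted and can be passed to list() or a for loop repeatedly.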

There are probably bells and whistles that can be added (a nicer repr?
any other sequence methods? a cache?) and I haven't tested it fully.

For backwards compatibility reasons, we can't just make map() work like
this, because that's a change in behaviour. There may be tricky corner
cases I haven't considered, but as a proof of concept I think it shows
that the basic premise is sound and worth pursuing.



--
Terry Jan Reedy



Re: [Python-ideas] Suggested MapView object (Re: __len__() for map())

2018-12-11 Thread Terry Reedy

On 12/1/2018 8:07 PM, Greg Ewing wrote:

Steven D'Aprano wrote:


After defining a separate iterable mapview sequence class

For backwards compatibility reasons, we can't just make map() work like 
this, because that's a change in behaviour.


Actually, I think it's possible to get the best of both worlds.


I presume you mean the '(iterable) sequence' and 'iterator' worlds.  I don't 
think they should be mixed.  A sequence is reiterable, an iterator is 
once through and done.



Consider this:

from operator import itemgetter

class MapView:

    def __init__(self, func, *args):
        self.func = func
        self.args = args
        self.iterator = None

    def __len__(self):
        return min(map(len, self.args))

    def __getitem__(self, i):
        return self.func(*list(map(itemgetter(i), self.args)))

    def __iter__(self):
        return self

    def __next__(self):
        if not self.iterator:
            self.iterator = map(self.func, *self.args)
        return next(self.iterator)


The last two (unnecessarily) restrict this to being a once through 
iterator.  I think much better would be


    def __iter__(self): return map(self.func, *self.args)
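
With that change and no __next__ at all, the multi-sequence class becomes a plain re-iterable view. A minimal sketch, assuming the rest of the quoted class is kept as is:

```python
from operator import itemgetter


class MapView:
    """Sequence-only variant: no __next__, so it is never an iterator itself."""

    def __init__(self, func, *args):
        self.func = func
        self.args = args

    def __len__(self):
        # Like map(), stop at the shortest input.
        return min(map(len, self.args))

    def __getitem__(self, i):
        return self.func(*map(itemgetter(i), self.args))

    def __iter__(self):
        # A fresh iterator on every call keeps the view re-iterable.
        return map(self.func, *self.args)


powers = MapView(pow, [2, 3, 4], [2, 2, 2, 9])
```

One caveat of this sketch: a negative index is resolved against each input separately, so with inputs of unequal length powers[-1] does not correspond to the last item produced by iteration.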

--
Terry Jan Reedy




Re: [Python-ideas] Suggested MapView object (Re: __len__() for map())

2018-12-11 Thread Chris Barker - NOAA Federal via Python-ideas
Perhaps I got confused by the early part of this discussion.

My point was that there is no “map-like” object at the Python level.
(That is no Map abc).

Py2’s map produced a sequence. Py3’s map produced an iterable.

So any API that was expecting a sequence could accept the result of a
py2 map, but not a py3 map. There is absolutely nothing special about
map here.

The example of range has been brought up, but I don’t think it’s
analogous — py2 range returns a list, py3 range returns an immutable
sequence. Because that’s as close as we can get to a sequence while
preserving the lazy evaluation that is wanted.

I _think_ someone may be advocating that map() could return an
iterable if it is passed an iterable, and a sequence if it is passed a
sequence. Yes, it could, but that seems like a bad idea to me.

But folks are proposing a “map” that would produce a lazy-evaluated
sequence. Sure — as Paul said, put it up on pypi and see if folks find
it useful.

Personally, I’m still finding it hard to imagine a use case where you
need the sequence features, but also lazy evaluation is important.

Sure: range() has that, but it came at almost zero cost, and I’m not
sure the sequence features are used much.

Note: the one use-case I can think of for a lazy evaluated sequence
instead of an iterable is so that I can pick a random element with
random.choice(). (Try to pick a random item from a dict), but that
doesn’t apply here—pick a random item from the source sequence
instead.

But this is a specific example of a general use case: you need to access
only a subset of the mapped sequence (or access it out of order) so
using the iterable version won’t work, and it may be large enough that
making a new sequence is too resource intensive.

Seems rare to me, and in many cases, you could do the subsetting
before applying the function, so I think it’s a pretty rare use case.

But go ahead and make it — I’ve been wrong before :-)

-CHB





Re: [Python-ideas] Suggested MapView object (Re: __len__() for map())

2018-12-11 Thread Steven D'Aprano
On Tue, Dec 11, 2018 at 12:48:10PM +0100, E. Madison Bray wrote:

> Right now I'm specifically responding to the sub-thread that Greg
> started "Suggested MapView object", so I'm considering this a mostly
> clean slate from the previous thread "__len__() for map()".  Different
> ideas have been tossed around and the discussion has me thinking about
> broader possibilities.  I responded to this thread because I liked
> Greg's proposal and the direction he's suggesting.

Greg's code can be found here:

https://mail.python.org/pipermail/python-ideas/2018-December/054659.html

His MapView tries to be both an iterator and a sequence at the same 
time, but it is neither.

The iterator protocol is that iterators must:

- have a __next__ method;
- have an __iter__ method which returns self;

and the test for an iterator is:

obj is iter(obj)

https://docs.python.org/3/library/stdtypes.html#iterator-types

Greg's MapView object is an *iterable* with a __next__ method, which 
makes it neither a sequence nor an iterator, but a hybrid that will 
surprise people who expect it to act consistently as either.


This is how iterators work:

py> x = iter("abcdef")  # An actual iterator.
py> next(x)
'a'
py> next(x)
'b'
py> next(iter(x))
'c'

Greg's hybrid violates that expected behaviour:

py> x = MapView(str.upper, "abcdef")  # An imposter.
py> next(x)
'A'
py> next(x)
'B'
py> next(iter(x))
'A'



As an iterator, it is officially "broken", continuing to yield values 
even after it is exhausted:

py> x = MapView(str.upper, 'a')
py> next(x)
'A'
py> next(x)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/steve/gregmapview.py", line 24, in __next__
    return next(self.iterator)
StopIteration
py> list(x)  # But wait! There's more!
['A']
py> list(x)  # And even more!
['A']



This hybrid is fragile: whether operations succeed or not depends on the 
order that you call them:

py> x = MapView(str.upper, "abcdef")
py> len(x)*next(x)  # Safe. But only ONCE.
'AA'

py> y = MapView(str.upper, "uvwxyz")
py> next(y)*len(y)  # Looks safe. But isn't.
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/steve/gregmapview.py", line 12, in __len__
    raise TypeError("Mapping iterator has no len()")
TypeError: Mapping iterator has no len()

(For brevity, from this point on I shall trim the tracebacks and show 
only the final error message.)



Things that work once, don't work a second time.

py> len(x)*next(x)  # Worked a moment ago, but now it is broken.
TypeError: Mapping iterator has no len()



If you pass your MapView object to another function, it can 
accidentally sabotage your code:

py> def innocent_looking_function(obj):
... next(obj)
...
py> x = MapView(str.upper, "abcdef")
py> len(x)
6
py> innocent_looking_function(x)
py> len(x)
TypeError: Mapping iterator has no len()



I presume this is just an oversight, but indexing continues to work even 
when len() has been broken.
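
The transcripts above can be reproduced with a class along the following lines. This is a hypothetical reconstruction inferred from the tracebacks and outputs shown, not Greg's actual code (which is at the URL given earlier); the name HybridMapView is made up.

```python
class HybridMapView:
    """Hypothetical reconstruction of the hybrid; not the original code."""

    def __init__(self, func, *args):
        self.func = func
        self.args = args
        self.iterator = None

    def __len__(self):
        # Assumed from the traceback: len() is refused once next() was used.
        if self.iterator is not None:
            raise TypeError("Mapping iterator has no len()")
        return min(map(len, self.args))

    def __getitem__(self, i):
        return self.func(*(arg[i] for arg in self.args))

    def __iter__(self):
        # Assumed from the transcripts: iter() always starts over.
        return map(self.func, *self.args)

    def __next__(self):
        if self.iterator is None:
            self.iterator = map(self.func, *self.args)
        return next(self.iterator)


x = HybridMapView(str.upper, "abcdef")
length_before = len(x)        # works while untouched
first = next(x)               # switches the object into "iterator mode"
try:
    len(x)
    len_still_works = True
except TypeError:
    len_still_works = False
still_indexable = x[2]        # indexing keeps working regardless
restarted = next(iter(x))     # iter() ignores the progress made by next()
```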


Greg seems to want to blame the unwitting coder who runs into these 
boobytraps:

"But there are no surprises as long as you
stick to one interface or the other. Weird things happen
if you mix them up, but sane code won't be doing that."

(URL as above).

This MapView class offers a hybrid "sequence plus iterator, together at 
last!" double-headed API, and even its creator says that sane code 
shouldn't use that API. 

Unfortunately, you can't use the iterator API, because it's broken as an 
iterator, and you can't use it as a sequence, because any function you 
pass it to might use it as an iterator and pull the rug out from under 
your feet.

Greg's code is, apart from the addition of the __next__ method, almost 
identical to the version of mapview I came up with in my own testing. 
Except Greg's is even better, since I didn't bother handling the 
multiple-sequences case and his does.

It's the __next__ method which ruins it, by trying to graft 
almost-but-not-really iterator behaviour onto something which otherwise is a 
sequence. I don't think there's any way around that: I think that any 
attempt to make a single MapView object work as either a sequence with a 
length and indexing AND an iterator with next() and no length and no 
indexing is doomed to the same problems. Far from minimizing surprise, 
it will maximise it.

Look at how many violations of the Principle Of Least Surprise Greg's 
MapView has:

- If an object has a __len__ method, calling len() on it shouldn't 
  raise TypeError;

- If you called len() before, and it succeeded, calling it again
  should also succeed;

- if an object has a __next__ method, it should be an iterator, 
  and that means iter(obj) is obj;

- if it isn't an iterator, you shouldn't be able to call next() on it;

- if it is an iterator, once it is exhausted, it should stay exhausted;

- iterating over an object (calling next() or iter() on it) shouldn't
  change it from a sequence to a non-sequence;

- passing a sequence to another function, shouldn't resu

Re: [Python-ideas] Suggested MapView object (Re: __len__() for map())

2018-12-11 Thread Steven D'Aprano
On Mon, Dec 10, 2018 at 05:15:36PM -0800, Chris Barker via Python-ideas wrote:
[...]
> I'm still confused -- what's so wrong with:
> 
> list(map(func, some_iterable))
> 
> if you need a sequence?

You might need a sequence. Why do you think that has to be an *eager* 
sequence?

I can think of two obvious problems with eager sequences: space and 
time. They can use too much memory, and they can take too much time to 
generate them up-front and too much time to reap when they become 
garbage. And if you have an eager sequence, and all you want is the 
first item, you still have to generate all of them even though they 
aren't needed.

We can afford to be profligate with memory when the data is small, but 
eventually you run into cases where having two copies of the data is one 
copy too many.


> You can, of course make lazy-evaluated sequences (like range), and so you
> could make a map-like function that required a sequence as input, and would
> lazy evaluate that sequence. This could be useful if you weren't going to
> work with the entire collection, 

Or even if you *are* going to work with the entire collection, but you 
don't need them all at once. I once knew a guy whose fondest dream was 
to try the native cuisine of every nation of the world ... but not all 
in one meal.

This is a classic time/space tradeoff: for the cost of calling the 
mapping function anew each time we index the sequence, we can avoid 
allocating a potentially huge list and calling a potentially expensive 
function up front for items we're never going to use. Instead, we call 
it only on demand.
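
The tradeoff can be made concrete by counting calls to the mapping function. The sketch below is illustrative only; LazyView, expensive and call_log are made-up names, and the view is a minimal one-sequence stand-in for the proposed mapview.

```python
call_log = []


def expensive(n):
    """Stand-in for a costly function; records every call."""
    call_log.append(n)
    return n * n


class LazyView:
    """Minimal one-sequence lazy view, for illustration only."""

    def __init__(self, func, seq):
        self.func = func
        self.seq = seq

    def __len__(self):
        return len(self.seq)

    def __getitem__(self, i):
        return self.func(self.seq[i])


view = LazyView(expensive, range(1_000_000))
first = view[0]
calls_after_first = len(call_log)      # only the requested item was computed

again = view[0]
calls_after_repeat = len(call_log)     # recomputed: the price of not storing results

eager = list(map(expensive, range(1000)))
calls_after_eager = len(call_log)      # every item computed up front
```

The lazy view pays one call per access and stores nothing, while the eager list pays for every item once and keeps them all in memory.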

These are the same principles that justify (x)range and dict views. Why 
eagerly generate a list up front, if you only need the values one at a 
time on demand? Why make a copy of the dict keys, if you don't need a 
copy? These are not rhetorical questions.

This is about avoiding the need to make unnecessary copies for those 
times we *don't* need an eager sequence generated up front, keeping the 
laziness of iterators and the random-access of sequences.

map(func, sequence) is a great candidate for this approach. It has to 
hold onto a reference to the sequence even as an iterator. The function 
is typically side-effect free (a pure function), and if it isn't, 
"consenting adults" applies. We've already been told there's at least 
one major Python project, Sage, where this would have been useful.

There's a major functional language, Haskell, where nearly all sequence 
processing follows this approach.

I suggest we provide a separate mapview() type that offers only the lazy 
sequence API, without trying to be an iterator at the same time. If you 
want an eager sequence, or an iterator, they're only a single function 
call away:

list(mapview_instance)
iter(mapview_instance)  # or just stick to map()

Rather than trying to guess whether people want to treat their map 
objects as sequences or iterators, we let them choose which they want 
and be explicit about it.

Consider the history of dict.keys(), values() and items() in Python 2. 
Originally they returned eager lists. Did we try to retrofit view-like 
and iterator-like behaviour onto the existing dict.keys() method, 
returning a cunning object which somehow turned from a list to a view to 
an iterator as needed? Hell no! We introduced *six new methods* on 
dicts:

- dict.iterkeys()
- dict.viewkeys()

and similar for items() and values().

Compared to that, adding a single variant on map() that expects a 
sequence and returns a view on the sequence seems rather timid.



-- 
Steve


Re: [Python-ideas] Suggested MapView object (Re: __len__() for map())

2018-12-11 Thread Paul Moore
On Tue, 11 Dec 2018 at 11:49, E. Madison Bray  wrote:
> The idea would be to now enhance the existing built-ins to restore at
> least some previously lost assumptions, at least in the relevant
> cases.  To give an analogy, Python 3.0 replaced range() with
> (effectively) xrange().  This broke a lot of assumptions that the
> object returned by range(N) would work much like a list, and Python
> 3.2 restored some of that list-like functionality by adding support
> for slicing and negative indexing on range(N).  I believe it's worth
> considering such enhancements for filter() and map() as well, though
> these are obviously a bit trickier.

Thanks. That clarifies the situation for me very well.

I agree with most of the comments you made, although I don't have any
good answers. I think you're probably right that Guido's original idea
to move map and filter to functools might have been better, forcing
users to explicitly choose between a genexp and a list comprehension.
On the other hand, it might have meant people used more lists than
they needed to, as a result.

Paul


Re: [Python-ideas] Suggested MapView object (Re: __len__() for map())

2018-12-11 Thread E. Madison Bray
On Tue, Dec 11, 2018 at 12:13 PM Paul Moore  wrote:
>
> On Tue, 11 Dec 2018 at 10:38, E. Madison Bray  wrote:
> > I don't understand why this is confusing.
> [...]
> > For something like a fixed sequence a "map" could just as easily be
> > defined as a pair (function, sequence) that applies the function,
> > which I'm claiming is a pure function, to every element returned by
> > the sequence.  This transformation can be applied lazily on a
> > per-element basis whether I'm iterating over it, or performing random
> > access (since sequence[N] is known for all N).
>
> What's confusing to *me*, at least, is what's actually being suggested
> here. There's a lot of theoretical discussion, but I've lost track of
> how it's grounded in reality:

It's true, this has been a wide-ranging discussion and it's confusing.
Right now I'm specifically responding to the sub-thread that Greg
started "Suggested MapView object", so I'm considering this a mostly
clean slate from the previous thread "__len__() for map()".  Different
ideas have been tossed around and the discussion has me thinking about
broader possibilities.  I responded to this thread because I liked
Greg's proposal and the direction he's suggesting.

I think that the motivation underlying much of this discussion, for
both the OP who started the original thread, as well as myself and
others, is that before Python 3 changed the implementation of map()
there were certain assumptions one could make about map() called on a
list* which, under normal circumstances, were quite reasonable and sane
(e.g. len(map(func, lst)) == len(lst), or map(func, lst)[N] ==
func(lst[N])).
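
To make those two assumptions concrete, here is a minimal sketch of how
they fare under Python 3 (func and lst are just placeholder names, not
anything from real code):

```python
lst = [1, 2, 3]
func = str

m = map(func, lst)

# In Python 3 a map object is a one-shot iterator: it has no len()
# and does not support indexing, so both calls raise TypeError.
try:
    len(m)
    has_len = True
except TypeError:
    has_len = False

try:
    m[0]
    subscriptable = True
except TypeError:
    subscriptable = False

# The old assumptions only hold once the result is materialized.
as_list = list(map(func, lst))
same_length = len(as_list) == len(lst)
same_items = all(as_list[n] == func(lst[n]) for n in range(len(lst)))
```

Wrapping the call in list() restores both properties, at the cost of
laziness.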

Python 3 broke all of these assumptions, for reasons that I personally
have no disagreement with, in terms of motivation.

However, in retrospect, it might have been nice if more consideration
were given to backwards compatibility for some "obvious" simple cases.
This isn't a Python 2 vs Python 3 whine though: I'm just trying to
think about how I might expect map() to work on different types of
arguments, and I see no problem--so long as it's properly
documented--with making its behavior somewhat polymorphic on the types
of arguments.

The idea would be to now enhance the existing built-ins to restore at
least some previously lost assumptions, at least in the relevant
cases.  To give an analogy, Python 3.0 replaced range() with
(effectively) xrange().  This broke a lot of assumptions that the
object returned by range(N) would work much like a list, and Python
3.2 restored some of that list-like functionality by adding support
for slicing and negative indexing on range(N).  I believe it's worth
considering such enhancements for filter() and map() as well, though
these are obviously a bit trickier.
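
As a small illustration of the range() behavior I mean (a sketch only,
using nothing beyond the built-in):

```python
r = range(10)

n = len(r)       # a range knows its length without being iterated
last = r[-1]     # negative indexing works
tail = r[-3:]    # slicing returns another range, still lazy

# Unlike an iterator, the range can be traversed repeatedly.
first_pass = list(r)
second_pass = list(r)
```

Nothing here builds a list until list() is called explicitly, which is
the kind of behavior one might hope for from map() over a sequence.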

* or other fixed-length sequence, but let's just use list as a
shorthand, and assume for the sake of simplicity a single list as
well.

> 1. If we're saying that "it would be nice if there were a function
> that acted like map but kept references to its arguments", that's easy
> to do as a module on PyPI. Go for it - no-one will have any problem
> with that.

Sure, though since this is about the behavior of global built-ins that
are commonly used by users at all experience levels the problem is a
bit hairier.  Anybody can implement anything they want and put it in a
third-party module. That doesn't mean anyone will use it.  I still
have to write code that handles map objects.

In retrospect I think Guido might have had the right idea of wanting
to move map() and filter() into functools along with reduce().
There's surprisingly a lot more at stake in terms of backwards
compatibility and least-astonishment when it comes to built-ins.  I
think that's in part why the new Python 3 definitions of map() and
filter() were kept so simple: although they were not backwards
compatible I do think they were well designed to minimize
astonishment.  That's why I don't necessarily disagree with the
choices made (but still would like to think about how we can make
enhancements going forward).

> 2. If we're saying "the builtin map needs to behave like that", then
>   2a. *Why*? What is so special about this situation that the builtin
> has to be changed?

The same question could apply to the last time it was changed.  I think
now we're trying to find some middle ground.

>   2b. Compatibility questions need to be addressed. Is this important
> enough to code that "needs" it that such code is OK with being Python
> 3.8+ only? If not, why aren't the workarounds needed for Python 3.7
> good enough? (Long term improvement and simplification of the code
> *is* a sufficient reason here, it's just something that should be
> explicit, as it means that the benefits are long-term rather than
> immediate).

That's a good point: I think the same arguments as for enhancing
range() apply here, but this is worth further consideration (though
having a more concrete proposal in the first place should come first).

>   2c. Weird corner case questions, while still being rare, *do* need
> to be

Re: [Python-ideas] Suggested MapView object (Re: __len__() for map())

2018-12-11 Thread Paul Moore
On Tue, 11 Dec 2018 at 10:38, E. Madison Bray  wrote:
> I don't understand why this is confusing.
[...]
> For something like a fixed sequence a "map" could just as easily be
> defined as a pair (function, sequence) that applies the function,
> which I'm claiming is a pure function, to every element returned by
> the sequence.  This transformation can be applied lazily on a
> per-element basis whether I'm iterating over it, or performing random
> access (since sequence[N] is known for all N).

What's confusing to *me*, at least, is what's actually being suggested
here. There's a lot of theoretical discussion, but I've lost track of
how it's grounded in reality:

1. If we're saying that "it would be nice if there were a function
that acted like map but kept references to its arguments", that's easy
to do as a module on PyPI. Go for it - no-one will have any problem
with that.
2. If we're saying "the builtin map needs to behave like that", then
  2a. *Why*? What is so special about this situation that the builtin
has to be changed?
  2b. Compatibility questions need to be addressed. Is this important
enough to code that "needs" it that such code is OK with being Python
3.8+ only? If not, why aren't the workarounds needed for Python 3.7
good enough? (Long term improvement and simplification of the code
*is* a sufficient reason here, it's just something that should be
explicit, as it means that the benefits are long-term rather than
immediate).
  2c. Weird corner case questions, while still being rare, *do* need
to be addressed - once a certain behaviour is in the stdlib, changing
it is a major pain, so we have a responsibility to get even the corner
cases right.
  2d. It's not actually clear to me how critical that need actually
is. Nice to have, sure (you only need a couple of people who would use
a feature for it to be "nice to have") but beyond that I haven't seen
a huge number of people offering examples of code that would benefit
(you mentioned Sage, but that example rapidly degenerated into debates
about Sage's design, and while that's a very good reason for not
wanting to continue using that as a use case, it does leave us with
few actual use cases, and none that I'm aware of that are in
production code...)
3. If we're saying something else (your comment "map could just as
easily be defined as..." suggests that you might be) then I'm not
clear what it is. Can you describe your proposal as pseudo-code, or a
Python implementation of the "map" replacement you're proposing?
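
For reference, option 1 could be as small as the following sketch. This
is purely illustrative: refmap is a hypothetical name, not an existing
PyPI package, and it only handles the basic case.

```python
class refmap:
    """Hypothetical map-like iterator that keeps its arguments visible."""

    def __init__(self, func, *iterables):
        # Keep references so callers can inspect what is being mapped.
        self.func = func
        self.iterables = iterables
        # Delegate the actual lazy iteration to the builtin map.
        self._it = map(func, *iterables)

    def __iter__(self):
        return self

    def __next__(self):
        return next(self._it)


m = refmap(str, [1, 2, 3])
source_length = len(m.iterables[0])   # available because the list is kept
result = list(m)
```

Code that receives such an object can look at m.iterables to recover the
length itself, without the builtin having to change.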

Paul


Re: [Python-ideas] Suggested MapView object (Re: __len__() for map())

2018-12-11 Thread E. Madison Bray
On Tue, Dec 11, 2018 at 2:16 AM Chris Barker  wrote:
> On Mon, Dec 10, 2018 at 5:23 AM E. Madison Bray  wrote:
>>
>> Indeed; I believe it is very useful to have a map-like object that is
>> effectively an augmented list/sequence.
>
>
> but what IS a "map-like object" -- I'm trying to imagine what that
> actually means.
>
> "map" takes a function and maps it onto an iterable, returning a new
> iterable. So a map object is an iterable -- what's under the hood being
> used to create it is (and should remain) opaque.

I don't understand why this is confusing.  Greg gave an example of
what this *might* mean up thread.  It's not the only possible approach
but it is one that makes a lot of sense to me.  The way you're
defining "map" is arbitrary and post-hoc.  It's a definition that
makes sense for "map" that's restricted to iterating over arbitrary
iterators.  It's how it happens to be defined in Python 3 for various
reasons that you took time to explain at great length, which I regret
to inform you was time wasted explaining things I already know.

For something like a fixed sequence a "map" could just as easily be
defined as a pair (function, sequence) that applies the function,
which I'm claiming is a pure function, to every element returned by
the sequence.  This transformation can be applied lazily on a
per-element basis whether I'm iterating over it, or performing random
access (since sequence[N] is known for all N).
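
A rough sketch of that pair, assuming a single Sequence argument and a
pure function (MapView is a hypothetical name, and this is not
necessarily the design Greg had in mind):

```python
from collections.abc import Sequence


class MapView(Sequence):
    """Hypothetical lazy view applying func to each item of a sequence."""

    def __init__(self, func, seq):
        if not isinstance(seq, Sequence):
            raise TypeError("MapView requires a Sequence")
        self.func = func
        self.seq = seq

    def __len__(self):
        # The length is simply that of the underlying sequence.
        return len(self.seq)

    def __getitem__(self, index):
        if isinstance(index, slice):
            # Slicing yields another view; func is still not called.
            return MapView(self.func, self.seq[index])
        # func is applied only when an element is requested.
        return self.func(self.seq[index])


squares = MapView(lambda x: x * x, [1, 2, 3, 4])
length = len(squares)
third = squares[2]
last = squares[-1]
middle = list(squares[1:3])
everything = list(squares)
```

Iteration, containment tests and reversed() come from the Sequence
mixin methods, so only __len__ and __getitem__ need to be written.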

Python has no formal notion of a pure function, but I'm an adult and
can accept responsibility if I try to use this "map-like" object in a
way that is not logically consistent.

The stuff about Sage is beside the point.  I'm not even talking about
that anymore.