[Python-ideas] Re: zip(x, y, z, strict=True)

2022-12-01 Thread Ethan Furman


On 12/1/22 11:36, Ram Rachum wrote:

> Reviving this thread 2.5 years after I started it just to share this
> satisfying moment...

Very cool!  Always enjoyable to benefit from the fruits of one's labors.  :-)

--
~Ethan~
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/SUNW2FQIPB6IOKIFEULCHPPZL3WUHLVC/
Code of Conduct: http://python.org/psf/codeofconduct/

[Python-ideas] Re: zip(x, y, z, strict=True)

2022-12-01 Thread Eric Fahlgren

I liked your idea so much back when you suggested it that I implemented it
in our sitecustomize.py back when we were using 3.8, and added a check to
our lint tools to require that the 'strict=' parameter be present in all
uses of 'zip'.  We found a couple of silent bugs almost immediately, so
thank you!
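
For readers who want to try a similar lint rule, here is a minimal sketch built on the standard ast module. It is not the tool described above; the function name find_zip_calls_without_strict is hypothetical, and the check only recognises direct calls to the bare name zip (a call that passes strict through **kwargs is still reported).

```python
import ast


def find_zip_calls_without_strict(source, filename="<string>"):
    """Return (line, column) for each zip(...) call lacking a strict= keyword."""
    tree = ast.parse(source, filename=filename)
    hits = []
    for node in ast.walk(tree):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == "zip"
            and not any(kw.arg == "strict" for kw in node.keywords)
        ):
            hits.append((node.lineno, node.col_offset))
    return sorted(hits)


sample = (
    "a = zip(x, y)\n"
    "b = zip(x, y, strict=True)\n"
    "c = list(zip(p, q, strict=False))\n"
)
# Only the call on line 1 omits strict=.
print(find_zip_calls_without_strict(sample))
```

Requiring the keyword to be present, rather than requiring it to be True, keeps the intentional strict=False cases visible in the code.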

On Thu, Dec 1, 2022 at 11:38 AM Ram Rachum  wrote:

> Reviving this thread 2.5 years after I started it just to share this
> satisfying moment. I was just spending a few hours furiously coding on my
> research  using Python 3.10, and when I ran my
> code I got this traceback:
>
>   File
> "/home/ramrachum/.venvs/ray_env/lib/python3.10/site-packages/ray/rllib/algorithms/algorithm.py",
> line 1993, in log_result
> self.callbacks.on_train_result(algorithm=self, result=result)
> ││└
> {'custom_metrics': {}, 'episode_media': {}, 'num_recreated_workers': 0,
> 'info': {'learner': {'policy_robot': {'learner_stats': {...
> │└ PPO
> └ PPO
>   File "/herring/nichos/nichos/snare/__init__.py", line 150, in
> on_train_result
> robot_move_by_state = get_move_by_state(
>   File "/herring/nichos/nichos/snare/__init__.py", line 75, in
> get_move_by_state
> return dict(zip(states, moves, strict=True))
> │   └ array([0, 1, 0, ..., 1, 0, 1])
> └ 
> ValueError: zip() argument 2 is longer than argument 1
>
> It took a few seconds for me to understand... "Ah... This error is my
> error" :)
>
>
> On Mon, Apr 20, 2020 at 8:42 PM Ram Rachum  wrote:
>
>> Here's something that would have saved me some debugging yesterday:
>>
>> >>> zipped = zip(x, y, z, strict=True)
>>
>> I suggest that `strict=True` would ensure that all the iterables have
>> been exhausted, raising an exception otherwise.
>>
>> This is useful in cases where you're assuming that the iterables all have
>> the same lengths. When your assumption is wrong, you currently just get a
>> shorter result, and it could take you a while to figure out why it's
>> happening.
>>
>> What do you think?
>>
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/ZI5CDH3M66MPHNQK3UJVWAF77VG7JVJC/

[Python-ideas] Re: zip(x, y, z, strict=True)

2022-12-01 Thread Ram Rachum

Reviving this thread 2.5 years after I started it just to share this
satisfying moment. I was just spending a few hours furiously coding on my
research  using Python 3.10, and when I ran my code
I got this traceback:

  File "/home/ramrachum/.venvs/ray_env/lib/python3.10/site-packages/ray/rllib/algorithms/algorithm.py", line 1993, in log_result
    self.callbacks.on_train_result(algorithm=self, result=result)
    ││└ {'custom_metrics': {}, 'episode_media': {}, 'num_recreated_workers': 0, 'info': {'learner': {'policy_robot': {'learner_stats': {...
    │└ PPO
    └ PPO
  File "/herring/nichos/nichos/snare/__init__.py", line 150, in on_train_result
    robot_move_by_state = get_move_by_state(
  File "/herring/nichos/nichos/snare/__init__.py", line 75, in get_move_by_state
    return dict(zip(states, moves, strict=True))
    │   └ array([0, 1, 0, ..., 1, 0, 1])
    └ 
ValueError: zip() argument 2 is longer than argument 1

It took a few seconds for me to understand... "Ah... This error is my
error" :)


On Mon, Apr 20, 2020 at 8:42 PM Ram Rachum  wrote:

> Here's something that would have saved me some debugging yesterday:
>
> >>> zipped = zip(x, y, z, strict=True)
>
> I suggest that `strict=True` would ensure that all the iterables have been
> exhausted, raising an exception otherwise.
>
> This is useful in cases where you're assuming that the iterables all have
> the same lengths. When your assumption is wrong, you currently just get a
> shorter result, and it could take you a while to figure out why it's
> happening.
>
> What do you think?
>
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/DPQEIIOAEGJKIVDHTAKQ6CJPCGS7C5G7/

[Python-ideas] Re: zip(x, y, z, strict=True)

2020-05-09 Thread Andrew Barnert via Python-ideas

> On May 9, 2020, at 04:30, Alex Hall  wrote:
> 
>> On Fri, May 8, 2020 at 11:22 PM Andrew Barnert via Python-ideas 
>>  wrote:
> 
>> Trying to make it a flag (which will always be passed a constant value) is a 
>> clever way to try to get the best of both worlds—and so is the 
>> chain.from_iterable style.
> 
> At this point it sounds like you're saying that zip(..., strict=True) and 
> zip.strict(...) are equally bad.

You’re right, it did sound like that, and I don’t mean that. Sorry.

zip.strict has _some_ of the same problems as zip(strict=True), but definitely 
not _all_ of them. And I definitely prefer zip.strict to the flag.

At the time I wrote this (I don’t know why it took a few days to get 
delivered…), zip.strict had come up the first time and been roundly shouted 
down, and it seemed like nobody but me (and the proposer, of course) had found 
it at all acceptable, and I was trying to make the point that if people don’t 
like zip.strict, the same things and more apply to passing an always-constant 
flag, so zip.strict should be even more acceptable than the flag.

Then, over the last few days, a bunch of people came around on zip.strict. And 
that seems to be at least in part because people came up with better arguments 
than the first time around. (For example, I forget who it was that pointed out 
that you don’t really have to start thinking of zip as a class and zip.strict 
as an alternate constructor, because plenty of people don’t realize that’s true 
for chain.from_iterable and they still have no more problem using it than they 
do for datetime.now.)

So now, rather than it being a +0 for me and a distant second choice behind an 
itertools function, I think I’m pretty close to evenly torn between the two.

I do think that if we add zip.strict, we should also probably add zip.longest, 
not just think about maybe adding it some day. And it might even be worth 
adding zip.shortest, even if we have no intention of ever eliminating zip() 
itself or changing it to mean zip.strict. But I don’t have good arguments for 
these; I’ll have to think about it a bit more to explain why I think 
consistency easily trumps the costs for this variant of the proposal but 
probably fails for other variants.

Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/RAFDWYYUIDOLCQ4M7HS35DZL56LR32YX/

[Python-ideas] Re: zip(x, y, z, strict=True)

2020-05-09 Thread Alex Hall

On Fri, May 8, 2020 at 11:22 PM Andrew Barnert via Python-ideas <
python-ideas@python.org> wrote:

> Trying to make it a flag (which will always be passed a constant value) is
> a clever way to try to get the best of both worlds—and so is the
> chain.from_iterable style.

At this point it sounds like you're saying that zip(..., strict=True) and
zip.strict(...) are equally bad.

> But if either of those really did get the best of both worlds and the
> problems of neither, it would be used all over the place, rather than as
> sparingly as possible. And of course it doesn’t get the best of both
> worlds. A flag is hiding code as data, and it looks misleadingly like the
> much more common uses of flags where you actually do often set the flag
> with a runtime value. It’s harder to type (and autocomplete makes the
> difference worse, not better). It’s a tiny bit harder to read, because
> you’re adding as much meaningless boilerplate (True) as important
> information (strict).

But all of this just applies to a flag, not to zip.strict(...).

> It’s increasing the amount of stuff to learn in builtins just as much as
> another function would. And so on.

This applies to both, but it's not true. Both zip.strict() and
zip(strict=True) are at least somewhat more hidden and encapsulated than a
top level builtin zip_strict().

I also think it's worth questioning something that is being taken for
granted. What exactly is the cost of adding a builtin? It's not entirely
obvious, at least not to me. Clarifying the precise disadvantages would let
us see how well they apply here.

You mention increasing the amount of stuff to learn, but I'm guessing 80%
of Python coders don't know all the functions in builtins, and that doesn't
really hurt them. I wouldn't recommend that anyone read through all of
https://docs.python.org/3/library/functions.html just for the sake of
learning it all; do we want to support people doing that? People should
just google the builtins they're not familiar with when they come across
them. I can see several builtins that I've never or almost never used.

When people read the docs for zip, it points them to zip_longest. If
zip_strict was in itertools, it would point to that too. Is that
significantly better than pointing to zip.strict or even a builtin
zip_strict a little further down the same page? I think if someone is
reading the docs for zip, it's worth their time to learn all its flavours,
and where exactly they read about those doesn't matter.

The only time I'm ever annoyed by something being a builtin is that I avoid
using it as a variable name. This often happens with id, type, all, list,
etc. That wouldn't even be a significant argument against a builtin
zip_strict, and it doesn't apply at all to zip.strict.
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/5UEUM4FUYL33I7VIFR3J44LUMOO7WAH3/

[Python-ideas] Re: zip(x, y, z, strict=True)

2020-05-08 Thread Andrew Barnert via Python-ideas

On May 5, 2020, at 12:50, Christopher Barker  wrote:
> 
> Another key point is that if you want zip_longest() functionality, you simply 
> can not get it with the builtin zip() -- you are forced to look elsewhere. 
> Whereas most code that might want "strict" behavior will still work, albeit 
> less safely, with the builtin.

I think this is a key point, but I think you’ve got it backward.

You _can_ build zip_longest with zip, and before 2.6, people _did_. (Well, they 
built izip_longest with izip.) I’ve still got my version in an old toolbox. You 
chain a repeat(None) onto each iterable, izip, and you get an infinite iterator 
that you have to read until all(is None). You can just takewhile that into 
exactly the same thing as izip_longest, but unfortunately that’s a lot slower 
than filtering when you iterate, so I had both _longest and _infinite variants, 
and I think I used the latter more even though it was usually less convenient. 
That sounds like a silly way to do it, and it’s certainly easier to get subtly 
wrong than just writing a generator function like the “as if” code in the 
(i)zip_longest docs, but a comment in my code assures me that this is almost 4x 
as fast, and half the speed of a custom C implementation, so I’m pretty sure 
that’s why I did it. And I doubt I’m the only person who solved it that way. In 
fact, I’ll bet I copied it from an ActiveState recipe or a colleague or an open 
source project.
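
The recipe described above can be sketched in modern Python roughly as follows. This is a reconstruction from the description, not the original toolbox code: it uses zip rather than izip, and the names zip_infinite and zip_longest_via_zip are made up here.

```python
from itertools import chain, repeat, takewhile


def zip_infinite(*iterables):
    """Pad every input with endless None values, then zip them together."""
    return zip(*(chain(it, repeat(None)) for it in iterables))


def zip_longest_via_zip(*iterables):
    """Cut the infinite stream at the first row that is all None."""
    return takewhile(
        lambda row: any(item is not None for item in row),
        zip_infinite(*iterables),
    )


print(list(zip_longest_via_zip([1, 2, 3], "ab")))
```

One of the subtle traps shows up immediately: if the inputs legitimately contain None in the same position, the result is cut short there, which a private sentinel object would avoid.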

So, most likely, izip_longest wasn’t added because you can’t build it on top of 
izip, but because building it on top of izip is easy to get subtly wrong 
(especially if you need it to be fast—or don’t need it to be fast but micro 
optimize it anyway, for that matter), and often people punt and do something 
clunkier (use _infinite instead of _longest and make the final for loop more 
complicated).

Which is actually a pretty good parallel for the current proposal. You can 
write your own zip_strict on top of zip, and at least a few people do—but, as 
people have shown in this thread, the obvious solution is too slow, the obvious 
fast solution is very easy to get subtly wrong, and often people punt and do 
something clunkier (listify and compare len).
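
For illustration, here are two of the approaches mentioned, as sketches rather than anyone's actual code from the thread: a straightforward generator built on itertools.zip_longest with a private sentinel, and the "listify and compare len" fallback. The names zip_strict and zip_strict_listify are hypothetical.

```python
from itertools import zip_longest

_MISSING = object()  # private sentinel that cannot appear in user data


def zip_strict(*iterables):
    """Yield rows like zip(), but raise if the inputs differ in length."""
    for row in zip_longest(*iterables, fillvalue=_MISSING):
        if any(item is _MISSING for item in row):
            raise ValueError("iterables have different lengths")
        yield row


def zip_strict_listify(*iterables):
    """Clunkier variant: materialise everything, then compare lengths."""
    lists = [list(it) for it in iterables]
    if len({len(items) for items in lists}) > 1:
        raise ValueError("iterables have different lengths")
    return zip(*lists)


print(list(zip_strict("abc", [1, 2, 3])))
```

The generator version pays for a Python-level check on every row, and the listify version gives up laziness and cannot handle infinite inputs.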

That’s why I’m +1 on this proposal in some form. Assuming zip_strict would be 
useful at least as often as zip_longest (and I’ve been sold on that part, and I 
think most people on all sides of this discussion agree?), it calls out for a 
good official solution. The fact that the ecosystem is different nowadays (pip 
install more-itertools or copying off StackOverflow is a lot simpler, and more 
common, than finding a recipe on ActiveState) does make it a little less 
compelling, but at most that means the official solution should be a docs link 
to more-itertools, still not that we should do nothing.

But that’s also part of the reason I’m -1 on it being a flag. Just like 
zip_longest, it’s a different function, one you shouldn’t think of as being 
built on zip even if it could be. Maybe strict really is needed so much more 
often than longest that “import itertools” is too onerous, but if that’s really 
true, that different function should be another builtin. I think nobody is 
arguing for that, because it’s just obvious that it isn’t needed enough to 
reach the high bar of adding another function to builtins. But that means it 
belongs in itertools.

Trying to make it a flag (which will always be passed a constant value) is a 
clever way to try to get the best of both worlds—and so is the 
chain.from_iterable style. But if either of those really did get the best of 
both worlds and the problems of neither, it would be used all over the place, 
rather than as sparingly as possible. And of course it doesn’t get the best of 
both worlds. A flag is hiding code as data, and it looks misleadingly like the 
much more common uses of flags where you actually do often set the flag with a 
runtime value. It’s harder to type (and autocomplete makes the difference 
worse, not better). It’s a tiny bit harder to read, because you’re adding as 
much meaningless boilerplate (True) as important information (strict). It’s 
increasing the amount of stuff to learn in builtins just as much as another 
function would. And so on. So it’s only worth doing for really special cases, 
like open.

Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/IEMCC3WXEHV2J7DLP7OXWSYATLSC3BBI/

[Python-ideas] Re: zip(x, y, z, strict=True)

2020-05-07 Thread Steven D'Aprano

On Wed, May 06, 2020 at 08:48:51AM -0700, Christopher Barker wrote:

[I asked]
> > A ternary flag of strict = True, False or what?
> >
> 
> Come on:
> 
> ternary: having three elements, parts, or divisions (
> https://www.merriam-webster.com/dictionary/ternary)
> 
> did you really not know that? 

Of course I know what ternary is.

> (and "flag" does not always mean "boolean
> flag", even though it often does (https://techterms.com/definition/flag) )

I am arguing against the proposal being discussed in this part of the 
thread, namely to add a **boolean** flag "strict=True|False".

Then if we want to extend the API in the future, you say "Oh well that's 
easy, let's just turn it into a ternary flag" (paraphrasing, not a 
direct quote). Okay, so what will the third flag be? Standard ternary 
flags are:

True, False, Maybe
True, False, Unknown

Neither Maybe nor Unknown are Python builtins. Should they be? I doubt 
it. So what do we use? Whenever I've needed ternary logic, I've 
used None. But in this case, that doesn't work:

zip(*iters, strict=None)

What a very non-self-explanatory API.

So if we start off with a `strict=bool` API, we're stuck with it.

> This has been proposed multiple times on this list:
> 
> a flag that takes three possible values: "shortest" | "longest" | "equal"
> (defaulting to shortest of course). Name to be bikeshed later :-)
> (and enum vs string also to be bikeshed later)

*Named modes* are not typically called flags. For example, the second 
parameter to `open` is called *mode*, not *flag*.

Whether or not a named mode parameter has been proposed before, that's 
not what is being discussed here and now, where we have been explicitly 
debating the two APIs:

- a named function;
- piggy-backing on zip() with a **boolean parameter** taking True and
  False as the switch to control behaviour.

so in the context of the discussion about bool parameters, other APIs 
aren't really relevant. I'm arguing against the specific `strict=bool` 
API, not other APIs in general. Revamping zip to give it a named mode 
parameter is not my first preference, but it's better than a 
`strict=bool` flag.

However the parameter would have to change:

zip(*iters, strict="shortest")

simply doesn't work.

> > This demonstrates why the "constant flag" is so often an 
> > antipattern. It doesn't scale past two behaviours. Or you end up 
> > with a series of flags:
> >
> > zip(*iterators, strict=False, longest=False, fillvalue=None)
> >
> 
> I don't think anyone proposed an API like that -- yes, that would be horrid.

I do recall someone proposing something similar to that, but I don't 
care enough to trawl through the thread to find it :-)

> There are all sorts of reasons why a ternary flag would not be good, but I
> do think it should be mentioned in the PEP, even if only as a rejected idea.
> 
> But I still like it, 'cause the "flag for two behaviors and another
> function for the third" seems like the worst of all options.

*blink*

But that's precisely the option on the table right now!

1. zip_longest remains in itertools;
2. zip remains the default behaviour;
3. zip_strict be implemented as a boolean True/False parameter on zip.

I trust you don't actually mean what you seem to be saying: "this is the 
worst of all options, I'm in favour of it!"

But in any case, I see from a later part of the discussion we're now 
considering a different option:

- treat zip as a namespace, with named callable attributes

which I don't dislike.

-- 
Steven
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/MEY2FNXI6NDAOIQLI7KGE7LR5A5GSANH/

[Python-ideas] Re: zip(x, y, z, strict=True)

2020-05-07 Thread Kirill Balunov

On Thu, May 7, 2020 at 3:07 AM Christopher Barker 
wrote:

> On Wed, May 6, 2020 at 1:42 PM Kirill Balunov 
> wrote:
>
>> I totally agree with everything you said here. From my perspective,
>> comparing three main cases:
>> 1. zip(*iters, strict= (False | True))
>> 2. zip(*iters, mode = ('shortest' | 'equal' | 'longest'))
>> 3. zip_equal(*iters)
>>
>
> Thanks for enumerating these. I think that's helpful so I'll flesh it out
> a bit more. I *think* these are the options on the table:
>
> (note, I'm keeping different names for things as the same option, and in
> no particular order)
>
> 1) No change
> zip(*iters)
> itertools.zip_longest(*iters, fillvalue=None)
>
> 2) Add boolean strict flag to zip
> zip(*iters, strict= (False | True))
> itertools.zip_longest(*iters, fillvalue=None)
>
> 3) Add a ternary mode flag to zip
> zip(*iters, mode = ('shortest' | 'equal' | 'longest'), fillvalue=None)
>
> 4) Add a new function to itertools
> zip(*iters)
> itertools.zip_longest(*iters, fillvalue=None)
> itertools.zip_equal(*iters)
>
>
I think there are two more cases which can be added to the list:

0) No change
zip(*iters)
itertools.zip_longest(*iters, fillvalue=None)
+ add a recipe about zip_equal in itertools docs

6) Add functionality through  zip methods (as proposed in a separate
thread, but maybe it is off topic for the current thread).

-gdg
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/ZBGIGXWL2PZY6H2JWU6UZOPHVBMUE6HW/

[Python-ideas] Re: zip(x, y, z, strict=True)

2020-05-07 Thread Paul Moore

On Thu, 7 May 2020 at 01:07, Christopher Barker  wrote:
> 3) Add a ternary mode flag to zip
> zip(*iters, mode = ('shortest' | 'equal' | 'longest'), fillvalue=None)

You missed

itertools.zip_longest(*iters, fillvalue=None)

from this one. Unless you're proposing to drop itertools.zip_longest,
the fact that there's now two ways to do zip_longest seems like an
important wart to point out for this proposal.

For me:

+0.1 on a new function in itertools
+0 on no change
-1 on the remainder

Paul
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/4JTJQHJICFWNUDM5P4JLDAE7JFTAZDYF/

[Python-ideas] Re: zip(x, y, z, strict=True)

2020-05-06 Thread Christopher Barker

On Wed, May 6, 2020 at 1:42 PM Kirill Balunov 
wrote:

> I totally agree with everything you said here. From my perspective,
> comparing three main cases:
> 1. zip(*iters, strict= (False | True))
> 2. zip(*iters, mode = ('shortest' | 'equal' | 'longest'))
> 3. zip_equal(*iters)
>

Thanks for enumerating these. I think that's helpful so I'll flesh it out a
bit more. I *think* these are the options on the table:

(note, I'm keeping different names for things as the same option, and in no
particular order)

1) No change
zip(*iters)
itertools.zip_longest(*iters, fillvalue=None)

2) Add boolean strict flag to zip
zip(*iters, strict= (False | True))
itertools.zip_longest(*iters, fillvalue=None)

3) Add a ternary mode flag to zip
zip(*iters, mode = ('shortest' | 'equal' | 'longest'), fillvalue=None)

4) Add a new function to itertools
zip(*iters)
itertools.zip_longest(*iters, fillvalue=None)
itertools.zip_equal(*iters)

Brandt: this might be helpful for the PEP.

For my part, seeing it this way makes me think that (2) adding a strict
flag to zip, while keeping zip_longest on its own in itertools, is the
worst option.
For me:
+1 on the ternary flag
+0.5 on a new function in itertools
-0 on the boolean flag to zip()

-CHB

-- 
Christopher Barker, PhD

Python Language Consulting
  - Teaching
  - Scientific Software Development
  - Desktop GUI and Web Development
  - wxPython, numpy, scipy, Cython
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/3UYOEZ3ZC25PWCX6QWUUYUJ25ZHNTFZO/

[Python-ideas] Re: zip(x, y, z, strict=True)

2020-05-06 Thread Kirill Balunov

On Wed, May 6, 2020 at 6:49 PM Christopher Barker 
wrote:

> On Tue, May 5, 2020 at 5:43 PM Steven D'Aprano 
> wrote:
>
>> Christopher's quoting is kinda messed up and I can't be bothered fixing
>> it, sorry, so you'll just have to guess who said what :-)
>>
>
> Ideally, we are evaluating ideas independently of who expressed them, so
> I'll pretend I did that on purpose :-)
>
> First: really people, it's all been said. I think we all (and I DO include
> myself in that) have fallen into the trap that "if folks don't agree with
> me, I must not have explained myself well enough" -- but in this case, we
> actually do disagree. And not really on the facts, just on the relative
> importance.
>
> But since, I apparently did not explain myself well enough in this case:
> > no -- but we could (and I think should) have a ternary flag, so that
>
>> > zip_longest becomes unnecessary. And we'd never get to eight
>> combinations:
>> > you can't have longest and shortest behavior at the same time!
>>
>> A ternary flag of strict = True, False or what?
>>
>
> ...
>
> a flag that takes three possible values: "shortest" | "longest" | "equal"
> (defaulting to shortest of course). Name to be bikeshed later :-)
> (and enum vs string also to be bikeshed later)
>
> This demonstrates why the "constant flag" is so often an antipattern. It
>> doesn't scale past two behaviours. Or you end up with a series of flags:
>>
>> zip(*iterators, strict=False, longest=False, fillvalue=None)
>>
>
> I don't think anyone proposed an API like that -- yes, that would be
> horrid.
>
> There are all sorts of reasons why a ternary flag would not be good, but I
> do think it should be mentioned in the PEP, even if only as a rejected idea.
>
> But I still like it, 'cause the "flag for two behaviors and another
> function for the third" seems like the worst of all options.
>
>
I totally agree with everything you said here. From my perspective,
comparing three main cases:
1. zip(*iters, strict= (False | True))
2. zip(*iters, mode = ('shortest' | 'equal' | 'longest'))
3. zip_equal(*iters)

The first case looks like a pretty bad idea (though maybe it is practical). But
each of the provided cases tries to solve the same problem, so from a
practical point of view they are all the same. So, just as you said, the
first case is merely a "flag for two behaviors and another function for the
third", which is a solid -1 from my side.

I like how the second case looks and feels, but obviously the proposed
signature (solution) will not be enough. To plug in the existing
functionality from zip_longest you will also need to provide a fill= kwarg,
like zip(*iters, mode = ('shortest' | 'equal' | 'longest'), fill=None).
This fill kwarg will be ignored for the other two cases, 'shortest' and 'equal'.
Heh, it is not perfect. I will give +0.1 for the second case.

The third case is a rather usual solution for a real problem (just one more
function in the stdlib), so +0.5 for this case from my side.

-gdg
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/MW3S3HTQ43WDTNLVA3Z6GBLAOOUEOVZ4/

[Python-ideas] Re: zip(x, y, z, strict=True)

2020-05-06 Thread Dan Sommers

On Mon, 27 Apr 2020 13:39:19 -0700
Christopher Barker  wrote:

> There is one "downside" to this in that it potentially leaves the
> iterators passed in in an undetermined state -- partially exhausted,
> and with a longer one having had one more item removed than was
> used. But that exists with "zip_shortest" behavior anyway. But it
> would be a minor reason to do the concretizing approach -- at least
> then you'd know your iterators were fully exhausted.

> SIDE NOTE: this is reminding me that there have been calls in the past
> for an optional __len__ protocol for iterators that are not proper
> sequences, but DO know their length -- maybe one more place to use
> that if it existed.

The C standard library has an ungetc function that, well, "ungets" one
character from the end of a character stream.  The next time the stream
is read, it returns that ungotten character first, and then goes back to
the stream.  Such a feature could solve this problem on infinite streams
without having to concretize them.  (Unless, of course, zip's *caller*
tried to unget an element of the iterator immediately after zip did, but
that's not a likely occurrence.)
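
As a sketch of how an ungetc-style feature might look in Python: Python iterators have no such method, so PushbackIterator and zip_pushback below are hypothetical helpers written for illustration only.

```python
class PushbackIterator:
    """Iterator wrapper with an unget() method, loosely modelled on ungetc."""

    def __init__(self, iterable):
        self._it = iter(iterable)
        self._pushed = []

    def __iter__(self):
        return self

    def __next__(self):
        if self._pushed:
            return self._pushed.pop()
        return next(self._it)

    def unget(self, item):
        self._pushed.append(item)


def zip_pushback(*iterators):
    """Like zip() for PushbackIterator arguments, but hands back unused items."""
    if not iterators:
        return
    while True:
        row = []
        for it in iterators:
            try:
                row.append(next(it))
            except StopIteration:
                # Return the items already pulled for this incomplete row.
                for earlier, item in zip(iterators, row):
                    earlier.unget(item)
                return
        yield tuple(row)


a = PushbackIterator([1, 2, 3])
b = PushbackIterator([10, 20])
pairs = list(zip_pushback(a, b))
leftover = list(a)
print(pairs, leftover)
```

With the builtin zip, the 3 would already have been consumed from the first iterator and lost by the time the second one runs out.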

Dan

-- 
“Atoms are not things.” – Werner Heisenberg
Dan Sommers, http://www.tombstonezero.net/dan
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/CFTJLXF2K35HZURE4BB5NL6KMMBBARBM/

[Python-ideas] Re: zip(x, y, z, strict=True)

2020-05-06 Thread Soni L.

On 2020-05-06 12:48 p.m., Christopher Barker wrote:
On Tue, May 5, 2020 at 5:43 PM Steven D'Aprano > wrote:

Christopher's quoting is kinda messed up and I can't be bothered
fixing
it, sorry, so you'll just have to guess who said what :-)

Ideally, we are evaluating ideas independently of who expressed them, 
so I'll pretend I did that on purpose :-)

First: really people, it's all been said. I think we all (and I DO 
include myself in that) have fallen into the trap that "if folks don't 
agree with me, I must not have explained myself well enough" -- but in 
this case, we actually do disagree. And not really on the facts, just 
on the relative importance.

But since, I apparently did not explain myself well enough in this case:
> no -- but we could (and I think should) have a ternary flag, so that

> zip_longest becomes unnecessary. And we'd never get to eight
combinations:
> you can't have longest and shortest behavior at the same time!

A ternary flag of strict = True, False or what?

Come on:

ternary: having three elements, parts, or divisions 
(https://www.merriam-webster.com/dictionary/ternary)

did you really not know that? (and "flag" does not always mean 
"boolean flag", even thoughit often does 
(https://techterms.com/definition/flag) )

(by the way, I'm posting those references because I looked them up to 
make sure I wasn't using terms incorrectly)

This has been proposed multiple times on this list:

a flag that takes three possible values: "shortest" | "longest" | 
"equal" (defaulting to shortest of course). Name to be bikeshed later :-)

(and enum vs string also to be bikeshed later)

how about "length"?

length=True # longest
length=False # shortest (default)
length=None # equal

(altho I still think the "YAGNI function" system would be better >.>)

This demonstrates why the "constant flag" is so often an
antipattern. It
doesn't scale past two behaviours. Or you end up with a series of
flags:

    zip(*iterators, strict=False, longest=False, fillvalue=None)

I don't think anyone proposed an API like that -- yes, that would be 
horrid.

There are all sorts of reasons why a ternary flag would not be good, 
but I do think it should be mentioned in the PEP, even if only as a 
rejected idea.

But I still like it, 'cause the "flag for two behaviors and another 
function for the third" seems like the worst of all options.

-CHB

--
Christopher Barker, PhD

Python Language Consulting
  - Teaching
  - Scientific Software Development
  - Desktop GUI and Web Development
  - wxPython, numpy, scipy, Cython

___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/LSKG4XDKBEL5DHDWVOVDBGLA3QMF22YH/
Code of Conduct: http://python.org/psf/codeofconduct/

___
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/C7BQYAYC3MF7ZNG5TJPQWG6WDZNDETBE/

[Python-ideas] Re: zip(x, y, z, strict=True)

2020-05-06 Thread Christopher Barker

On Tue, May 5, 2020 at 7:20 AM Steven D'Aprano  wrote:

> On Mon, Apr 27, 2020 at 01:39:19PM -0700, Christopher Barker wrote:
>
> > Can you think of a single case where a zip_equal() (either pre-existing or
> > roll your own) would not work, but the concretizing version would?
>


> That's easy: if the body of your zip-handling function has side-effects
> which must be atomic (or at least as atomic as Python code will allow).
> An atomic function has to either LBYL (e.g. check the lengths of the
> iterables before starting to zip them), or needs to be able to roll-back
> if a mismatch is found at the end.
>

Good point, but the current "shortest" behavior would be even worse. At
least if it raised, you'd get a warning that you made a mess of your data :-)

And yes, that's not an argument against this idea.
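
For anyone following along, the look-before-you-leap option Steven describes can be sketched roughly as below. This is only an illustration that assumes the inputs are small enough to materialize as lists; `apply_updates` and `store` are hypothetical names, not something proposed in this thread.

```python
def apply_updates(store, keys, values):
    """Write key/value pairs into `store`, or change nothing on a mismatch."""
    # Materialize first so the lengths can be compared before any side effect.
    keys = list(keys)
    values = list(values)
    if len(keys) != len(values):
        raise ValueError("keys and values have different lengths")
    for key, value in zip(keys, values):
        store[key] = value


store = {}
try:
    apply_updates(store, ["a", "b", "c"], [1, 2])
except ValueError:
    pass  # the mismatch was detected before `store` was touched
```

A zip that only raises at the end could not give that guarantee, because the first pairs would already have been written by the time the error appeared.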

-CHB


-- 
Christopher Barker, PhD

Python Language Consulting
  - Teaching
  - Scientific Software Development
  - Desktop GUI and Web Development
  - wxPython, numpy, scipy, Cython
___
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/PGTZFDBPZ74ORWEZDKFXHS7FSAT74S5N/

[Python-ideas] Re: zip(x, y, z, strict=True)

2020-05-06 Thread Christopher Barker

On Tue, May 5, 2020 at 5:43 PM Steven D'Aprano  wrote:

> Christopher's quoting is kinda messed up and I can't be bothered fixing
> it, sorry, so you'll just have to guess who said what :-)
>

Ideally, we are evaluating ideas independently of who expressed them, so
I'll pretend I did that on purpose :-)

First: really people, it's all been said. I think we all (and I DO include
myself in that) have fallen into the trap that "if folks don't agree with
me, I must not have explained myself well enough" -- but in this case, we
actually do disagree. And not really on the facts, just on the relative
importance.

But since I apparently did not explain myself well enough in this case:

> > no -- but we could (and I think should) have a ternary flag, so that
> > zip_longest becomes unnecessary. And we'd never get to eight combinations:
> > you can't have longest and shortest behavior at the same time!
>
> A ternary flag of strict = True, False or what?
>

Come on:

ternary: having three elements, parts, or divisions (
https://www.merriam-webster.com/dictionary/ternary)

did you really not know that? (and "flag" does not always mean "boolean
flag", even though it often does (https://techterms.com/definition/flag) )

(by the way, I'm posting those references because I looked them up to make
sure I wasn't using terms incorrectly)

This has been proposed multiple times on this list:

a flag that takes three possible values: "shortest" | "longest" | "equal"
(defaulting to shortest of course). Name to be bikeshed later :-)
(and enum vs string also to be bikeshed later)
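
As a rough sketch of what such a three-valued flag could look like (written as a wrapper function, not a change to the builtin), something like the following would do. The names `ZipMode`, `zip_mode` and `_zip_equal` are made up for illustration, and the "equal" branch is just one possible way to detect a mismatch.

```python
import itertools
from enum import Enum


class ZipMode(Enum):
    SHORTEST = "shortest"
    LONGEST = "longest"
    EQUAL = "equal"


def zip_mode(*iterables, mode=ZipMode.SHORTEST, fillvalue=None):
    """Hypothetical single entry point selecting one of three behaviours."""
    mode = ZipMode(mode)  # accepts a ZipMode member or its string value
    if mode is ZipMode.SHORTEST:
        return zip(*iterables)
    if mode is ZipMode.LONGEST:
        return itertools.zip_longest(*iterables, fillvalue=fillvalue)
    return _zip_equal(*iterables)


def _zip_equal(*iterables):
    # A private sentinel marks the slot where a shorter iterable ran out.
    sentinel = object()
    for combo in itertools.zip_longest(*iterables, fillvalue=sentinel):
        if any(item is sentinel for item in combo):
            raise ValueError("iterables have different lengths")
        yield combo
```

Because the three modes are mutually exclusive values of one parameter, there are no invalid combinations to check for; the cost is that `fillvalue` is silently ignored outside the "longest" mode.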

> This demonstrates why the "constant flag" is so often an antipattern. It
> doesn't scale past two behaviours. Or you end up with a series of flags:
>
>     zip(*iterators, strict=False, longest=False, fillvalue=None)
>

I don't think anyone proposed an API like that -- yes, that would be horrid.

There are all sorts of reasons why a ternary flag would not be good, but I
do think it should be mentioned in the PEP, even if only as a rejected idea.

But I still like it, 'cause the "flag for two behaviors and another
function for the third" seems like the worst of all options.

-CHB

-- 
Christopher Barker, PhD

Python Language Consulting
  - Teaching
  - Scientific Software Development
  - Desktop GUI and Web Development
  - wxPython, numpy, scipy, Cython
___
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/LSKG4XDKBEL5DHDWVOVDBGLA3QMF22YH/

[Python-ideas] Re: zip(x, y, z, strict=True)

2020-05-06 Thread Adam Johnson

On Mon, 4 May 2020 at 12:41, Steven D'Aprano  wrote:
>
> On Sun, May 03, 2020 at 11:13:58PM -0400, David Mertz wrote:
>
> > It seems to me that a Python implementation of zip_equals() shouldn't do
> > the check in a loop like a version shows (I guess from more-itertools).
> > More obvious is the following, and this has only a small constant speed
> > penalty.
> >
> > def zip_equal(*its):
> >     yield from zip(*its)
> >     if any(_sentinel == next(o, _sentinel) for o in its):
> >         raise ZipLengthError
>
> Alas, that doesn't work, even with your correction of `any` to
> `not all`.
>
> py> list(zip_equal("abc", "xy"))
> [('a', 'x'), ('b', 'y')]
>
>
> The problem here is that zip consumes the "c" from the first iterator,
> exhausting it, so your check at the end finds that all the iterators are
> exhausted.
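
The consumption Steven describes is easy to reproduce with explicit iterators. This is a small illustration only; the variable names are arbitrary.

```python
first = iter("abc")
second = iter("xy")

pairs = list(zip(first, second))
# zip pulled "c" from `first` before discovering that `second` was empty,
# so the unmatched item is gone by the time any later check can run.
leftover = next(first, None)
```

Here `leftover` is `None` rather than `"c"`, which is why a check performed after `zip` has finished sees two exhausted iterators.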

This got me thinking, what if we were to wrap (or as it turned out,
`chain` on to the end of) each of the individual iterables instead,
thereby performing the relevant check before `zip` fully exhausted
them, something like the following:

```python
import itertools


def zip_equal(*iterables):
    return zip(*_checked_simultaneous_exhaustion(*iterables))


def _checked_simultaneous_exhaustion(*iterables):
    if len(iterables) <= 1:
        return iterables

    def check_others():
        # first iterable exhausted, check the others are too
        sentinel = object()
        if any(next(i, sentinel) is not sentinel for i in iterators):
            raise ValueError('unequal length iterables')
        if False: yield  # never runs; only makes this function a generator

    def throw():
        # one of iterables[1:] exhausted first, therefore it must be shorter
        raise ValueError('unequal length iterables')
        if False: yield  # never runs; only makes this function a generator

    iterators = tuple(map(iter, iterables[1:]))
    return (
        itertools.chain(iterables[0], check_others()),
        *(itertools.chain(it, throw()) for it in iterators),
    )
```

This has the advantage that, if desired, the
`_checked_simultaneous_exhaustion` function could also be reused to
implement a previously mentioned length checking version of `map`.

Going further, if `checked_simultaneous_exhaustion` were to become a
public function (with a better name), it could be used to impose
same-length checking to the iterable arguments of any function,
providing those iterables are consumed in a compatible way.

Additionally, it would allow one to be specific about which iterables
were checked, rather than being forced into the option of checking
either all or none by `zip_equal` / `zip` respectively, thus allowing
us to have our cake and eat it in terms of mixing infinite and
checked-length finite iterables, e.g.

```python
zip(i_am_infinite, *checked_simultaneous_exhaustion(*but_we_are_finite))
# or, if they aren't contiguous
checked1, checked2 = checked_simultaneous_exhaustion(it1, it2)
zip(checked1, infinite, checked2)
```

However, as I previously alluded to, this relies upon the assumption
that each of the given iterators is advanced in turn, in the order
they were provided to `checked_simultaneous_exhaustion`. So -- while
this function would be suitable for use with `zip`, `map`, and any
others which do the same -- if we wanted a more general
`checked_equal_length` function that extended to cases in which the
iterable-consuming function may consume the iterables in some
haphazard order, we'd need something more involved, such as keeping a
running tally of the current length of each iterable and, even then,
we could still only guarantee raising on unequal lengths if the said
function advanced all the given iterators by at least the length of
the shortest.
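
A rough sketch of that "running tally" variant might look like the following. It is only an illustration of the idea, with the hypothetical name `checked_equal_length`, and it carries the limitation described above: a mismatch is reported only once the consumer has advanced the iterators far enough to expose it.

```python
def checked_equal_length(*iterables):
    """Wrap iterables so that a length mismatch raises ValueError,
    whatever order the wrapped iterators are consumed in."""
    iterators = [iter(iterable) for iterable in iterables]
    counts = [0] * len(iterators)  # running tally of items taken from each
    done = object()                # private end-of-iteration marker
    shortest = [None]              # final length, set by the first to run out

    def wrap(index):
        while True:
            item = next(iterators[index], done)
            if item is done:
                break
            counts[index] += 1
            if shortest[0] is not None and counts[index] > shortest[0]:
                raise ValueError("unequal length iterables")
            yield item
        if shortest[0] is None:
            shortest[0] = counts[index]
            for other, iterator in enumerate(iterators):
                # Peek only at iterators that have kept pace so far.
                if counts[other] == shortest[0] and next(iterator, done) is not done:
                    raise ValueError("unequal length iterables")
        if any(count > shortest[0] for count in counts) or counts[index] != shortest[0]:
            raise ValueError("unequal length iterables")

    return tuple(wrap(index) for index in range(len(iterators)))
```

Used as `zip(*checked_equal_length(a, b))` it behaves like the chained version, but it also copes with a consumer that drains one iterator completely before touching the next.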
___
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/4D3FIYTOJSROIS3S3SYU752RTOJV27IZ/

[Python-ideas] Re: zip(x, y, z, strict=True)

2020-05-06 Thread Steven D'Aprano

On Tue, May 05, 2020 at 12:49:05PM -0700, Christopher Barker wrote:

> Agreed, but discoverability is still something to be considered in the API.
> And it seems that there are folks arguing that we specifically want this to
> be less discoverable due to concerns of overuse. Which does not seem like
> a good API design approach to me.

itertools is one of the most popular and well-known libraries in the 
stdlib. Saying that something is "less discoverable" in itertools 
compared to the builtins is a bit like saying that the Marvel superhero 
franchise is less well-known than the Star Wars franchise. Even if it's 
objectively true, it's a difference that makes no difference.

In fact I'd even say that there are builtin functions that are less 
well known than itertools, like iter(callable, sentinel) :-)
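
For readers who have not met it, the two-argument form of `iter` takes a callable and a sentinel, and keeps calling the callable until it returns the sentinel. A minimal illustration, with made-up data:

```python
import io

stream = io.StringIO("alpha\nbeta\n\ngamma\n")
# Call stream.readline repeatedly until it returns "\n" (a blank line).
header = list(iter(stream.readline, "\n"))
```

This collects the lines before the first blank line and leaves the rest of the stream unread.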

-- 
Steven
___
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/FV6MAHWBOBWIUCFKXBWYHAEZU67WHVDO/

[Python-ideas] Re: zip(x, y, z, strict=True)

2020-05-05 Thread Steven D'Aprano

Christopher's quoting is kinda messed up and I can't be bothered fixing 
it, sorry, so you'll just have to guess who said what :-)

On Tue, May 05, 2020 at 01:03:30PM -0700, Christopher Barker wrote:

> > "If its builtin people will be more likely to use it, so we need to make
> > it builtin."
> >
> > This argument will apply to **literally** every function and class in
> > the standard library.
> 
> 
> But we are not talking adding a new builtin.

I didn't say a *new* builtin. You are talking about having this 
related but distinct functionality piggy-back on top of the 
existing tolerant zip function, distinguishing them by a flag.

I trust you wouldn't try to argue that `int(string, base)` is not 
a builtin function? :-)

> > Firstly, we would have to agree that "maximizing the number of people
> > using the strict version of zip" is our goal. I don't think it is.
> 
> 
> Neither do I. But I am suggesting that "maximizing the number of people
> that need a strict version of zip will use it" Rather than, say, checking
> the length of the inputs before calling zip. Or writing their own version.

Okay, but a function in the std lib is sufficient for that. If you want 
to argue against the alternatives:

- use more-itertools
- make it a recipe in the docs

then "more people will use it" is a good argument for putting it into 
the stdlib. But why should we fear that there will be people doing 
without, or rolling their own, because it's not builtin?

`zip_longest` has been in the stdlib for at least a decade. We know it 
has use-cases, and unlike this strict version of zip the need for it 
was established and proven long ago. If there are people doing without 
zip_longest, or rolling their own, because they either don't know about 
it or cannot be bothered to import it from itertools, should it be in 
builtins too?

> > Why is zip_longest different? What if we want to add a fourth or fifth
> > flavour of zip? Do we then have three flags on zip and have to deal with
> > eight combinations of them?
> >
> 
> no -- but we could (and I think should) have a ternary flag, so that
> zip_longest becomes unnecessary. And we'd never get to eight combinations:
> you can't have longest and shortest behavior at the same time!

A ternary flag of strict = True, False or what?

This demonstrates why the "constant flag" is so often an antipattern. It 
doesn't scale past two behaviours. Or you end up with a series of flags:

    zip(*iterators, strict=False, longest=False, fillvalue=None)

and then you have to check for incompatible combinations:

    if longest and strict:
        raise TypeError('cannot combine these two options')

and it becomes worse the more flags you have.

Or you end up with deprecated parameters:

    def zip(*iterators, strict=_sentinel, mode=ZIP_MODES.SHORTEST):
        if strict is not _sentinel:
            raise DeprecationWarning

> But if we did, then would it be better to have eight separate functions in
> itertools?

You wouldn't have eight separate functions. You would have four. But to 
distinguish four independent modes in a single function, you need three 
flags, and that gives you 2**3 = 8 combinations to deal with, all of 
which have to be checked, and exceptions raised if the combination is 
invalid.
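
The arithmetic can be checked by brute force. The sketch below assumes three boolean mode flags, where "third" is a placeholder for some hypothetical extra flavour of zip, and assumes a call is meaningful only when at most one of them is switched on.

```python
import itertools

flags = ("strict", "longest", "third")  # "third" is a hypothetical extra mode
combinations = list(itertools.product([False, True], repeat=len(flags)))
# A combination is meaningful only if at most one mode flag is switched on.
valid = [combo for combo in combinations if sum(combo) <= 1]
```

That gives eight combinations in total, of which four are valid (all flags off, or exactly one on); the other four are the cases the function would have to reject.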

-- 
Steven
___
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/UANJQXQNWQY2C6ADLJO3P2PJ44Q37RGD/

[Python-ideas] Re: zip(x, y, z, strict=True)

2020-05-05 Thread Rhodri James


On 05/05/2020 21:03, Christopher Barker wrote:

> > "If its builtin people will be more likely to use it, so we need to make
> > it builtin."
> >
> > This argument will apply to **literally** every function and class in
> > the standard library.
>
> But we are not talking adding a new builtin.


Well, actually we are.  As Steven pointed out further down the post, 
adding a flag to a function that is pretty much always going to be set 
at compile time is equivalent to (and IMHO would be better expressed as) 
a new function.


--
Rhodri James *-* Kynesim Ltd
___
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/YU3TQ7XP5YM2NSUDWV2MUPJY7VLA3OVM/

[Python-ideas] Re: zip(x, y, z, strict=True)

2020-05-05 Thread Christopher Barker

> "If its builtin people will be more likely to use it, so we need to make
> it builtin."
>
> This argument will apply to **literally** every function and class in
> the standard library.


But we are not talking about adding a new builtin.


> Firstly, we would have to agree that "maximizing the number of people
> using the strict version of zip" is our goal. I don't think it is.


Neither do I. But I am suggesting "maximizing the number of people who need
a strict version of zip and actually use it", rather than, say, checking
the length of the inputs before calling zip, or writing their own version.

> Think about the strange discrepancy between the three (so far...) kinds
> of zip:
>
> - zip (shortest) is builtin, controlled by a flag;
> - zip strict is builtin, controlled by a flag;
> - zip longest is in a module, with a distinct name.
>
> Why is zip_longest different? What if we want to add a fourth or fifth
> flavour of zip? Do we then have three flags on zip and have to deal with
> eight combinations of them?
>

no -- but we could (and I think should) have a ternary flag, so that
zip_longest becomes unnecessary. And we'd never get to eight combinations:
you can't have longest and shortest behavior at the same time!

But if we did, then would it be better to have eight separate functions in
itertools?

-CHB

-- 
Christopher Barker, PhD

Python Language Consulting
  - Teaching
  - Scientific Software Development
  - Desktop GUI and Web Development
  - wxPython, numpy, scipy, Cython
___
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/FWZU546YMOYHGCXFYESZZCITV5LURR6R/

[Python-ideas] Re: zip(x, y, z, strict=True)

2020-05-05 Thread Christopher Barker

On Tue, May 5, 2020 at 9:20 AM David Mertz  wrote:

> I have no idea whether a flag on zip() or a function in itertools would
> get MORE USE.  I *ABSOLUTELY* think it is an anti-goal to get more use for
> its own sake though.
>

I'm not sure anyone was suggesting that -- *maybe* Alex, but I think his
statement was over-interpreted.

> I only want APPROPRIATE USE in any case.
>

I can't imagine we all don't agree with that.

> The real point, to me, is that users who use itertools.zip_strict() will
> use it for exactly the reason that they want that semantics. In contrast, a
> flag for `strict` or `truncate` or `equal` or whatever is a LOT more likely
> to be used in the "just in case" code where the programmer has not thought
> carefully about the semantics they want.
>

There's no way to really know, but I think this is being overblown -- folks
generally don't go use extra flags just for the heck of it -- particularly
one that won't be documented in anything but the latest documents for years
:-)

> The sky isn't falling, I certainly don't think everyone, nor even most
> developers, would use the flag wrong.  But a separate function just
> provides a better, more consistent, API.
>

Well, THAT IS the point of discussion, yes. I disagree, but can see both
points. But I do want folks to consider that zip() would be a builtin, while
zip_strict() and zip_longest() would be in itertools, which is different
than if they were all in the same namespace (like the various string
methods, for example). Another key point is that if you want zip_longest()
functionality, you simply can not get it with the builtin zip() -- you are
forced to look elsewhere. Whereas most code that might want "strict"
behavior will still work, albeit less safely, with the builtin.
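
To illustrate the asymmetry with made-up data: the builtin happily runs on mismatched inputs and just truncates, whereas padding is simply not available without the itertools import.

```python
from itertools import zip_longest

names = ["a", "b", "c"]
scores = [1, 2]

# The builtin silently stops at the shorter input.
truncated = list(zip(names, scores))
# Padding the shorter input requires itertools.zip_longest.
padded = list(zip_longest(names, scores, fillvalue=None))
```

`truncated` drops "c" without complaint, while `padded` keeps it paired with the fill value.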

These points should be weighed in evaluating the API options.

And this is why I personally think if we add a flag to zip, we should add
one for longest functionality as well, unifying the API.

> I don't think anyone in the huge discussion of the walrus operator, for
> example, tried to make the case that the goal should be encouraging it to
> be used AS MUCH AS POSSIBLE.
>

Indeed, the opposite was true: there was a lot of concern that it would be
overused, though I think that was a much bigger concern there than in this
case.

> A feature should be used *where appropriate*, and the design should not
> vacantly simply try to make it more common.
>

Agreed, but discoverability is still something to be considered in the API.
And it seems that there are folks arguing that we specifically want this to
be less discoverable due to concerns of overuse. Which does not seem like
a good API design approach to me.

-CHB

-- 
Christopher Barker, PhD

Python Language Consulting
  - Teaching
  - Scientific Software Development
  - Desktop GUI and Web Development
  - wxPython, numpy, scipy, Cython
___
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/NTA24XVRJ7CD4B2BAEEIDVPBRFZY6NEQ/

[Python-ideas] Re: zip(x, y, z, strict=True)

2020-05-05 Thread Chris Angelico

On Wed, May 6, 2020 at 3:25 AM Steven D'Aprano  wrote:
> Personally, I don't think Chris' backwards-compatibility argument is
> strong. Technically adding a new keyword argument to a function is
> backwards-incompatible, but we normally exclude that sort of change. Who
> writes this?
>
> # This behaviour will be changed by the proposed new parameter.
> zip('', strict=1)  # Raise a type error.
>
> So I think the *backwards incompatibility* argument is weak in that
> regard. But maybe Chris has got a different perspective on this that I
> haven't thought of.

Adding the flag isn't a problem, but merely adding the flag is
useless. (Ditto if a new function is created.) The assumption is that
the flag will be used, changing existing code from zip(x, y) to
zip_strict(x, y) or zip(x, y, strict=True). Either way, it's not the
creation of zip_strict or the addition of the kwonly arg that breaks
backward compat, but the change to (say) the ast module, making use of
this, that will cause problems.

> [Chris]
> > > Should they? I'm not sure how well-supported this actually is. If you
> > > hand-craft an AST and then compile it, is it supposed to catch every
> > > possible malformation?
>
> I would expect that the ast library should accept anything which could
> come from legal Python, and nothing that doesn't.
>

It absolutely should accept anything which could come from legal
Python. The question is, to what extent should it attempt to flag
invalid forms? For example, if you mess up the lineno fields,
attempting to compile that to a code object won't give you a nice
exception. It'll most likely just work, and give weird results in
tracebacks - but there is no legal code that could have produced that.
And what code could produce this?

>>> from ast import *
>>> eval(compile(fix_missing_locations(Expression(body=Set(elts=[]))), "-", "eval"))
set()

There is no valid code that can create a Set node with an empty
element list, yet it's perfectly sensible, and when executed, it
produces... an empty set. Exactly like you'd expect. Should it be an
error just because there's no Python code that can create it?

I'm of the opinion that it's okay for it to accept things that
technically can't result from any parse, just as long as there's a
reasonable interpretation of them. Which means that both raising and
compiling silently are valid results here; it's just a question of
whether you consider "ignore the spares" to be a reasonable
interpretation of an odd AST, or if you consider "mismatched lengths"
to be a fundamental error.

ChrisA
___
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/FOCSHKCYPEQPVBY2WIE4MHW7QNM5JFEZ/

[Python-ideas] Re: zip(x, y, z, strict=True)

2020-05-05 Thread Henk-Jaap Wagenaar

On Tue, 5 May 2020, 18:24 Steven D'Aprano,  wrote:

> On Tue, May 05, 2020 at 05:26:02PM +0100, Henk-Jaap Wagenaar wrote:
>
> > This is a straw man in regards to backwards compatibility. This particular
> > (sub)thread is about whether if this zip-is-strict either as a separate
> > name or a Boolean flag or some other flag of zip should be a built-in or be
> > in e.g. itertools.
>
> Please don't misuse "strawman" in that fashion. A strawman argument is a
> logical fallacy where you attack a weaker position your opponent didn't
> make in order to make your own position stronger. That's not what Chris
> did, and frankly accusing him of strawmanning is a form of "poisoning
> the well".


> What Chris did was to propose a counterfactual to express his opinion on
> this proposal. To paraphrase:
>
> "If this were true (we were designing zip from scratch for the first
> time) then I would agree with the proposal, but since we aren't, I
> disagree because of these reasons."
>
> That is a perfectly legitimate position to take.
>

I agree on the face of it (in regards to strawmanning and your
paraphrasing). I wasn't disagreeing with anything you've gone into
in detail above; I disagreed with one of the reasons Chris listed, namely
that "the backward compatibility break [is] large", and thought that was
strawmanning (see further down for why).


>
> "If we weren't in lockdown, I would take you out for dinner at a
> restaurant, but since we are in quarantine, I don't think we
> should go out."
>
> Personally, I don't think Chris' backwards-compatibility argument is
> strong. Technically adding a new keyword argument to a function is
> backwards-incompatible, but we normally exclude that sort of change. Who
> writes this?
>
> # This behaviour will be changed by the proposed new parameter.
> zip('', strict=1)  # Raise a type error.
>
> So I think the *backwards incompatibility* argument is weak in that
> regard. But maybe Chris has got a different perspective on this that I
> haven't thought of.
>
>
I cannot interpret that as a "large" break as Chris says, so I must assume
he meant something else (changing the default is my assumption) unless
somebody (Chris or otherwise) can tell me why adding a keyword argument
would be a large incompatible change?


>
> [Chris]
> > > Should they? I'm not sure how well-supported this actually is. If you
> > > hand-craft an AST and then compile it, is it supposed to catch every
> > > possible malformation?
>
> I would expect that the ast library should accept anything which could
> come from legal Python, and nothing that doesn't.
>
>
> --
> Steven
___
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/R4EV4JWACQDCSGBMBXWQSNYTNAJWU6LH/

[Python-ideas] Re: zip(x, y, z, strict=True)

2020-05-05 Thread Steven D'Aprano

On Tue, May 05, 2020 at 05:28:08PM +0100, Henk-Jaap Wagenaar wrote:
> But if you care about your input, you can do so by setting strict=True (if
> that's the road we go down), and unlike what others have said, the IDE I
> use (pycharm) would tell me that flag exists as I type "zip" and so I'd be
> more likely to use it than if it was in itertools/...

We keep coming to this same old argument over and over.

"If it's builtin people will be more likely to use it, so we need to make 
it builtin."

This argument will apply to **literally** every function and class in 
the standard library. Pick any arbitrary module, any function from that 
module: `imghdr.what`. I literally didn't even know that function 
existed until 30 seconds ago. If it had been a builtin, I would have 
known about it years ago, and would have been more likely to use it.

All this is true. But it's not an argument in favour of making it a 
builtin. (Or at least not a *good* argument.)

Firstly, we would have to agree that "maximizing the number of people 
using the strict version of zip" is our goal. I don't think it is. We 
don't try to maximize the number of people using `imghdr.what` -- it is 
there for those who need it, but we're not trying to push people to use 
it whether they need it or not.

And secondly, that assumes that the benefit gained is greater than the 
cost in making the builtins more complicated. It now has two functions 
with the same name, `zip`, distinguished by a runtime flag, even though 
that flag will nearly always be given as a compile-time constant:

# Almost always specified as a compile-time constant:
zip(..., strict=True)

# Almost never as a runtime variable:
flag = settings['zip'].tolerant
zip(..., strict=not flag)

That "compile-time constant" suggests that, absent some compelling 
reason, the two functions ought to be split into separate named 
functions. "But my IDE..." is not a compelling reason.

This is not a hard law, but it is a strong principle. Compile-time 
constants to swap from two distinct modes often make for poor APIs, and 
we should be reluctant to design our functions that way if we can avoid 
it. (Sometimes we can't -- but this is not one of those times.)

Think about the strange discrepancy between the three (so far...) kinds 
of zip:

- zip (shortest) is builtin, controlled by a flag;
- zip strict is builtin, controlled by a flag;
- zip longest is in a module, with a distinct name.

Why is zip_longest different? What if we want to add a fourth or fifth 
flavour of zip? Do we then have three flags on zip and have to deal with 
eight combinations of them?

-- 
Steven
___
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/2W5GF37T3T6XABYU6G5LOHUEJFLIGIBT/

[Python-ideas] Re: zip(x, y, z, strict=True)

2020-05-05 Thread Steven D'Aprano

On Tue, May 05, 2020 at 05:26:02PM +0100, Henk-Jaap Wagenaar wrote:

> This is a straw man in regards to backwards compatibility. This particular
> (sub)thread is about whether if this zip-is-strict either as a separate
> name or a Boolean flag or some other flag of zip should be a built-in or be
> in e.g. itertools.

Please don't misuse "strawman" in that fashion. A strawman argument is a 
logical fallacy where you attack a weaker position your opponent didn't 
make in order to make your own position stronger. That's not what Chris 
did, and frankly accusing him of strawmanning is a form of "poisoning 
the well".

What Chris did was to propose a counterfactual to express his opinion on 
this proposal. To paraphrase:

"If this were true (we were designing zip from scratch for the first 
time) then I would agree with the proposal, but since we aren't, I 
disagree because of these reasons."

That is a perfectly legitimate position to take.

"If we weren't in lockdown, I would take you out for dinner at a 
restaurant, but since we are in quarantine, I don't think we 
should go out."

Personally, I don't think Chris' backwards-compatibility argument is 
strong. Technically adding a new keyword argument to a function is 
backwards-incompatible, but we normally exclude that sort of change. Who 
writes this?

# This behaviour will be changed by the proposed new parameter.
zip('', strict=1)  # Raise a type error.

So I think the *backwards incompatibility* argument is weak in that 
regard. But maybe Chris has got a different perspective on this that I 
haven't thought of.

[Chris]
> > Should they? I'm not sure how well-supported this actually is. If you
> > hand-craft an AST and then compile it, is it supposed to catch every
> > possible malformation?

I would expect that the ast library should accept anything which could 
come from legal Python, and nothing that doesn't. 

-- 
Steven
___
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/GQZLWLOHFPBQLADHYLHW6JYY2X4S4ABA/

[Python-ideas] Re: zip(x, y, z, strict=True)

2020-05-05 Thread Rhodri James


On 05/05/2020 17:26, Henk-Jaap Wagenaar wrote:

> This is a straw man in regards to backwards compatibility. This particular
> (sub)thread is about whether if this zip-is-strict either as a separate
> name or a Boolean flag or some other flag of zip should be a built-in or be
> in e.g. itertools.
>
> It is not about breaking backwards compatibility (presumably by making it
> the default behaviour of zip).


Except that that's part of the thinking involved in choosing a flag 
instead of the usual new function.  No one (I think) is claiming that we 
should break backwards compatibility and default to strict=True, but 
having the flag is a strong statement that length-checking is an 
intrinsic part of zipping.  I don't believe that's true, and in 
consequence I think adding a flag is a mistake.


--
Rhodri James *-* Kynesim Ltd
___
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/5GURR7YTHRY43KSM5LW7U665KGR24MUA/

[Python-ideas] Re: zip(x, y, z, strict=True)

2020-05-05 Thread Henk-Jaap Wagenaar

But if you care about your input, you can do so by setting strict=True (if
that's the road we go down), and unlike what others have said, the IDE I
use (pycharm) would tell me that flag exists as I type "zip" and so I'd be
more likely to use it than if it was in itertools/...

On Tue, 5 May 2020, 16:41 Rhodri James,  wrote:

> On 05/05/2020 13:53, Henk-Jaap Wagenaar wrote:
> > Brandt's example with ast in the stdlib I think is a pretty good example
> of
> > this.
> >
> > On Tue, 5 May 2020 at 13:27, Rhodri James  wrote:
> >
> >> On 05/05/2020 13:12, Henk-Jaap Wagenaar wrote:
> >>> A function that is a "safer" version in some "edge case" (not extra
> >>> functionality but better error handling basically) but that does
> >> otherwise
> >>> work as expected is not something one will search for automatically.
> This
> >>> is zip versus zip-with-strict-true.
> >>
> >> I'm sorry, I don't buy it.  This isn't an edge case, it's all about
> >> whether you care about what your input is.  In that sense, it's exactly
> >> like the relationship between zip and zip_longest.
>
> Interesting, because I'd call it a counterexample to your point.  The
> bug's authors should have cared about their input, but didn't.
>
> --
> Rhodri James *-* Kynesim Ltd
>
___
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/IBSSGPEVGBDAQFKDEIDSAEPT26YBYMRP/

[Python-ideas] Re: zip(x, y, z, strict=True)

2020-05-05 Thread Henk-Jaap Wagenaar

This is a straw man in regards to backwards compatibility. This particular
(sub)thread is about whether if this zip-is-strict either as a separate
name or a Boolean flag or some other flag of zip should be a built-in or be
in e.g. itertools.

It is not about breaking backwards compatibility (presumably by making it
the default behaviour of zip).

On Tue, 5 May 2020, 17:17 Chris Angelico,  wrote:

> On Wed, May 6, 2020 at 1:44 AM Rhodri James  wrote:
> >
> > On 05/05/2020 13:53, Henk-Jaap Wagenaar wrote:
> > > Brandt's example with ast in the stdlib I think is a pretty good
> example of
> > > this.
> > >
> > > On Tue, 5 May 2020 at 13:27, Rhodri James 
> wrote:
> > >
> > >> On 05/05/2020 13:12, Henk-Jaap Wagenaar wrote:
> > >>> A function that is a "safer" version in some "edge case" (not extra
> > >>> functionality but better error handling basically) but that does
> > >> otherwise
> > >>> work as expected is not something one will search for automatically.
> This
> > >>> is zip versus zip-with-strict-true.
> > >>
> > >> I'm sorry, I don't buy it.  This isn't an edge case, it's all about
> > >> whether you care about what your input is.  In that sense, it's
> exactly
> > >> like the relationship between zip and zip_longest.
> >
> > Interesting, because I'd call it a counterexample to your point.  The
> > bug's authors should have cared about their input, but didn't.
> >
>
> Should they? I'm not sure how well-supported this actually is. If you
> hand-craft an AST and then compile it, is it supposed to catch every
> possible malformation? Has Python ever made any promises about
> *anything* regarding manual creation of AST nodes? Maybe it would be
> *nice* if it noticed the bug for you, but if you're messing around
> with this sort of thing, it's not that unreasonable to expect you to
> get your inputs right.
>
> If you're creating a language from scratch and want to have separate
> "strict" and "truncating" forms of zip, then by all means, go ahead.
> But I think the advantage here is marginal and the backward
> compatibility break large.
>
> ChrisA
___
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/CZX6FMFGUAPXZYKJL3C6L2P3YOSGBAAA/

[Python-ideas] Re: zip(x, y, z, strict=True)

2020-05-05 Thread David Mertz

I have no idea whether a flag on zip() or a function in itertools would get
MORE USE.  I *ABSOLUTELY* think it is an anti-goal to get more use for its
own sake though.

I'm +1 on a new function in itertools, maybe +0 or maybe -0 on a flag.  But
I only want APPROPRIATE USE in any case.  The API conventions of Python in
general very strongly favor a new function.  Yes, not everything in Python
naming is consistent, and counter examples for any idea can surely be
found.  But it just is a lot less surprising to users to follow the
predominant pattern that zip_longest(), for example, follows.

The real point, to me, is that users who use itertools.zip_strict() will
use it for exactly the reason that they want that semantics. In contrast, a
flag for `strict` or `truncate` or `equal` or whatever is a LOT more likely
to be used in the "just in case" code where the programmer has not thought
carefully about the semantics they want.  The sky isn't falling, I
certainly don't think everyone, nor even most developers, would use the
flag wrong.  But a separate function just provides a better, more
consistent, API.

I don't think anyone in the huge discussion of the walrus operator, for
example, tried to make the case that the goal should be encouraging it to
be used AS MUCH AS POSSIBLE.  Nor likewise for any other new feature.  A
feature should be used *where appropriate*, and the design should not
vacantly simply try to make it more common.

-- 
The dead increasingly dominate and strangle both the living and the
not-yet born.  Vampiric capital and undead corporate persons abuse
the lives and control the thoughts of homo faber. Ideas, once born,
become abortifacients against new conceptions.
___
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/VZ6HBRWZ4ZFBRYTSVEKRG4PMFNFUEGM6/

[Python-ideas] Re: zip(x, y, z, strict=True)

2020-05-05 Thread Chris Angelico

On Wed, May 6, 2020 at 1:44 AM Rhodri James  wrote:
>
> On 05/05/2020 13:53, Henk-Jaap Wagenaar wrote:
> > Brandt's example with ast in the stdlib I think is a pretty good example of
> > this.
> >
> > On Tue, 5 May 2020 at 13:27, Rhodri James  wrote:
> >
> >> On 05/05/2020 13:12, Henk-Jaap Wagenaar wrote:
> >>> A function that is a "safer" version in some "edge case" (not extra
> >>> functionality but better error handling basically) but that does
> >> otherwise
> >>> work as expected is not something one will search for automatically. This
> >>> is zip versus zip-with-strict-true.
> >>
> >> I'm sorry, I don't buy it.  This isn't an edge case, it's all about
> >> whether you care about what your input is.  In that sense, it's exactly
> >> like the relationship between zip and zip_longest.
>
> Interesting, because I'd call it a counterexample to your point.  The
> bug's authors should have cared about their input, but didn't.
>

Should they? I'm not sure how well-supported this actually is. If you
hand-craft an AST and then compile it, is it supposed to catch every
possible malformation? Has Python ever made any promises about
*anything* regarding manual creation of AST nodes? Maybe it would be
*nice* if it noticed the bug for you, but if you're messing around
with this sort of thing, it's not that unreasonable to expect you to
get your inputs right.

If you're creating a language from scratch and want to have separate
"strict" and "truncating" forms of zip, then by all means, go ahead.
But I think the advantage here is marginal and the backward
compatibility break large.

ChrisA
___
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/3JKQREPI4CE2ZEB75URDQMGKEWHJEJVO/

[Python-ideas] Re: zip(x, y, z, strict=True)

2020-05-05 Thread Rhodri James


On 05/05/2020 13:53, Henk-Jaap Wagenaar wrote:

> Brandt's example with ast in the stdlib I think is a pretty good example of
> this.
>
> On Tue, 5 May 2020 at 13:27, Rhodri James  wrote:
>
>> On 05/05/2020 13:12, Henk-Jaap Wagenaar wrote:
>>> A function that is a "safer" version in some "edge case" (not extra
>>> functionality but better error handling basically) but that does otherwise
>>> work as expected is not something one will search for automatically. This
>>> is zip versus zip-with-strict-true.
>>
>> I'm sorry, I don't buy it.  This isn't an edge case, it's all about
>> whether you care about what your input is.  In that sense, it's exactly
>> like the relationship between zip and zip_longest.


Interesting, because I'd call it a counterexample to your point.  The 
bug's authors should have cared about their input, but didn't.


--
Rhodri James *-* Kynesim Ltd
___
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/OBJAU2776PJ5MAYUPNNHP6ZAVV53BC7E/

[Python-ideas] Re: zip(x, y, z, strict=True)

2020-05-05 Thread Steven D'Aprano

On Mon, Apr 27, 2020 at 01:39:19PM -0700, Christopher Barker wrote:

> Can you think of a single case where a zip_equal() (either pre-existing or
> roll your own) would not work, but the concretizing version would?

That's easy: if the body of your zip-handling function has side-effects 
which must be atomic (or at least as atomic as Python code will allow). 
An atomic function has to either LBYL (e.g. check the lengths of the 
iterables before starting to zip them), or needs to be able to roll-back 
if a mismatch is found at the end.

In the most general case, we can't roll-back easily, or at all, so if 
your requirements are to avoid partial operations, then you must 
concretize the input streams and check their lengths.

But I don't think that's a problem for this proposal. Nobody is saying 
that zip_strict can solve all problems, and we shouldn't hold it against 
it that it doesn't solve the atomicity problem (which is very hard to 
solve in Python).
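
To illustrate the distinction, here is a minimal sketch under my own assumptions; the function names `apply_pairs_atomically` and `apply_pairs_lazily` are hypothetical and not part of any proposal. The first concretizes both inputs and checks their lengths before doing anything (LBYL); the second checks lazily, the way a strict zip would, so some side effects have already happened by the time the mismatch is found.

```python
def apply_pairs_atomically(left, right, action):
    """Call action(a, b) per pair, but only if the lengths match.

    Both inputs are turned into lists first, so nothing runs on a mismatch.
    This cannot work for infinite iterators.
    """
    left_items = list(left)
    right_items = list(right)
    if len(left_items) != len(right_items):
        raise ValueError(
            f"length mismatch: {len(left_items)} != {len(right_items)}"
        )
    for a, b in zip(left_items, right_items):
        action(a, b)


def apply_pairs_lazily(left, right, action):
    """Check lengths lazily: earlier pairs are processed before the error."""
    missing = object()  # private marker for "this iterator is exhausted"
    left_iter, right_iter = iter(left), iter(right)
    while True:
        a = next(left_iter, missing)
        b = next(right_iter, missing)
        if a is missing and b is missing:
            return
        if a is missing or b is missing:
            raise ValueError("length mismatch found after partial work")
        action(a, b)


log_atomic, log_lazy = [], []
for func, log in (
    (apply_pairs_atomically, log_atomic),
    (apply_pairs_lazily, log_lazy),
):
    try:
        func(iter("abc"), iter([1, 2]), lambda a, b, log=log: log.append((a, b)))
    except ValueError:
        pass

print(log_atomic)  # no side effects ran
print(log_lazy)    # two side effects ran before the error
```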

-- 
Steven
___
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/DZ3WZGWIXS4XJ563BTOW6GMXSVKSETYZ/

[Python-ideas] Re: zip(x, y, z, strict=True)

2020-05-05 Thread Henk-Jaap Wagenaar

Brandt's example with ast in the stdlib I think is a pretty good example of
this.

On Tue, 5 May 2020 at 13:27, Rhodri James  wrote:

> On 05/05/2020 13:12, Henk-Jaap Wagenaar wrote:
> > A function that is a "safer" version in some "edge case" (not extra
> > functionality but better error handling basically) but that does
> otherwise
> > work as expected is not something one will search for automatically. This
> > is zip versus zip-with-strict-true.
>
> I'm sorry, I don't buy it.  This isn't an edge case, it's all about
> whether you care about what your input is.  In that sense, it's exactly
> like the relationship between zip and zip_longest.
>
> --
> Rhodri James *-* Kynesim Ltd
___
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/TVK7DRO4LNEAH3PAB5IJB5DI6TOBJ3TJ/

[Python-ideas] Re: zip(x, y, z, strict=True)

2020-05-05 Thread Rhodri James


On 05/05/2020 13:12, Henk-Jaap Wagenaar wrote:

> A function that is a "safer" version in some "edge case" (not extra
> functionality but better error handling basically) but that does otherwise
> work as expected is not something one will search for automatically. This
> is zip versus zip-with-strict-true.


I'm sorry, I don't buy it.  This isn't an edge case, it's all about 
whether you care about what your input is.  In that sense, it's exactly 
like the relationship between zip and zip_longest.


--
Rhodri James *-* Kynesim Ltd
___
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/U47NZNW5DIZLW34UTNEFYQ3ZCRW57EMU/

[Python-ideas] Re: zip(x, y, z, strict=True)

2020-05-05 Thread Henk-Jaap Wagenaar

I feel like that argument is flawed. I cannot think of another good example
(I am sure there are plenty!) but there is a big difference for
discoverability between:

A function that does something *different* and functionality does not exist
in a built-in (or whatever namespace you are considering). For example,
zip_longest vs. zip: if you know/expect one of your iterators to run
out early, but do not wish the zipping to end, normal zip won't do and so
you will end up searching for an alternative.

A function that is a "safer" version in some "edge case" (not extra
functionality but better error handling basically) but that does otherwise
work as expected is not something one will search for automatically. This
is zip versus zip-with-strict-true.

I did not phrase that particularly well, but I am hoping people get the
gist/can rephrase it better.
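
For anyone who wants the three behaviours side by side, a minimal sketch follows. The `zip_strict` generator here is a hypothetical stand-in written for illustration only; it is not the implementation under discussion.

```python
from itertools import zip_longest

names = ["a", "b", "c"]
values = [1, 2]

# Plain zip silently stops at the shortest input.
truncated = list(zip(names, values))

# zip_longest keeps going and pads the shorter input (None by default).
padded = list(zip_longest(names, values))


def zip_strict(*iterables):
    """Hypothetical strict zip: raise ValueError on unequal lengths."""
    missing = object()  # private marker, compared by identity only
    for combo in zip_longest(*iterables, fillvalue=missing):
        if any(item is missing for item in combo):
            raise ValueError("iterables have different lengths")
        yield combo


try:
    strict_result = list(zip_strict(names, values))
except ValueError as exc:
    strict_result = str(exc)

print(truncated)
print(padded)
print(strict_result)
```

The first two produce visibly different output, so a wrong choice tends to show up; the strict form only differs when the input is already wrong, which is the discoverability gap described above.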

On Tue, 5 May 2020 at 11:34, Paul Moore  wrote:

> On Tue, 5 May 2020 at 07:22, Christopher Barker 
> wrote:
>
> > In any case, you seem to be making the argument that a few of us are
> putting forward: yes, a flag on zip() is likely to get more use than a
> function in itertools. Thanks for the support :-)
>
> I'd like to add my voice to the people saying that if someone isn't
> willing to go and find the correct function, and import it from the
> correct module, to implement the behaviour that they want, then I have
> no interest in making it easier for them to write their code
> correctly, because they seem to have very little interest in
> correctness. Can someone come up with any sort of credible argument
> that someone who's trying to write their code correctly would be in
> any way inconvenienced by having to get the functionality from
> itertools?
>
> It seems like we're trying to design a way for people to
> "accidentally" write correct code without trying to, and without
> understanding what could go wrong if they use the current zip
> function. I'm OK with "make it easy to do the right thing", but "make
> it easy to do the right thing by accident" is a step too far IMO.
>
> Paul
___
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/ZFEBTPTOX4TMI42OGBSVQ37AMP3ZFFJQ/

[Python-ideas] Re: zip(x, y, z, strict=True)

2020-05-05 Thread Paul Moore

On Tue, 5 May 2020 at 07:22, Christopher Barker  wrote:

> In any case, you seem to be making the argument that a few of us are putting 
> forward: yes, a flag on zip() is likely to get more use than a function in 
> itertools. Thanks for the support :-)

I'd like to add my voice to the people saying that if someone isn't
willing to go and find the correct function, and import it from the
correct module, to implement the behaviour that they want, then I have
no interest in making it easier for them to write their code
correctly, because they seem to have very little interest in
correctness. Can someone come up with any sort of credible argument
that someone who's trying to write their code correctly would be in
any way inconvenienced by having to get the functionality from
itertools?

It seems like we're trying to design a way for people to
"accidentally" write correct code without trying to, and without
understanding what could go wrong if they use the current zip
function. I'm OK with "make it easy to do the right thing", but "make
it easy to do the right thing by accident" is a step too far IMO.

Paul
___
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/E3C4GFR6XVFJFOPTKQX4VI647HHBJVYC/

[Python-ideas] Re: zip(x, y, z, strict=True)

2020-05-05 Thread Christopher Barker

Steven: I understand that Alex said he thought that putting "strict" in as
a flag would make it a bit more likely that people would use it, and that he
thinks that's a good thing, and you think that's a bad thing, but...

Unless we were to make it the default behavior, very few people are going
to be adding this flag "just in case".

And the fact that you think that making it a flag will make it more likely
that folks will use it is an argument for making it a flag. Unless you
don't like the idea at all, and want it to be more obscure and hard to find.

In any case, you seem to be making the argument that a few of us are putting
forward: yes, a flag on zip() is likely to get more use than a function in
itertools. Thanks for the support :-)

However, you also seem to be making the argument that this feature would do
more harm than good. I disagree, but if that's what you think, then it
shouldn't be added at all, so please make that case, rather than arguing
for adding it, but making it harder to find.

-CHB



On Mon, May 4, 2020 at 6:54 PM Steven D'Aprano  wrote:

> On Mon, May 04, 2020 at 09:20:28PM +0200, Alex Hall wrote:
>
> > > Seriously, if some object defines a weird `__eq__` then half the
> > > standard library, including builtins, stops working "correctly". See
> for
> > > example the behaviour of float NANs in lists.
> > >
> > > My care factor for this is negligible, until such time that it is
> proven
> > > to be an issue for real objects in real code. Until then, YAGNI.
> >
> > Here is an example:
>
> Alex, I understand the point you are trying to make, and I got the
> reference to numpy the first time you referenced it. I just don't care
> about it. As far as I am concerned, numpy array's equality behaviour is
> even more broken than float NANs, and it's not the stdlib's
> responsibility to guarantee "correctness" (for some definition thereof)
> if you use broken classes in your data -- especially not for something
> of marginal value as "zip_strict", as you admitted yourself:
>
> "The problem is that no one really *needs* this check. You *can* do
> without it."
>
> Right. So it's a "nice-to-have", not an essential function, and it can
> go into itertools. The itertools implementer can decide for themselves
> whether they care to provide a C accelerated version as well as a
> Python version from Day 1, or even whether a recipe is enough.
>
> My point here is entirely that we shouldn't feel ourselves forced into
> *preemptively* providing a C version, let alone making this a builtin,
> just because `x in y` breaks if one of the elements of y is a numpy
> array. numpy itself doesn't need this function, they do their own length
> checks.
>
>
> --
> Steven


-- 
Christopher Barker, PhD

Python Language Consulting
  - Teaching
  - Scientific Software Development
  - Desktop GUI and Web Development
  - wxPython, numpy, scipy, Cython
___
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/3EHXR3JUAKXYQEQZ2WRYHZBXKQD25SGZ/

[Python-ideas] Re: zip(x, y, z, strict=True)

2020-05-04 Thread Ethan Furman


On 05/04/2020 12:07 PM, Alex Hall wrote:


> No, I stand by my position for now that "just in case" is a genuine reason


"just in case" is boiler-plate.  One of the huge wins in Python is its low 
boiler-plate requirements.  It's okay if programmers have to think about their code and 
what's required, and it's even more okay to import the correct functions from a module -- 
especially if that module is already in the stdlib.

--
~Ethan~
___
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/3N6INFDSLLOUNW7E3HDATMZOZHHZXF5K/

[Python-ideas] Re: zip(x, y, z, strict=True)

2020-05-04 Thread Ethan Furman


On 05/04/2020 12:35 PM, Alex Hall wrote:


> I imagine there are few people who are too lazy to copy code from SO right now,
> and would be too lazy to import from itertools when the feature becomes
> available, but if it's a builtin then they're willing to make changes


Quite frankly, I have zero concern for people who are unwilling to import the 
correct function from the correct module.

--
~Ethan~
___
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/AKNVIOLBHN2T57QKAFZHFNQJXVBMNSRD/

[Python-ideas] Re: zip(x, y, z, strict=True)

2020-05-04 Thread Steven D'Aprano

On Mon, May 04, 2020 at 09:20:28PM +0200, Alex Hall wrote:

> > Seriously, if some object defines a weird `__eq__` then half the
> > standard library, including builtins, stops working "correctly". See for
> > example the behaviour of float NANs in lists.
> >
> > My care factor for this is negligible, until such time that it is proven
> > to be an issue for real objects in real code. Until then, YAGNI.
> 
> Here is an example:

Alex, I understand the point you are trying to make, and I got the 
reference to numpy the first time you referenced it. I just don't care 
about it. As far as I am concerned, numpy array's equality behaviour is 
even more broken than float NANs, and it's not the stdlib's 
responsibility to guarantee "correctness" (for some definition thereof) 
if you use broken classes in your data -- especially not for something 
of marginal value as "zip_strict", as you admitted yourself:

"The problem is that no one really *needs* this check. You *can* do 
without it."

Right. So it's a "nice-to-have", not an essential function, and it can 
go into itertools. The itertools implementer can decide for themselves 
whether they care to provide a C accelerated version as well as a 
Python version from Day 1, or even whether a recipe is enough.

My point here is entirely that we shouldn't feel ourselves forced into 
*preemptively* providing a C version, let alone making this a builtin, 
just because `x in y` breaks if one of the elements of y is a numpy 
array. numpy itself doesn't need this function, they do their own length 
checks.
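
For readers following along, a minimal sketch of the `x in y` issue being argued over. It deliberately avoids numpy: `EqualToEverything` is a hypothetical stand-in for any class with an unusual `__eq__`, and it misbehaves in a different way than an array would (it gives a wrong answer instead of raising).

```python
class EqualToEverything:
    """Hypothetical class whose __eq__ claims equality with anything."""

    def __eq__(self, other):
        return True


sentinel = object()
combo = (EqualToEverything(), 42)

# `in` falls back to ==, so the odd __eq__ produces a false positive.
found_by_equality = sentinel in combo

# An identity test never calls __eq__, so it is unaffected.
found_by_identity = any(item is sentinel for item in combo)

print(found_by_equality)
print(found_by_identity)
```

Whether guarding against this is worth the effort is exactly the disagreement; the sketch only shows that an identity check sidesteps `__eq__` entirely.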

-- 
Steven
___
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/ZIQZLTKRMGSCELSWZ2EQCW24CZ44B4MG/

[Python-ideas] Re: zip(x, y, z, strict=True)

2020-05-04 Thread Dan Sommers

On Mon, 4 May 2020 21:07:26 +0200
Alex Hall  wrote:

> Yes. We're starting to go in circles here, but I'm arguing that it's
> OK for people to be mildly inconvenienced sometimes having to
> preemptively trim their inputs in exchange for less confusing,
> invisible, frustrating bugs.  I'd like people to use this feature as
> often as possible, and I think the benefits easily outweigh the
> problem you describe. Going crazy trying to debug something is
> probably the thing programmers complain about the most, I'd like to
> reduce that.

[...]

> If an API accepts some iterables intending to zip them, I feel pretty
> safe guessing that 90% of the users of that API will pass iterables
> that they intend to be of equal length. Occasionally someone might
> want to pass an infinite stream or something, but really most users
> will just use lists constructed in a boring manner. I can't imagine
> ever designing an API thinking "I'd better not make this strict, I'm
> sure this particular API will be used quite differently from most
> other similar APIs and users will want to pass different lengths
> unusually often". But even if I grant that such occasions exist, I see
> no reason to believe that they will occur most often when a user is
> feeling too lazy to import itertools. The correlation you propose is
> highly suspect.

Is a Warning the right compromise?  Turn it on by default, and let that
10% (where did that number come from?) turn if off because they actually
do know better.
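
A rough sketch of what such a compromise might look like, purely as an assumption on my part: the wrapper name `zip_warn`, the choice of `RuntimeWarning`, and the message text are all hypothetical, and nothing here is an existing or proposed API.

```python
import warnings
from itertools import zip_longest


def zip_warn(*iterables):
    """Hypothetical zip that truncates like zip() but warns on a mismatch."""
    missing = object()  # private marker, compared by identity only
    for combo in zip_longest(*iterables, fillvalue=missing):
        if any(item is missing for item in combo):
            warnings.warn(
                "zip arguments have different lengths",
                RuntimeWarning,
                stacklevel=2,
            )
            return  # stop at the shortest input, as plain zip does
        yield combo


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    pairs = list(zip_warn("abc", [1, 2]))

print(pairs)
print(len(caught))
```

Callers who know their inputs differ in length could silence it with the standard `warnings.simplefilter("ignore", RuntimeWarning)` or a narrower filter.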

Dan

-- 
“Atoms are not things.” – Werner Heisenberg
Dan Sommers, http://www.tombstonezero.net/dan
___
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/H2BZ5IT7FHSSWIPSZ5AHXJJHZKXX3PL5/

[Python-ideas] Re: zip(x, y, z, strict=True)

2020-05-04 Thread Alex Hall

On Mon, May 4, 2020 at 9:18 PM Chris Angelico  wrote:

> On Tue, May 5, 2020 at 5:11 AM Alex Hall  wrote:
> >
> > On Mon, May 4, 2020 at 1:59 AM Steven D'Aprano 
> wrote:
> >> Can we agree to meet half-way?
> >>
> >> - there are legitimate, genuine uses for zip_strict;
> >>
> >> - but encouraging people to change zip to zip_strict "just in case"
> >>   in the absence of a genuine reason is a bad thing.
> >
> > No, I stand by my position for now that "just in case" is a genuine
> reason and that safety outweighs convenience and efficiency. I haven't been
> given a reason to believe that your concerns would be significant.
> >
>
> If we were at the very beginning of the zip() function's life, and
> could guide its future without any baggage from the past, then "just
> in case" might be a valid justification. But that's not where we are.
> Is "just in case" worth the likely break of backward compatibility? I
> say "likely" because, technically, the Python language and standard
> library would be backward compatible; but in order to get any benefit
> from this change, there would need to be places where the strict mode
> is used, and that's going to mean people change code to be more
> strict, "just to be safe". Is THAT worth it?
>
> ChrisA

I imagine there are few people who are too lazy to copy code from SO right
now, and would be too lazy to import from itertools when the feature
becomes available, but if it's a builtin then they're willing to make
changes to some old code to make it a bit safer, even though that would
break backward compatibility in their library. That's a weird combination.

And if that happens, the user still gets a clear exception which tells them
what to do. That's not a bad experience when making an upgrade. There's
likely to be other breaking changes too, the library just dropped support
for Python 3.9.
___
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/FL75WVGLO4P422KHZ6ZRJOFXS7LKCWHD/

[Python-ideas] Re: zip(x, y, z, strict=True)

2020-05-04 Thread Alex Hall

On Mon, May 4, 2020 at 2:33 AM Steven D'Aprano  wrote:

> On Sat, May 02, 2020 at 07:43:44PM +0200, Alex Hall wrote:
> > On Sat, May 2, 2020 at 6:09 PM Steven D'Aprano 
> wrote:
> >
> > > On Sat, May 02, 2020 at 04:58:43PM +0200, Alex Hall wrote:
> > >
> > > > I didn't think carefully about this implementation and thought that
> there
> > > > was only a performance cost in the error case. That's obviously not
> true
> > > -
> > > > there's an `if` statement executed in Python for every item in every
> > > > iterable.
> > >
> > > Sorry, this does not demonstrate that the performance cost is
> > > significant.
> > >
> > > This adds one "if" per loop, terminating on (one more than) the
> shortest
> > > input. So O(N) on the length of the input. That's usually considered
> > > reasonable, provided the per item cost is low.
> > >
> > > The test in the "if" is technically O(N) on the number of input
> > > iterators, but since that's usually two, and rarely more than a
> handful,
> > > it's close enough to a fixed cost.
> > >
> > > On my old and slow PC `sentinel in combo` is quite fast:
> > >
> >
> > `sentinel in combo` is problematic if some values have overridden
> `__eq__`.
> > I referred to this problem in a previous email to you, saying that people
> > had copied this buggy implementation from SO and that it still hadn't
> been
> > fixed after being pointed out. The fact that you missed this helps to
> prove
> > my point. Getting this right is hard.
>
> I didn't miss it, I ignored it as YAGNI.
>
> Seriously, if some object defines a weird `__eq__` then half the
> standard library, including builtins, stops working "correctly". See for
> example the behaviour of float NANs in lists.
>
> My care factor for this is negligible, until such time that it is proven
> to be an issue for real objects in real code. Until then, YAGNI.
>

Here is an example:

```
import numpy as np

from itertools import zip_longest


def zip_equal(*iterables):
    sentinel = object()
    for combo in zip_longest(*iterables, fillvalue=sentinel):
        if sentinel in combo:
            raise ValueError('Iterables have different lengths')
        yield combo


arr = np.arange(8).reshape((2, 2, 2))
print(arr)
print(list(zip(*arr)))
print(list(zip_equal(*arr)))
```

The output:

```
[[[0 1]
  [2 3]]

 [[4 5]
  [6 7]]]
[(array([0, 1]), array([4, 5])), (array([2, 3]), array([6, 7]))]
Traceback (most recent call last):
  File "/home/alex/.config/JetBrains/PyCharm2020.1/scratches/scratch_666.py", line 15, in <module>
    print(list(zip_equal(*arr)))
  File "/home/alex/.config/JetBrains/PyCharm2020.1/scratches/scratch_666.py", line 8, in zip_equal
    if sentinel in combo:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
```

I know for a fact that this would confuse people badly because I've seen
multiple people who know what this error message generally refers to
incorrectly identify where exactly it's coming from in a similar case:
https://stackoverflow.com/questions/60780328/python-valueerror-the-truth-value-of-an-array-with-more-than-one-element-is-amb/60780361
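
For comparison, here is a minimal sketch of a variant that tests for the sentinel by identity instead of with `in`, so no element's `__eq__` is ever called. The `Ambiguous` and `_NoTruth` classes are hypothetical stand-ins for a NumPy array (so the sketch runs without NumPy); this is not claimed to be the more-itertools implementation.

```python
from itertools import zip_longest


def zip_equal(*iterables):
    """Yield tuples like zip(), but raise if the inputs differ in length."""
    sentinel = object()
    for combo in zip_longest(*iterables, fillvalue=sentinel):
        # Identity check: never calls __eq__ on the items themselves.
        if any(item is sentinel for item in combo):
            raise ValueError('Iterables have different lengths')
        yield combo


class _NoTruth:
    """Result of a comparison that refuses to be used as a bool."""

    def __bool__(self):
        raise ValueError('truth value is ambiguous')


class Ambiguous:
    """Hypothetical stand-in for an array: == returns a non-boolean."""

    def __eq__(self, other):
        return _NoTruth()


a, b = Ambiguous(), Ambiguous()
pairs = list(zip_equal([a], [b]))  # works; `sentinel in combo` would raise here
print(len(pairs))
```

The cost is an extra generator expression per tuple, which is part of why the pure Python version is slower than the `in` test.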
___
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/WUBRVEBJZVPPOYLIDFVWVQ5AHPMT7HZV/

[Python-ideas] Re: zip(x, y, z, strict=True)

2020-05-04 Thread Chris Angelico

On Tue, May 5, 2020 at 5:11 AM Alex Hall  wrote:
>
> On Mon, May 4, 2020 at 1:59 AM Steven D'Aprano  wrote:
>> Can we agree to meet half-way?
>>
>> - there are legitimate, genuine uses for zip_strict;
>>
>> - but encouraging people to change zip to zip_strict "just in case"
>>   in the absence of a genuine reason is a bad thing.
>
> No, I stand by my position for now that "just in case" is a genuine reason 
> and that safety outweighs convenience and efficiency. I haven't been given a 
> reason to believe that your concerns would be significant.
>

If we were at the very beginning of the zip() function's life, and
could guide its future without any baggage from the past, then "just
in case" might be a valid justification. But that's not where we are.
Is "just in case" worth the likely break of backward compatibility? I
say "likely" because, technically, the Python language and standard
library would be backward compatible; but in order to get any benefit
from this change, there would need to be places where the strict mode
is used, and that's going to mean people change code to be more
strict, "just to be safe". Is THAT worth it?

ChrisA
___
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/ENS7U6A2RDCYUDC3T5FW2GFC5PGZWXPC/

[Python-ideas] Re: zip(x, y, z, strict=True)

2020-05-04 Thread Alex Hall

On Mon, May 4, 2020 at 1:59 AM Steven D'Aprano  wrote:

> On Sat, May 02, 2020 at 10:26:18PM +0200, Alex Hall wrote:
>
> > > If I know that consumers of my data truncate on the shortest input,
> then
> > > as the producer of data I don't have to care about making them equal. I
> > > can say:
> > >
> > > process_user_ids(usernames, generate_user_ids())
> > >
> > > and pass an infinite stream of user IDs and know that the function will
> > > just truncate on the shortest stream. Yay! Life is good.
> > >
> > > But now this zip flag comes along, and the author of process_user_ids
> > > decides to protect me from myself and "fail loudly", and I will curse
> > > them onto the hundredth generation for making my life more difficult.
> > >
> >
> > My guess is that this kind of situation is rare and unusual. The example
> > looks made up, is it based on something real?
>
> Yes, it's made up, but yes, it is based on real code. I haven't had to
> generate user IDs yet, but I have often passed in (for example) an
> infinite stream of prime numbers, or some other infinite sequence.
>
> Not necessarily infinite either: it might be finite but huge, such as
> combinations or permutations of something.
>

Right, but have you had an actual situation where you've passed two
parallel streams of different lengths to an external library that zipped
them?

> At the consumer end, the main one that comes to mind off the top of my
> head (apart from code I wrote myself) is numpy, for example their
> correlation coefficient function:
>
> py> np.corrcoef([1, 2, 3, 4, 5, 6, 7, 8], [7, 6, 5, 4, 3, 2])
> Traceback (most recent call last):
>   ...
> ValueError: array dimensions must agree except for d_0
>
> Now I'm still keeping an open mind whether this check is justified for
> stats functions. (There have been some requests for XY stats in the
> statistics module, so I may have to make a decision some day.)
>

Are you saying you're annoyed that they enforce the same length here, and
you'd like the freedom to pass different lengths? Because to me the check
seems like an extremely good idea.

> But my point is, regardless of whether that check is necessary or not,
> *I still have to check it when I produce the data* or else I get a
> potentially spurious exception that will eat my data. In the general
> case where I have iterators not lists, no recovery is possible. I can't
> catch the error, trim the data, and resubmit. I have to preemptively
> trim the data.
>
> Of course I recognise the right of each developer to choose for
> themselves whether to enforce the rule that input streams are equal. And
> sometimes that will be the right thing to do.
>
> But you are arguing that putting zip_strict as a builtin will encourage
> people to do this, and I am saying that *encouraging people to do this*
> is a point against it, because that will lead to the "things which
> aren't errors should never pass silently" Just In Case it might be an
> error.
>

Yes. We're starting to go in circles here, but I'm arguing that it's OK for
people to be mildly inconvenienced sometimes having to preemptively trim
their inputs in exchange for less confusing, invisible, frustrating bugs.
I'd like people to use this feature as often as possible, and I think the
benefits easily outweigh the problem you describe. Going crazy trying to
debug something is probably the thing programmers complain about the most,
I'd like to reduce that.

> If you need to enforce equal length input, then one extra line:
>
> from itertools import zip_strict
>
> is no burden. But if someone is on the fence about checking for equal
> lengths, and importing would be too much trouble so they don't bother,
> but making it builtin is enough for them to tip them over into using
> it "just to be safe", then they probably shouldn't be using it.
>

I don't know why you put "just to be safe" in mocking quotes. I see safety
as good.

If an API accepts some iterables intending to zip them, I feel pretty safe
guessing that 90% of the users of that API will pass iterables that they
intend to be of equal length. Occasionally someone might want to pass an
infinite stream or something, but really most users will just use lists
constructed in a boring manner. I can't imagine ever designing an API
thinking "I'd better not make this strict, I'm sure this particular API
will be used quite differently from most other similar APIs and users will
want to pass different lengths unusually often". But even if I grant that
such occasions exist, I see no reason to believe that they will occur most
often when a user is feeling too lazy to import itertools. The correlation
you propose is highly suspect.

> > An external API which
> > requires you to pass parallel iterables instead of pairs is unusual and
> > confusing.
>
> I don't know about that. numpy's correlation coefficient supports both a
> single array of data pairs or a pair of separate X, Y values. So does R.
>
> Spreadsheets likewise

[Python-ideas] Re: zip(x, y, z, strict=True)

2020-05-04 Thread Steven D'Aprano

On Sun, May 03, 2020 at 11:13:58PM -0400, David Mertz wrote:

> It seems to me that a Python implementation of zip_equals() shouldn't do
> the check in a loop like a version shows (I guess from more-itertools).
> More obvious is the following, and this has only a small constant speed
> penalty.
> 
> def zip_equal(*its):
>     yield from zip(*its)
>     if any(_sentinel == next(o, _sentinel) for o in its):
>         raise ZipLengthError

Alas, that doesn't work, even with your correction of `any` to 
`not all`.

py> list(zip_equal("abc", "xy"))
[('a', 'x'), ('b', 'y')]


The problem here is that zip consumes the "c" from the first iterator, 
exhausting it, so your check at the end finds that all the iterators are 
exhausted.

Here's the function I used:

def zip_equal(*its):
    _sentinel = object()
    its = tuple(map(iter, its))
    yield from zip(*its)
    if not all(_sentinel == next(o, _sentinel) for o in its):
        raise RuntimeError
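
The extra item that zip swallows can be seen directly with two explicit iterators. This is only an illustrative sketch using the same "abc" and "xy" inputs as above; the variable names are made up for the example.

```python
# Longer input first: zip fetches "c" before it discovers that the
# second iterator is exhausted, and the "c" is silently discarded.
first = iter("abc")
second = iter("xy")
pairs = list(zip(first, second))
leftover_long_first = next(first, None)

# Shorter input first: zip stops as soon as the first iterator is
# exhausted, so "c" is still available afterwards.
short = iter("xy")
long_ = iter("abc")
pairs_swapped = list(zip(short, long_))
leftover_short_first = next(long_, None)

print(pairs, leftover_long_first)            # [('a', 'x'), ('b', 'y')] None
print(pairs_swapped, leftover_short_first)   # [('x', 'a'), ('y', 'b')] c
```

So a check performed after `yield from zip(*its)` only sees the mismatch when the longer input happens to come after the shorter one.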



> I still like zip_strict() better as a name, but whatever.  And I don't care
> what the exception is, or what the sentinel is called.

The sentinel is a local variable (or at least it ought to be -- there is 
no need to make it a global).



-- 
Steven
___
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/X6FBTVNQPIURKXFUIT4G4SH4G53YUIWD/

[Python-ideas] Re: zip(x, y, z, strict=True)

2020-05-03 Thread David Mertz

Oops. I don't mean any(), but rather 'not all()'. Or alternatively, !=
instead of ==.

 Same point though.

On Sun, May 3, 2020, 11:13 PM David Mertz  wrote:

> > Here is a comparison of the current zip with more-itertools' zip_equal:
>
>> > So the Python version is about 13 times slower, and 10 million
>> iterations
>> > (quite plausible) adds about 2 seconds.
>>
>> Adds two seconds to *what* though? That's why I care more about
>> benchmarks than micro benchmarks. In real-world code, you are going to
>> be processing the data somehow. Adds two seconds to an hour's processing
>> time? I couldn't care less. Adds two seconds to a second? Now I'm
>> interested.
>>
>
> It seems to me that a Python implementation of zip_equals() shouldn't do
> the check in a loop like a version shows (I guess from more-itertools).
> More obvious is the following, and this has only a small constant speed
> penalty.
>
> def zip_equal(*its):
>     yield from zip(*its)
>     if any(_sentinel == next(o, _sentinel) for o in its):
>         raise ZipLengthError
>
> I still like zip_strict() better as a name, but whatever.  And I don't
> care what the exception is, or what the sentinel is called.
>
> ... and yes, I realize you can quibble about exactly how many of the
> multiple iterators passed in might consume an extra element, which is
> possibly more with this code.  But honestly, if your use case is "unequal
> lengths are an error" then you just simply do not care about that.  *MY*
> use, to the contrary seems like it's more like Steven's.  I.e. I do this
> fairly often: zip(a_few_things, lots_available).  Not necessarily infinite,
> but where I expect "more than enough" of the longer iterator.
>
>
> --
> The dead increasingly dominate and strangle both the living and the
> not-yet born.  Vampiric capital and undead corporate persons abuse
> the lives and control the thoughts of homo faber. Ideas, once born,
> become abortifacients against new conceptions.
>
___
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/VZVBQQXODIP7CRATG3FLPB4D4F3X3JWU/

[Python-ideas] Re: zip(x, y, z, strict=True)

2020-05-03 Thread Christopher Barker

One small comment on this part of the thread:

Yes, using an API that produces an infinite iterator with a "strict"
version of zip() would be, well, bad.

But it would fail the very first time it was used. I can't see how
encouraging people to use a strict version of zip() would require that
folks not create APIs that return infinite iterators -- some users might
get a failure the first time they try to use it, and then they'd fix their
code. No problem there.

And "shortest" would remain the default, so I doubt it would be common for
folks to use the "strict" version without thinking  -- this is really a
non-issue.

-CHB
___
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/2LQH7IXBPV36AVGZU4ERRR2XLESINZ44/

[Python-ideas] Re: zip(x, y, z, strict=True)

2020-05-03 Thread David Mertz

> Here is a comparison of the current zip with more-itertools' zip_equal:

> > So the Python version is about 13 times slower, and 10 million iterations
> > (quite plausible) adds about 2 seconds.
>
> Adds two seconds to *what* though? That's why I care more about
> benchmarks than micro benchmarks. In real-world code, you are going to
> be processing the data somehow. Adds two seconds to an hour's processing
> time? I couldn't care less. Adds two seconds to a second? Now I'm
> interested.
>

It seems to me that a Python implementation of zip_equals() shouldn't do
the check in a loop like a version shows (I guess from more-itertools).
More obvious is the following, and this has only a small constant speed
penalty.

def zip_equal(*its):
    yield from zip(*its)
    if any(_sentinel == next(o, _sentinel) for o in its):
        raise ZipLengthError

I still like zip_strict() better as a name, but whatever.  And I don't care
what the exception is, or what the sentinel is called.

... and yes, I realize you can quibble about exactly how many of the
multiple iterators passed in might consume an extra element, which is
possibly more with this code.  But honestly, if your use case is "unequal
lengths are an error" then you just simply do not care about that.  *MY*
use, to the contrary seems like it's more like Steven's.  I.e. I do this
fairly often: zip(a_few_things, lots_available).  Not necessarily infinite,
but where I expect "more than enough" of the longer iterator.


-- 
The dead increasingly dominate and strangle both the living and the
not-yet born.  Vampiric capital and undead corporate persons abuse
the lives and control the thoughts of homo faber. Ideas, once born,
become abortifacients against new conceptions.
___
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/TZYXVRGEKZ5WIJD6XOZFLI76NBPAC4BC/

[Python-ideas] Re: zip(x, y, z, strict=True)

2020-05-03 Thread Steven D'Aprano

On Sat, May 02, 2020 at 07:43:44PM +0200, Alex Hall wrote:
> On Sat, May 2, 2020 at 6:09 PM Steven D'Aprano  wrote:
> 
> > On Sat, May 02, 2020 at 04:58:43PM +0200, Alex Hall wrote:
> >
> > > I didn't think carefully about this implementation and thought that there
> > > was only a performance cost in the error case. That's obviously not true
> > -
> > > there's an `if` statement executed in Python for every item in every
> > > iterable.
> >
> > Sorry, this does not demonstrate that the performance cost is
> > significant.
> >
> > This adds one "if" per loop, terminating on (one more than) the shortest
> > input. So O(N) on the length of the input. That's usually considered
> > reasonable, provided the per item cost is low.
> >
> > The test in the "if" is technically O(N) on the number of input
> > iterators, but since that's usually two, and rarely more than a handful,
> > it's close enough to a fixed cost.
> >
> > On my old and slow PC `sentinel in combo` is quite fast:
> >
> 
> `sentinel in combo` is problematic if some values have overridden `__eq__`.
> I referred to this problem in a previous email to you, saying that people
> had copied this buggy implementation from SO and that it still hadn't been
> fixed after being pointed out. The fact that you missed this helps to prove
> my point. Getting this right is hard.

I didn't miss it, I ignored it as YAGNI.

Seriously, if some object defines a weird `__eq__` then half the 
standard library, including builtins, stops working "correctly". See for 
example the behaviour of float NANs in lists.

My care factor for this is negligible, until such time that it is proven 
to be an issue for real objects in real code. Until then, YAGNI.
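
For readers who have not run into it, the NAN behaviour mentioned above looks like this. A small sketch; containment and list equality check identity before falling back to `==`.

```python
nan = float("nan")
values = [nan]

equal_to_itself = nan == nan                 # False: NANs never compare equal
same_object_found = nan in values            # True: identity shortcut
other_nan_found = float("nan") in values     # False: different object, == fails
lists_equal = values == [nan]                # True: same object in both lists

print(equal_to_itself, same_object_found, other_nan_found, lists_equal)
```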

> Fortunately, more_itertools avoids this bug by not using `in`, which you
> seem to not have noticed even though I copied its implementation in the
> email you're responding to.

Which by my testing on my machine is nearly ten times slower than the 
more obvious use of `in`.

> > Without actual measurements, this is a classic example of premature
> > micro-optimization.
> >
> > Let's see some real benchmarks proving that a Python version is
> > too slow in real-life code first.
> >
> 
> Here is a comparison of the current zip with more-itertools' zip_equal:
[...]

> my_timeit("consume(zip_equal(x1, x2))")
> ``` 

Huh, there's that weird link to the CoC again.

> So the Python version is about 13 times slower, and 10 million iterations
> (quite plausible) adds about 2 seconds.

Adds two seconds to *what* though? That's why I care more about 
benchmarks than micro benchmarks. In real-world code, you are going to 
be processing the data somehow. Adds two seconds to an hour's processing 
time? I couldn't care less. Adds two seconds to a second? Now I'm 
interested.
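
One rough way to put the overhead in context is to time the zipping alone and then the zipping plus some per-item work. The sketch below is only an illustration: `zip_equal` here is a hand-written pure Python version (not the more-itertools code quoted earlier), `process` is a made-up workload, and the numbers will differ from machine to machine.

```python
import timeit
from collections import deque
from itertools import zip_longest


def zip_equal(*iterables):
    """Pure Python strict zip, used here only as a benchmark subject."""
    sentinel = object()
    for combo in zip_longest(*iterables, fillvalue=sentinel):
        if any(item is sentinel for item in combo):
            raise ValueError("Iterables have different lengths")
        yield combo


def consume(iterator):
    """Exhaust an iterator without storing its items."""
    deque(iterator, maxlen=0)


def process(pairs):
    """Hypothetical stand-in for real per-item work."""
    return sum(x * y for x, y in pairs)


x1 = list(range(50_000))
x2 = list(range(50_000))

timings = {
    "zip, consume only": timeit.timeit(lambda: consume(zip(x1, x2)), number=3),
    "zip_equal, consume only": timeit.timeit(lambda: consume(zip_equal(x1, x2)), number=3),
    "zip, with processing": timeit.timeit(lambda: process(zip(x1, x2)), number=3),
    "zip_equal, with processing": timeit.timeit(lambda: process(zip_equal(x1, x2)), number=3),
}

for label, seconds in timings.items():
    print(f"{label:28s} {seconds:.4f}s")
```

Comparing the two "with processing" rows against the two "consume only" rows is what answers the "adds two seconds to *what*" question for a given workload.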

To be clear here, I'm not arguing *against* a C accelerated version. I'm 
arguing against the *necessity* of a C version, based only on micro 
benchmarks. If the PEP is accepted, and this goes into itertools, then 
whether it is implemented in C or Python should be a matter for the 
implementer.

We shouldn't argue that this *must* be a builtin because otherwise it 
will be too slow. That's a bogus argument.

> That's not disastrous, but I think
> it's significant enough that someone working with large amounts of data and
> concerned about performance might choose to risk accidental malformed input.

That's their choice to make, not ours. If they are worried about unequal 
input lengths, they can always truncate the data to make them equal 
*wink*

[Oh no, I have a sudden image in my head of people using zip to truncate 
their data to equal lengths, before passing it on to zip_strict "to be 
sure".]

-- 
Steven
___
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/E324GYS3RZDVXW7PHYX7LE4Q6W4TMHQF/

[Python-ideas] Re: zip(x, y, z, strict=True)

2020-05-03 Thread Steven D'Aprano

On Sat, May 02, 2020 at 02:58:57PM +0200, Alex Hall wrote:

> I'm not sure what the point of this long spiel about floats and str.upper
> was.

Half a dozen short paragraphs is a long spiel? Sorry, this isn't 
Twitter, and sorry for not discussing complicated technical debates 
using a maximum of 140 characters :-)

The point is that you made a statement about the "underlying core 
principle" being that programs should fail loudly on error. You asked if 
people disagreed. That depends on what you mean by an error, and it 
depends on whether the cost of failing loudly is better or worse than 
doing something else.

> No one thinks that zip should always be strict. The feature would be
> optional and let people choose conveniently between loud failure and silent
> truncation.

We're not really debating whether or not people should be permitted to 
verify the length of input to zip. They have always been able to do 
this. We're debating whether or not this functionality is important 
enough to be in the stdlib or a builtin, and if so, what API it ought to 
have.

> > But I question whether *enough* people need it *often enough* to make it
> > a builtin, or to put a flag on plain zip.
> 
> Well, let's add some data about people needing it.

Nicely collected. Especially since you have found that my own opinion on 
this seems to have changed over the years. What can I say? Opinions 
change, and my opinion may change again in the future.

> > - Is it complicated to get right? No.
> 
> I would say yes. Look at the SO question for example. The asker wrote a
> long, slow, complicated solution and had to ask if it was good enough.

On its own this is evidence that it should be a recipe in itertools. Not 
every Python programmer who can't put together a six line solution 
out of the tools in itertools is evidence that it should be built-in.

I'm sure that I've built some pretty hairy and unnecessarily complicated 
and fragile code in the past, and by "past" I include yesterday :-) 
Should my inability to write a function well mean that it needs to be 
added to the builtins? I don't think so.

There's a vocal group of people who want to strip the stdlib down to 
bare essentials (whatever that means!). We don't need to agree with that 
to nevertheless agree to be cautious about adding new things to the 
language.

[...]
> I think a major factor here is laziness. I'm pretty sure that sometimes
> I've wanted this kind of strict check, just for better peace of mind, but
> the thought of one of the solutions above feels like too much effort.

Right, this is a very important point.

"Put it on PyPI" is often just a way to dismiss a proposal. Hardly 
anyone is going to add a third-party dependency from some unknown 
individual with just one function. We're not like the node.js community 
and their one-liner external dependencies :-)

But adding a well-known dependency with dozens of functions, like 
more-itertools, that is a viable solution for many people. That 
pushes the balance towards "just use more-itertools".

On the third hand, Brandt's ast bug pushes the balance slightly 
back towards "put it in the stdlib".

(If these decisions were easy, we wouldn't have long debates on 
Python-Ideas about them.)

> I don't want to add a third party dependency just for this. I don't 
> want to read someone else's solution (e.g. on SO) which doesn't have 
> tests and try to evaluate if it's correct.

Yes, but that's true for every and any function. Does everything need to 
be a builtin, because we are lazy and don't want to use third party 
dependencies or test SO code before using it?

Clearly not. So we have to ask why *this* function is more important 
than the thousands of other functions on SO and in third-party libraries 
that it should be a builtin.

> The problem is that no one really *needs* this check. You *can* do without
> it.

"Nobody needs this, therefore it should be a builtin" is a pretty 
unusual argument.

If this is of marginal usefulness, do we really need to bloat the 
builtins with it? Put it in itertools next to zip_longest. Those who 
need it can import it.

-- 
Steven
___
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/OUV5COWWKQIE7FMAUTGUHQWAUYPACM22/

[Python-ideas] Re: zip(x, y, z, strict=True)

2020-05-03 Thread Steven D'Aprano

On Sat, May 02, 2020 at 10:26:18PM +0200, Alex Hall wrote:

> > If I know that consumers of my data truncate on the shortest input, then
> > as the producer of data I don't have to care about making them equal. I
> > can say:
> >
> > process_user_ids(usernames, generate_user_ids())
> >
> > and pass an infinite stream of user IDs and know that the function will
> > just truncate on the shortest stream. Yay! Life is good.
> >
> > But now this zip flag comes along, and the author of process_user_ids
> > decides to protect me from myself and "fail loudly", and I will curse
> > them onto the hundredth generation for making my life more difficult.
> >
> 
> My guess is that this kind of situation is rare and unusual. The example
> looks made up, is it based on something real?

Yes, it's made up, but yes, it is based on real code. I haven't had to 
generate user IDs yet, but I have often passed in (for example) an 
infinite stream of prime numbers, or some other infinite sequence.

Not necessarily infinite either: it might be finite but huge, such as 
combinations or permutations of something.
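
A sketch of the pattern being described, assuming made-up bodies for `generate_user_ids` and `process_user_ids` (neither is real code from this thread):

```python
from itertools import count


def generate_user_ids(start=1000):
    """Hypothetical producer: an endless stream of IDs."""
    return count(start)


def process_user_ids(usernames, user_ids):
    """Hypothetical consumer that pairs names with IDs using plain zip."""
    return dict(zip(usernames, user_ids))


usernames = ["ann", "bob", "cy"]
assigned = process_user_ids(usernames, generate_user_ids())
print(assigned)  # {'ann': 1000, 'bob': 1001, 'cy': 1002}
```

With plain zip the consumer simply stops after the last username. A consumer that insisted on equal lengths would reject this call, and the producer would have to trim the ID stream (for example with `itertools.islice`) before passing it in.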

At the consumer end, the main one that comes to mind off the top of my 
head (apart from code I wrote myself) is numpy, for example their 
correlation coefficient function:

py> np.corrcoef([1, 2, 3, 4, 5, 6, 7, 8], [7, 6, 5, 4, 3, 2])
Traceback (most recent call last):
  ...
ValueError: array dimensions must agree except for d_0

Now I'm still keeping an open mind whether this check is justified for 
stats functions. (There have been some requests for XY stats in the 
statistics module, so I may have to make a decision some day.)

But my point is, regardless of whether that check is necessary or not, 
*I still have to check it when I produce the data* or else I get a 
potentially spurious exception that will eat my data. In the general 
case where I have iterators not lists, no recovery is possible. I can't 
catch the error, trim the data, and resubmit. I have to preemptively 
trim the data.
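
To make the "no recovery" point concrete, here is a sketch with a hand-written `zip_strict` (a hypothetical helper built on `zip_longest`, not a stdlib function). By the time the error is raised, the items already pulled from the iterators are gone.

```python
from itertools import zip_longest


def zip_strict(*iterables):
    """Hypothetical strict zip: raise if the inputs differ in length."""
    sentinel = object()
    for combo in zip_longest(*iterables, fillvalue=sentinel):
        if any(item is sentinel for item in combo):
            raise ValueError("iterables have different lengths")
        yield combo


data = iter(range(5))   # 0, 1, 2, 3, 4
labels = iter("ab")     # only two labels

try:
    pairs = list(zip_strict(data, labels))
except ValueError:
    pairs = None
    remaining = list(data)  # whatever the failed call left behind

print(pairs, remaining)  # None [3, 4]
```

Items 0, 1 and 2 were consumed before the mismatch was noticed, so catching the exception and resubmitting trimmed data is not possible when the inputs are one-shot iterators.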

Of course I recognise the right of each developer to choose for 
themselves whether to enforce the rule that input streams are equal. And 
sometimes that will be the right thing to do.

But you are arguing that putting zip_strict as a builtin will encourage 
people to do this, and I am saying that *encouraging people to do this* 
is a point against it, because that will lead to the "things which 
aren't errors should never pass silently" Just In Case it might be an 
error.

If you need to enforce equal length input, then one extra line:

from itertools import zip_strict

is no burden. But if someone is on the fence about checking for equal 
lengths, and importing would be too much trouble so they don't bother, 
but making it builtin is enough for them to tip them over into using 
it "just to be safe", then they probably shouldn't be using it.

> An external API which
> requires you to pass parallel iterables instead of pairs is unusual and
> confusing.

I don't know about that. numpy's correlation coefficient supports both a 
single array of data pairs or a pair of separate X, Y values. So does R.

Spreadsheets likewise put the X, Y data in separate cells, not in cells 
with two data points. If your data is coming from a CSV file, as it 
often does, the most natural way to get it is as two separate columns.

But for APIs that do require a single stream of pairs, then this entire 
discussion is irrelevant, since I, the producer of the data, can choose 
whatever behaviour makes sense for me:

- `zip` to truncate the data on shortest input;
- `zip_longest` to pad it;
- `zip_strict` (whether I write my own or get it from the stdlib)

or anything else I like, since I'm the producer.
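
The three choices listed above, side by side. `zip_strict` is again a hypothetical hand-written helper, since no such function existed in itertools at the time of this thread.

```python
from itertools import zip_longest


def zip_strict(*iterables):
    """Hypothetical strict zip: raise if the inputs differ in length."""
    sentinel = object()
    for combo in zip_longest(*iterables, fillvalue=sentinel):
        if any(item is sentinel for item in combo):
            raise ValueError("iterables have different lengths")
        yield combo


xs = [1, 2, 3]
ys = [10, 20]

truncated = list(zip(xs, ys))        # [(1, 10), (2, 20)]
padded = list(zip_longest(xs, ys))   # [(1, 10), (2, 20), (3, None)]

try:
    checked = list(zip_strict(xs, ys))
except ValueError as exc:
    checked = None
    print("strict pairing failed:", exc)
```

Whichever one the producer picks, the consumer just receives a stream of pairs and never sees the length question.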

So we shouldn't be talking about APIs that require a stream of pairs, 
but only APIs that require separate streams.

[...]
> In most cases it wouldn't ensure the correctness of code, but it could give
> some peace of mind and might help readers. But also in those cases if the
> user decides it's redundant and not worth using:
> 
> - The user had better be confident of their judgement, which will
> inevitably sometimes be wrong.

That's not our problem to solve. It isn't up to us to force people to 
use zip_strict "just in case your judgement that it isn't needed is 
wrong".

I think you and I have a severe disagreement on the relationship between 
the stdlib and the end developers. Between your comment above, and the 
one below, you seem to believe that it is the job of the stdlib to 
protect the developer from themselves.

And yet here you are using Python, where we have the ability to 
monkey-patch the builtins namespace, shadow functions, reach into 
functions and manipulate their data, including their code, rebind any 
name, remove attributes of anything, even change the class of (some) 
instances on the fly. Are you sure you are using the right language?

> - Even if the context of the code makes it obvious that it's redundant,
>

[Python-ideas] Re: zip(x, y, z, strict=True)

2020-05-03 Thread Brandt Bucher

Steven D'Aprano wrote:
> I cannot imagine why you were surprised about that. Did you already forget 
> about the experience of dict union operators? :-)

Well, those were dictionary operators, not zip iterator constructors. ;)

> Maybe you should chat with another core developer who *disagrees* with your 
> view about such a flag?

I didn't seek out someone who shared my views, I was asking someone for their 
thoughts on the philosophical/design arguments. It just happened that their 
views seemed largely aligned with my own, at least in that area.

Besides, what do you think I've been doing in these threads for the last two 
weeks? :)
___
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/T4ZPUM24YJZTOTDNYVV5FV5UX34CH6I7/

[Python-ideas] Re: zip(x, y, z, strict=True)

2020-05-03 Thread Paul Moore

On Sat, 2 May 2020 at 12:19, Steven D'Aprano  wrote:
> - Is there a need for it? Granted.
> - Is it complicated to get right? No.

This one, I'll query. Until someone (probably you) mentioned it, I
didn't think of using zip_longest to do this - probably because I was
locked into thinking about the *shortest* element, and "abort when I
run out".
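
For reference, the zip_longest approach can be sketched as below. This is only an illustration: the name zip_strict follows the suggestion in this thread, while the choice of ValueError and the error message are assumptions.

```python
from itertools import zip_longest


def zip_strict(*iterables):
    """Yield tuples like zip(), but raise if the inputs differ in length."""
    sentinel = object()  # unique fill value that no caller can supply
    for combo in zip_longest(*iterables, fillvalue=sentinel):
        # Identity test, so values with an unusual __eq__ cannot fool it.
        if any(item is sentinel for item in combo):
            raise ValueError("zip_strict() arguments have different lengths")
        yield combo


print(list(zip_strict("ab", [1, 2])))
try:
    list(zip_strict("abc", [1, 2]))
except ValueError as exc:
    print("error:", exc)
```

The fill value is created inside the function so that it can never collide with real data.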

> - Is performance critical enough that it has to be written in C?
>   Probably not.
> - Is there agreement on the functionality? Somewhat.
> - Could that need be met by your own personal toolbox?
> - or a recipe in itertools?

This, I think, would be a good idea. It would be discoverable (name it
zip_strict), and would offer a standard, robust implementation. If
(and this is a big if) it becomes sufficiently popular, a proposal to
"promote" it from a recipe to an actual member of itertools would make
sense.

> - or by a third-party library?
> - or a function in itertools?
>
> We've heard from people who say that they would like a strict version
> of zip which raises on unequal inputs. How many of them like this enough
> to add a six line function to their code?
>
> My personal opinion is that given that Brandt has found one concrete use
> for this in the stdlib, it is probably justifiable to add it to
> itertools. Whether it even needs a C accelerated version, or just a pure
> Python version, I don't care, but then I'm not doing the work :-)

I'm not sure the evidence warrants an itertools function yet, but
that's largely something for Raymond Hettinger (as the module
maintainer) to take a view on. (I'd hope Raymond would also weigh in
on the PEP as it stands, too, given that it's so closely related to
itertools).

Paul
___
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/YKXTCHPFNX57KWE66P2U6MLASG2YYMLN/

[Python-ideas] Re: zip(x, y, z, strict=True)

2020-05-02 Thread Alex Hall

On Sat, May 2, 2020 at 5:10 PM Steven D'Aprano  wrote:

> On Sat, May 02, 2020 at 02:58:57PM +0200, Alex Hall wrote:
>
> > Adding a function to itertools will mostly solve that problem, but not
> > entirely. Adding `strict=True` is so easy that people will be encouraged to
> > use it often and keep their code safe. That to me is the biggest argument
> > for this feature and for this specific API.
>
> The last thing I want is to encourage people to unnecessarily enforce a
> rule "data streams must be equal just to be safe" when they don't
> actually need to be equal. What you are calling the biggest argument for
> this feature is, for me, a strong argument against it.
>
> If I know that consumers of my data truncate on the shortest input, then
> as the producer of data I don't have to care about making them equal. I
> can say:
>
> process_user_ids(usernames, generate_user_ids())
>
> and pass an infinite stream of user IDs and know that the function will
> just truncate on the shortest stream. Yay! Life is good.
>
> But now this zip flag comes along, and the author of process_user_ids
> decides to protect me from myself and "fail loudly", and I will curse
> them onto the hundredth generation for making my life more difficult.
>

My guess is that this kind of situation is rare and unusual. The example
looks made up, is it based on something real? Do you have any examples
based on reality? I've given examples of functions that check the lengths
of their arguments, so it's conceivable you or someone else could have had
this exact problem. The fact that those checks are there shows people
thought it was a good idea and no one has complained enough to change their
minds. And we have examples of people cursing the lack of a check.

> If I'm the producer and consumer of the data, then I can pick and
> choose between versions, and that's all well and good.
>

FWIW I do think this use case is much more common. An external API which
requires you to pass parallel iterables instead of pairs is unusual and
confusing. For example, the real source of the ast.unparse problem is that
ast.Dict.{keys,values} is weird. Every consumer of it such as compile and
ast.unparse has to check the lengths, a strategy that has failed and will
probably continue to fail. I'm not saying that bad APIs are rare, but that
this kind of API is both bad and rare.
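
To illustrate the failure mode with parallel lists, here is a minimal sketch using a deliberately malformed, hand-built ast.Dict node. It only shows what plain zip does with the two lists; it makes no claim about what compile or ast.unparse do with such a node.

```python
import ast

# A hand-built Dict node with two keys but only one value (deliberately malformed).
node = ast.Dict(
    keys=[ast.Constant(value="a"), ast.Constant(value="b")],
    values=[ast.Constant(value=1)],
)

# Plain zip pairs up what it can and silently drops the second key.
pairs = list(zip(node.keys, node.values))
print(len(node.keys), len(node.values), len(pairs))  # prints: 2 1 1
```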

We've argued a lot about what kinds of uses of zip are most common, so I
did a little survey of code that I had written or worked with. The uses of
zip that I found could be roughly categorised as follows:

strict (if it existed) should be False: 3
Lengths need to be equal...
    ...but that's not checked, although that's probably OK: 11
    ...but that's not checked, and that's a problem: 3
    ...so there's an assert len(x) == len(y): 2

Based on that data, adding strict=True:

- in the vast majority of cases would not hurt.
- is significantly helpful more often than strict should be False
- would ensure correctness in currently unsafe code as often as strict
should be False

In most cases it wouldn't ensure the correctness of code, but it could give
some peace of mind and might help readers. But also in those cases if the
user decides it's redundant and not worth using:

- The user had better be confident of their judgement, which will
inevitably sometimes be wrong.
- Even if the context of the code makes it obvious that it's redundant,
that context could change in the future and introduce a silent regression,
and people are likely to not think to add strict=True to the zip call
somewhere down the line from their changes. Adding strict=True is more
future-proof to such changes.

> If I'm the producer of the data, and I want it to be equal in length,
> then I control the data and can make it equal.

But in your example you complain about having to do that. Is it a problem
or not?

> But if I'm only the consumer of the data, I have no business failing
> "just to be safe". That's an anti-feature that makes life more
> difficult, not less, for the producer of the data, akin to excessive
> runtime type checking (I'm sometimes guilty of that myself) or in other
> languages flagging every class in sight as final "just to be safe".
>
> It is possible to be *too* defensive, and if making the strict version
> of zip a builtin encourages consumers of the data to "be safe", then
> that is a mark against it in my strong opinion.
>

I think you make this situation sound worse than it is. "I will curse them
onto the hundredth generation for making my life more difficult" is pretty
melodramatic. If you get an exception because you tried:

process_user_ids(usernames, generate_user_ids())

then you can pretty easily change it to:

process_user_ids(usernames, (user_id for user_id, username in
zip(generate_user_ids(), usernames)))

or if you can generate user IDs one at a time:

process_user_ids(usernames, (generate_user_id() for _ in usernames))

It's a bit inconvenient, but:

- It's pretty

[Python-ideas] Re: zip(x, y, z, strict=True)

2020-05-02 Thread Alex Hall

On Sat, May 2, 2020 at 6:09 PM Steven D'Aprano  wrote:

> On Sat, May 02, 2020 at 04:58:43PM +0200, Alex Hall wrote:
>
> > I didn't think carefully about this implementation and thought that there
> > was only a performance cost in the error case. That's obviously not true -
> > there's an `if` statement executed in Python for every item in every
> > iterable.
>
> Sorry, this does not demonstrate that the performance cost is
> significant.
>
> This adds one "if" per loop, terminating on (one more than) the shortest
> input. So O(N) on the length of the input. That's usually considered
> reasonable, provided the per item cost is low.
>
> The test in the "if" is technically O(N) on the number of input
> iterators, but since that's usually two, and rarely more than a handful,
> it's close enough to a fixed cost.
>
> On my old and slow PC `sentinel in combo` is quite fast:
>

`sentinel in combo` is problematic if some values have overridden `__eq__`.
I referred to this problem in a previous email to you, saying that people
had copied this buggy implementation from SO and that it still hadn't been
fixed after being pointed out. The fact that you missed this helps to prove
my point. Getting this right is hard.

Fortunately, more_itertools avoids this bug by not using `in`, which you
seem to not have noticed even though I copied its implementation in the
email you're responding to.
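
A minimal sketch of the `__eq__` problem. The AlwaysEqual class is a made-up example of a value with an overridden `__eq__`; any type with unusual equality behaviour raises the same concern.

```python
class AlwaysEqual:
    """Hypothetical value whose __eq__ says yes to everything."""

    def __eq__(self, other):
        return True


sentinel = object()
combo = (AlwaysEqual(), 42)  # a perfectly valid row; no fill value present

# `in` falls back to ==, so the custom __eq__ produces a false positive.
print(sentinel in combo)                        # True
# An identity test only matches the sentinel object itself.
print(any(item is sentinel for item in combo))  # False
```

With the `in` test, a strict zip would report unequal lengths for this row even though nothing is missing.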

> Without actual measurements, this is a classic example of premature
> micro-optimization.
>
> Let's see some real benchmarks proving that a Python version is
> too slow in real-life code first.
>

Here is a comparison of the current zip with more-itertools' zip_equal:

```
import timeit
from collections import deque
from itertools import zip_longest

_marker = object()

class UnequalIterablesError(Exception):
    pass

def zip_equal(*iterables):
    """``zip`` the input *iterables* together, but throw an
    ``UnequalIterablesError`` if any of the *iterables* terminate before
    the others.
    """
    for combo in zip_longest(*iterables, fillvalue=_marker):
        for val in combo:
            if val is _marker:
                raise UnequalIterablesError(
                    "Iterables have different lengths."
                )
        yield combo

x1 = list(range(1000))
x2 = list(range(1000, 2000))

def my_timeit(stmt):
    print(timeit.repeat(stmt, globals=globals(), number=1, repeat=3))

def consume(iterator):
    deque(iterator, maxlen=0)

my_timeit("consume(zip(x1, x2))")
my_timeit("consume(zip_equal(x1, x2))")
```

Output:

[0.150328965, 0.146724568, 0.1454314829997]
[2.039809026, 2.060877259, 2.021136164997]

So the Python version is about 13 times slower, and 10 million iterations
(quite plausible) adds about 2 seconds. That's not disastrous, but I think
it's significant enough that someone working with large amounts of data and
concerned about performance might choose to risk accidental malformed input.
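
The arithmetic behind those figures, as a small sketch. The timings are the ones printed above; the 10 million iteration count is taken from the sentence above rather than measured here, and using the fastest run of each is an assumption about how to summarise them.

```python
zip_times = [0.150328965, 0.146724568, 0.1454314829997]
zip_equal_times = [2.039809026, 2.060877259, 2.021136164997]
ITERATIONS = 10_000_000  # the figure quoted above, taken as given

# Compare the fastest run of each, which is least affected by system noise.
slowdown = min(zip_equal_times) / min(zip_times)
extra_seconds = min(zip_equal_times) - min(zip_times)
extra_ns_per_tuple = extra_seconds / ITERATIONS * 1e9

print(f"slowdown: {slowdown:.1f}x")
print(f"extra cost: {extra_seconds:.2f} s total, {extra_ns_per_tuple:.0f} ns per tuple")
```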
___
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/FAHQYXQ4S4KMJJFJRTNYHKVH2QWQS3OY/

[Python-ideas] Re: zip(x, y, z, strict=True)

2020-05-02 Thread Steven D'Aprano

On Sat, May 02, 2020 at 04:58:43PM +0200, Alex Hall wrote:

> I didn't think carefully about this implementation and thought that there
> was only a performance cost in the error case. That's obviously not true -
> there's an `if` statement executed in Python for every item in every
> iterable. 

Sorry, this does not demonstrate that the performance cost is 
significant.

This adds one "if" per loop, terminating on (one more than) the shortest 
input. So O(N) on the length of the input. That's usually considered 
reasonable, provided the per item cost is low.

The test in the "if" is technically O(N) on the number of input 
iterators, but since that's usually two, and rarely more than a handful, 
it's close enough to a fixed cost.

On my old and slow PC `sentinel in combo` is quite fast:

py> from timeit import Timer
py> t = Timer('sentinel in combo',
...           setup='sentinel=object(); combo=tuple(range(10))')
py> t.repeat()  # defaults: 5 repeats of 1,000,000 loops each
[1.6585235428065062, 1.6372932828962803, 1.6347543047741055,
 1.6457603527233005, 1.6405461430549622]

So that's about 1.6 seconds per million loops, or roughly 1.6 
microseconds extra per loop on my PC.

(For the sake of comparison, unpacking the tuple into separate variables 
costs about 0.6 microseconds on my machine; so does calling len().)

I would expect most people running this on a newer PC to get one tenth 
of that, or even 1/100, but let's assume a machine even slower and older 
than mine, and call it 3 microseconds to be safe.

What are you doing inside the loop with the zipped up items that 3 
microseconds is a serious performance bottleneck for your application?
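
For anyone reproducing the measurement, here is a minimal sketch that passes the loop count explicitly, so the per-loop cost is simply the total divided by that count. The NUMBER and repeat values are arbitrary choices for the sketch, and the result depends entirely on the machine.

```python
from timeit import Timer

NUMBER = 100_000  # loops per measurement (arbitrary choice for this sketch)

timer = Timer(
    "sentinel in combo",
    setup="sentinel = object(); combo = tuple(range(10))",
)

# Each entry in totals is the time in seconds for NUMBER executions.
totals = timer.repeat(repeat=3, number=NUMBER)
per_loop_ns = min(totals) / NUMBER * 1e9
print(f"{per_loop_ns:.1f} ns per `sentinel in combo` test")
```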

> The overhead is O(len(iterables) * len(iterables[0])). Given that
> zip is used a lot and most uses of zip should probably be strict,

That's not a given. I would say that most uses of zip should not be 
strict.

> this is a significant problem.

Without actual measurements, this is a classic example of premature 
micro-optimization.

Let's see some real benchmarks proving that a Python version is 
too slow in real-life code first.

-- 
Steven
___
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/GOEEMLEOJOWQLMXGDWTW277T22FAQTX4/

[Python-ideas] Re: zip(x, y, z, strict=True)

2020-05-02 Thread Steven D'Aprano

On Sat, May 02, 2020 at 02:58:57PM +0200, Alex Hall wrote:

> Adding a function to itertools will mostly solve that problem, but not
> entirely. Adding `strict=True` is so easy that people will be encouraged to
> use it often and keep their code safe. That to me is the biggest argument
> for this feature and for this specific API.

The last thing I want is to encourage people to unnecessarily enforce a 
rule "data streams must be equal just to be safe" when they don't 
actually need to be equal. What you are calling the biggest argument for 
this feature is, for me, a strong argument against it.

If I know that consumers of my data truncate on the shortest input, then
as the producer of data I don't have to care about making them equal. I
can say:

process_user_ids(usernames, generate_user_ids())

and pass an infinite stream of user IDs and know that the function will
just truncate on the shortest stream. Yay! Life is good.

But now this zip flag comes along, and the author of process_user_ids
decides to protect me from myself and "fail loudly", and I will curse
them onto the hundredth generation for making my life more difficult.

If I'm the producer and consumer of the data, then I can pick and 
choose between versions, and that's all well and good.

If I'm the producer of the data, and I want it to be equal in length, 
then I control the data and can make it equal. I don't need the 
consumer's help, and I don't need zip to have a flag.

But if I'm only the consumer of the data, I have no business failing 
"just to be safe". That's an anti-feature that makes life more 
difficult, not less, for the producer of the data, akin to excessive 
runtime type checking (I'm sometimes guilty of that myself) or in other 
languages flagging every class in sight as final "just to be safe".

It is possible to be *too* defensive, and if making the strict version 
of zip a builtin encourages consumers of the data to "be safe", then 
that is a mark against it in my strong opinion.

-- 
Steven
___
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/3ZGEB2ASKMMQPNNBZXNNR3I5JNIFO4M7/

[Python-ideas] Re: zip(x, y, z, strict=True)

2020-05-02 Thread Alex Hall

On Sat, May 2, 2020 at 2:58 PM Alex Hall  wrote:

> On Sat, May 2, 2020 at 1:19 PM Steven D'Aprano 
> wrote:
>
>> Rolling your own on top of zip_longest is easy. It's half a dozen lines.
>> It could be a recipe in itertools, or a function.
>
>
>> It has taken years for it to be added to more-itertools, suggesting that
>> the real world need for this is small.
>>
>> "Not every two line function needs to be a builtin" -- this is six
>> lines, not two, which is in the proposal's favour, but the principle
>> still applies. Before this becomes a builtin, there are a number of
>> hurdles to pass:
>>
>> - Is there a need for it? Granted.
>> - Is it complicated to get right? No.
>> - Is performance critical enough that it has to be written in C?
>>   Probably not.
>
> No, probably not
>

I take it back, performance is a problem worth considering. Here is the
more-itertools implementation:

https://github.com/more-itertools/more-itertools/blob/master/more_itertools/more.py#L1420

```
def zip_equal(*iterables):
    """``zip`` the input *iterables* together, but throw an
    ``UnequalIterablesError`` if any of the *iterables* terminate before
    the others.
    """
    for combo in zip_longest(*iterables, fillvalue=_marker):
        for val in combo:
            if val is _marker:
                raise UnequalIterablesError(
                    "Iterables have different lengths."
                )
        yield combo
```

I didn't think carefully about this implementation and thought that there
was only a performance cost in the error case. That's obviously not true -
there's an `if` statement executed in Python for every item in every
iterable. The overhead is O(len(iterables) * len(iterables[0])). Given that
zip is used a lot and most uses of zip should probably be strict, this is a
significant problem. Therefore:

- Rolling your own on top of zip_longest in six lines is not a solution.
- Using more-itertools is not a solution.
- It's complicated to get right.
- Performance is critical enough to do it in C.
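
For comparison, when every input has a length, the per-item cost can be avoided in pure Python by checking lengths up front and then delegating to the built-in zip. The sketch below shows that approach; zip_equal_sized is a hypothetical name, and it deliberately does not cover generators or other unsized iterators, which is the general case under discussion.

```python
def zip_equal_sized(*sequences):
    """Strict zip for sized inputs only (lists, tuples, strings, ...)."""
    lengths = {len(seq) for seq in sequences}
    if len(lengths) > 1:
        raise ValueError(f"sequences have different lengths: {sorted(lengths)}")
    # All lengths agree, so the built-in zip does the looping in C.
    return zip(*sequences)


print(list(zip_equal_sized([1, 2], "ab")))
try:
    zip_equal_sized([1, 2, 3], "ab")
except ValueError as exc:
    print("error:", exc)
```

Calling it with an iterator raises TypeError from len(), so the limitation is at least visible rather than silent.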

>
___
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/WD4ELGMDEHYIG25L3UAIGCTKKBCI564P/

[Python-ideas] Re: zip(x, y, z, strict=True)

2020-05-02 Thread Steven D'Aprano

On Sat, May 02, 2020 at 04:40:49PM +0200, Alex Hall wrote:
> On Sat, May 2, 2020 at 4:34 PM Steven D'Aprano  wrote:
> 
> > On Sat, May 02, 2020 at 02:58:57PM +0200, Alex Hall wrote:
> >
> >
> > > > Yes? Is it our responsibility to put everything in builtins because
> > > > people might not think to look in math, or functools, or os, or sys?
> > > >
> > >
> > > Putting math.sin or whatever in builtins makes builtins bigger. Adding a
> > > flag to zip does not. 
> >
> > Excuse me, why are you aiming the CoC at me?
> >
> 
> I have no idea how or why that happened. It doesn't show in the sent
> message in my GMail. Something went wrong with quoting.

Okay.

The glitch shows up in web-archive too:

https://mail.python.org/archives/list/python-ideas@python.org/message/C5E6GVAMKWTMYKBDBDQ4D6UEPUGVSANQ/

Very strange.



-- 
Steven
___
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/YTK6FJ64PPT25FMPXJA57JCVQYXGLKPV/

[Python-ideas] Re: zip(x, y, z, strict=True)

2020-05-02 Thread Eric V. Smith

On 5/2/2020 10:40 AM, Alex Hall wrote:
> On Sat, May 2, 2020 at 4:34 PM Steven D'Aprano wrote:
>
> > On Sat, May 02, 2020 at 02:58:57PM +0200, Alex Hall wrote:
> >
> > > > Yes? Is it our responsibility to put everything in builtins because
> > > > people might not think to look in math, or functools, or os, or sys?
> > >
> > > Putting math.sin or whatever in builtins makes builtins bigger. Adding a
> > > flag to zip does not.
> >
> > Excuse me, why are you aiming the CoC at me?
>
> I have no idea how or why that happened. It doesn't show in the sent
> message in my GMail. Something went wrong with quoting.

And it doesn't look that way in my inbox, either. In fact, there's a 
whole other sentence before the footer "I think I've missed what harm 
you think it will do to add a flag to zip. Can you point me to your 
objection?".

Eric

___
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/GY7XXTBYLQW2UL72F7R43STFWPL2QP5E/

[Python-ideas] Re: zip(x, y, z, strict=True)

2020-05-02 Thread Alex Hall

On Sat, May 2, 2020 at 4:34 PM Steven D'Aprano  wrote:

> On Sat, May 02, 2020 at 02:58:57PM +0200, Alex Hall wrote:
>
>
> > > Yes? Is it our responsibility to put everything in builtins because
> > > people might not think to look in math, or functools, or os, or sys?
> > >
> >
> > Putting math.sin or whatever in builtins makes builtins bigger. Adding a
> > flag to zip does not. 
>
> Excuse me, why are you aiming the CoC at me?
>

I have no idea how or why that happened. It doesn't show in the sent
message in my GMail. Something went wrong with quoting.
___
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/LL25RTRYJIQHLLETFZOECA7RXJOCQFAH/

[Python-ideas] Re: zip(x, y, z, strict=True)

2020-05-02 Thread Steven D'Aprano

On Sat, May 02, 2020 at 02:58:57PM +0200, Alex Hall wrote:


> > Yes? Is it our responsibility to put everything in builtins because
> > people might not think to look in math, or functools, or os, or sys?
> >
> 
> Putting math.sin or whatever in builtins makes builtins bigger. Adding a
> flag to zip does not. 

Excuse me, why are you aiming the CoC at me?



-- 
Steven
___
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/WHBMVA7COPTCB4Y4P5MJC2UGZXVL4H5F/

[Python-ideas] Re: zip(x, y, z, strict=True)

2020-05-02 Thread Alex Hall

On Sat, May 2, 2020 at 1:19 PM Steven D'Aprano  wrote:

> But I question whether *enough* people need it *often enough* to make it
> a builtin, or to put a flag on plain zip.



> It has taken years for it to be added to more-itertools, suggesting that
> the real world need for this is small.
>


> My personal opinion is that given that Brandt has found one concrete use
> for this in the stdlib
>

Many (I'd guess most, but it's hard to measure) uses of zip are concrete
use cases for a strict zip. Most uses of zip are meant for equal lengths,
and unequal lengths are a sign of an error somewhere. People just can't be
bothered at the moment to make their zips strict because it's inconvenient.

I'm guessing Brandt's ast.unparse example is more significant than just a
place where a strict check could be useful. I'm guessing it's an example
where he or someone else actually got bit by the lack of a check. Those are
hard to find. But I'm pretty sure I've experienced it, and we know Ram has.

Something that's a bit easier to find is examples where people have checked
lengths directly. These either show that a strict zip could have been used
directly, or at least that people are concerned about unequal lengths:

https://github.com/more-itertools/more-itertools/blob/master/more_itertools/more.py#L1456
    if len(iterables) != len(offsets):
        raise ValueError("Number of iterables and offsets didn't match")

    staggered = []
    for it, n in zip(iterables, offsets):

https://github.com/pypa/setuptools/blob/master/setuptools/dep_util.py#L13
    if len(sources_groups) != len(targets):
        raise ValueError("'sources_group' and 'targets' must be the same length")

    # build a pair of lists (sources_groups, targets) where source is newer
    n_sources = []
    n_targets = []
    for i in range(len(sources_groups)):
        if newer_group(sources_groups[i], targets[i]):

https://github.com/gristlabs/asttokens/blob/master/tests/test_mark_tokens.py#L783
  self.assertEqual(len(t1), len(t2))
  for vc1, vc2 in zip(t1, t2):
    self.assert_nodes_equal(vc1, vc2)

File "cpython/Tools/demo/sortvisu.py", line 348
    if len(oldpts) != len(newpts):
        raise ValueError("can't interpolate arrays of different length")
    pts = [0]*len(oldpts)
    res = [tuple(oldpts)]
    for i in range(1, n):
        for k in range(len(pts)):
            pts[k] = oldpts[k] + (newpts[k] - oldpts[k])*i//n

File "cpython/Modules/_decimal/libmpdec/literature/fnt.py", line 194
    assert(len(a) == len(b))
    x = ntt(a, 1)
    y = ntt(b, 1)
    for i in range(len(a)):
        y[i] = y[i] * x[i]

https://github.com/matplotlib/matplotlib/blob/master/lib/matplotlib/collections.py#L1121
    if len(verts) != len(codes):
        raise ValueError("'codes' must be a 1D list or array "
                         "with the same length of 'verts'")
    self._paths = []
    for xy, cds in zip(verts, codes):

https://github.com/matplotlib/matplotlib/blob/master/lib/matplotlib/axes/_axes.py#L1451
    if len(lineoffsets) != len(positions):
        raise ValueError('lineoffsets and positions are unequal sized '
                         'sequences')
    if len(linelengths) != len(positions):
        raise ValueError('linelengths and positions are unequal sized '
                         'sequences')
    if len(linewidths) != len(positions):
        raise ValueError('linewidths and positions are unequal sized '
                         'sequences')
    if len(colors) != len(positions):
        raise ValueError('colors and positions are unequal sized '
                         'sequences')
    if len(linestyles) != len(positions):
        raise ValueError('linestyles and positions are unequal sized '
                         'sequences')

    colls = []
    for position, lineoffset, linelength, linewidth, color, linestyle in \
            zip(positions, lineoffsets, linelengths, linewidths,
                colors, linestyles):

https://github.com/matplotlib/matplotlib/blob/master/lib/matplotlib/axes/_axes.py#L3412
    for e in [a, b]:
        if len(data) != len(e):
            raise ValueError(
                f"The lengths of the data ({len(data)}) and the "
                f"error {len(e)} do not match")
    low = [v - e for v, e in zip(data, a)]
    high = [v + e for v, e in zip(data, b)]

https://github.com/matplotlib/matplotlib/blob/master/lib/matplotlib/axes/_axes.py#L3783
    if (len(np.ravel(usermedians)) != len(bxpstats) or
            np.shape(usermedians)[0] != len(bxpstats)):
        raise ValueError(
            "'usermedians' and 'x' have different lengths")
    else:
        # reassign medians as necessary
        for stats, med in zip(bxpstats, usermedians):

[Python-ideas] Re: zip(x, y, z, strict=True)

2020-05-02 Thread Ram Rachum

That was a good email Alex. Besides the relevant examples, you've put into
words things that I wanted to say but didn't realize it. Good job :)

On Sat, May 2, 2020 at 4:00 PM Alex Hall  wrote:

> On Sat, May 2, 2020 at 1:19 PM Steven D'Aprano 
> wrote:
>
>> On Sat, May 02, 2020 at 09:54:46AM +0200, Alex Hall wrote:
>>
>> > I would say that in pretty much all cases you wouldn't catch the exception.
>> > It's the producer's responsibility to produce correct inputs, and if they
>> > don't, tell them that they failed in their responsibility.
>> >
>> > The underlying core principle is that programs should fail loudly when
>> > users make mistakes to help them find those mistakes.
>>
>> Maybe. It depends on whether it is a meaningful mistake, and the cost of
>> the loud failure versus the usefulness of silent truncation.
>>
>
> I'm not sure what the point of this long spiel about floats and str.upper
> was. No one thinks that zip should always be strict. The feature would be
> optional and let people choose conveniently between loud failure and silent
> truncation.
>
>> So bringing it back to zip... I don't think I ever denied that, in
>> principle at least, somebody might need to raise on mismatched lengths.
>> (If I did give that impression, I apologise.) I did say I never needed
>> it myself, and my own zip_strict function in my personal toolbox remains
>> unused after many years. But somebody needs it? Sure, I'll accept that.
>>
>> But I question whether *enough* people need it *often enough* to make it
>> a builtin, or to put a flag on plain zip.
>
>
> Well, let's add some data about people needing it.
>
> Here is a popular question on the topic:
> https://stackoverflow.com/questions/32954486/zip-iterators-asserting-for-equal-length-in-python
>
> Here are previous threads asking for it:
>
>
> https://mail.python.org/archives/list/python-ideas@python.org/thread/UXX3FGOTYHSP4YEA6VDYC37PUNWVJVXY/#UXX3FGOTYHSP4YEA6VDYC37PUNWVJVXY
>
> (In that one you yourself say "Indeed. The need is real, and the question
> has come up many times on
> Python-List as well.")
>
>
> https://mail.python.org/archives/list/python-ideas@python.org/thread/OM3ETIDJPXESH76XJK4MPU6ZARMFFHFH/#P6FTBTUNT3MHL2XWNAJFCUEZTQFMGHJW
>
>
> https://mail.python.org/archives/list/python-ideas@python.org/thread/K54NG74L6AI4UQ6VKZIBABGJJZQM6G4B/#UCVKQQKDWWADYEZ4Z7IVFPSSDM5XYR2B
>
> Here are similar requests for Rust:
>
> https://internals.rust-lang.org/t/non-truncating-more-usable-zip/5205
>
> https://mail.mozilla.org/pipermail/rust-dev/2013-May/004039.html
> (which mentions that Erlang's zip is strict)
>
>> Rolling your own on top of zip_longest is easy. It's half a dozen lines.
>> It could be a recipe in itertools, or a function.
>
>
>> It has taken years for it to be added to more-itertools, suggesting that
>> the real world need for this is small.
>>
>> "Not every two line function needs to be a builtin" -- this is six
>> lines, not two, which is in the proposal's favour, but the principle
>> still applies. Before this becomes a builtin, there are a number of
>> hurdles to pass:
>>
>> - Is there a need for it? Granted.
>> - Is it complicated to get right? No.
>>
>
> I would say yes. Look at the SO question for example. The asker wrote a
> long, slow, complicated solution and had to ask if it was good enough.
> Martijn (who is a prolific answerer) gave two solutions. The top comment
> says that the second solution is very nice. Months later someone pointed
> out that the second solution is actually buggy, so it was edited out. The
> remaining solution still has an issue which is mentioned in a comment but
> is not addressed. So we know that many people (including me, btw) have copy
> pasted this buggy code and it's now sitting in their codebases. Here are
> some examples from github:
>
> https://github.com/search?q=%22if+sentinel+in+combo%22=Code
>
>
>> - Is performance critical enough that it has to be written in C?
>>   Probably not.
>>
>
> No, probably not, but I don't see why this is a hurdle. This can be
> implemented in any way by different implementations of Python, but for
> CPython, I don't see how else this would play out. Performance isn't really
> the reason this should be in the language.
>
>
>> - Is there agreement on the functionality? Somewhat.
>> - Could that need be met by your own personal toolbox?
>> - or a recipe in itertools?
>> - or by a third-party library?
>> - or a function in itertools?
>>
>> We've heard from people who say that they would like a strict version
>> of zip which raises on unequal inputs. How many of them like this enough
>> to add a six line function to their code?
>>
>
> I think a major factor here is laziness. I'm pretty sure that sometimes
> I've wanted this kind of strict check, just for better peace of mind, but
> the thought of one of the solutions above feels like too much effort. I
> don't want to add a third party dependency just for this. I don't want to
> read someone else's solution

[Python-ideas] Re: zip(x, y, z, strict=True)

2020-05-02 Thread Alex Hall

On Sat, May 2, 2020 at 1:19 PM Steven D'Aprano  wrote:

> On Sat, May 02, 2020 at 09:54:46AM +0200, Alex Hall wrote:
>
> > I would say that in pretty much all cases you wouldn't catch the exception.
> > It's the producer's responsibility to produce correct inputs, and if they
> > don't, tell them that they failed in their responsibility.
> >
> > The underlying core principle is that programs should fail loudly when
> > users make mistakes to help them find those mistakes.
>
> Maybe. It depends on whether it is a meaningful mistake, and the cost of
> the loud failure versus the usefulness of silent truncation.
>

I'm not sure what the point of this long spiel about floats and str.upper
was. No one thinks that zip should always be strict. The feature would be
optional and let people choose conveniently between loud failure and silent
truncation.

> So bringing it back to zip... I don't think I ever denied that, in
> principle at least, somebody might need to raise on mismatched lengths.
> (If I did give that impression, I apologise.) I did say I never needed
> it myself, and my own zip_strict function in my personal toolbox remains
> unused after many years. But somebody needs it? Sure, I'll accept that.
>
> But I question whether *enough* people need it *often enough* to make it
> a builtin, or to put a flag on plain zip.

Well, let's add some data about people needing it.

Here is a popular question on the topic:
https://stackoverflow.com/questions/32954486/zip-iterators-asserting-for-equal-length-in-python

Here are previous threads asking for it:

https://mail.python.org/archives/list/python-ideas@python.org/thread/UXX3FGOTYHSP4YEA6VDYC37PUNWVJVXY/#UXX3FGOTYHSP4YEA6VDYC37PUNWVJVXY

(In that one you yourself say "Indeed. The need is real, and the question
has come up many times on
Python-List as well.")

https://mail.python.org/archives/list/python-ideas@python.org/thread/OM3ETIDJPXESH76XJK4MPU6ZARMFFHFH/#P6FTBTUNT3MHL2XWNAJFCUEZTQFMGHJW

https://mail.python.org/archives/list/python-ideas@python.org/thread/K54NG74L6AI4UQ6VKZIBABGJJZQM6G4B/#UCVKQQKDWWADYEZ4Z7IVFPSSDM5XYR2B

Here are similar requests for Rust:

https://internals.rust-lang.org/t/non-truncating-more-usable-zip/5205

https://mail.mozilla.org/pipermail/rust-dev/2013-May/004039.html
(which mentions that Erlang's zip is strict)

> Rolling your own on top of
> zip_longest is easy. It's half a dozen lines. It could be a recipe in
> itertools, or a function.

> It has taken years for it to be added to more-itertools, suggesting that
> the real world need for this is small.
>
> "Not every two line function needs to be a builtin" -- this is six
> lines, not two, which is in the proposal's favour, but the principle
> still applies. Before this becomes a builtin, there are a number of
> hurdles to pass:
>
> - Is there a need for it? Granted.
> - Is it complicated to get right? No.
>

I would say yes. Look at the SO question for example. The asker wrote a
long, slow, complicated solution and had to ask if it was good enough.
Martijn (who is a prolific answerer) gave two solutions. The top comment
says that the second solution is very nice. Months later someone pointed
out that the second solution is actually buggy, so it was edited out. The
remaining solution still has an issue which is mentioned in a comment but
is not addressed. So we know that many people (including me, btw) have copy
pasted this buggy code and it's now sitting in their codebases. Here are
some examples from github:

https://github.com/search?q=%22if+sentinel+in+combo%22=Code
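
To make the risk concrete, here is one way a check of the form `if sentinel in combo` can misfire. This is only a sketch, and not necessarily the issue the SO comment refers to: `in` falls back to `==` after an identity test, so an element with a permissive `__eq__` can look like the fill value. The `AlwaysEqual` class is a hypothetical stand-in for such an element.

```python
from itertools import zip_longest


class AlwaysEqual:
    """Hypothetical element type whose == is true for everything."""

    def __eq__(self, other):
        return True

    def __hash__(self):
        return 0


sentinel = object()
# Both inputs have length 1, so no fill value is ever inserted.
combo = next(zip_longest([AlwaysEqual()], [1], fillvalue=sentinel))

# Membership tests identity first and then ==, so this is a false positive.
found_by_in = sentinel in combo
# An identity test only matches the real fill value.
found_by_is = any(item is sentinel for item in combo)

print(found_by_in, found_by_is)
```

Comparing with `is` instead of `in` avoids depending on how the elements define equality.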

> - Is performance critical enough that it has to be written in C?
>   Probably not.
>

No, probably not, but I don't see why this is a hurdle. This can be
implemented in any way by different implementations of Python, but for
CPython, I don't see how else this would play out. Performance isn't really
the reason this should be in the language.

> - Is there agreement on the functionality? Somewhat.
> - Could that need be met by your own personal toolbox?
> - or a recipe in itertools?
> - or by a third-party library?
> - or a function in itertools?
>
> We've heard from people who say that they would like a strict version
> of zip which raises on unequal inputs. How many of them like this enough
> to add a six line function to their code?
>

I think a major factor here is laziness. I'm pretty sure that sometimes
I've wanted this kind of strict check, just for better peace of mind, but
the thought of one of the solutions above feels like too much effort. I
don't want to add a third party dependency just for this. I don't want to
read someone else's solution (e.g. on SO) which doesn't have tests and try
to evaluate if it's correct. I certainly don't want to reimplement it
myself. I brush it off thinking "it'll probably be fine", which is bad
behaviour.

The problem is that no one really *needs* this check. You *can* do without
it. The same doesn't apply well to other functions in itertools or
more-itertools. If

[Python-ideas] Re: zip(x, y, z, strict=True)

2020-05-02 Thread Steven D'Aprano

On Sat, May 02, 2020 at 09:54:46AM +0200, Alex Hall wrote:

> I would say that in pretty much all cases you wouldn't catch the exception.
> It's the producer's responsibility to produce correct inputs, and if they
> don't, tell them that they failed in their responsibility.
> 
> The underlying core principle is that programs should fail loudly when
> users make mistakes to help them find those mistakes.

Maybe. It depends on whether it is a meaningful mistake, and the cost of 
the loud failure versus the usefulness of silent truncation.

py> x = 123456789.01
py> x == 123456789
True

Should that float literal raise or just truncate the value? How about 
arithmetic?

py> 1e50 + 1 == 1e50
True

I guess once in a while it would be useful to know that arithmetic 
was throwing away data, and the IEEE floating point standard allows that 
as an optional trap, but can you imagine how obnoxious it would be to 
have it happen all the time?
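
Python floats do not expose that trap, but the decimal module does, so the trade-off can be sketched there. This is an illustration under assumptions: the five-digit precision and the names `strict_ctx` and `lenient_ctx` are chosen purely for the demo.

```python
import decimal

# A context that traps Inexact, i.e. any result that had to be rounded.
strict_ctx = decimal.Context(prec=5, traps=[decimal.Inexact])
# The same precision with no traps enabled: rounding passes silently.
lenient_ctx = decimal.Context(prec=5, traps=[])

big = decimal.Decimal("1e10")
one = decimal.Decimal(1)

# With five digits of precision the added 1 is simply thrown away.
silent = lenient_ctx.add(big, one)

try:
    strict_ctx.add(big, one)
    trapped = False
except decimal.Inexact:
    trapped = True

print(silent == big, trapped)
```

With the trap enabled, every rounded result raises, which is the "all the time" behaviour described above.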

Sometimes silently throwing away data is the right thing to do. "Errors 
should never pass silently" depends on what we mean by "error".

Is it an error to call str.upper() on a string that contains no letters? 
Perhaps upper() should raise an exception if it doesn't actually convert 
anything, rather than silently doing nothing. If I'm expecting a string 
of alphabetical letters, but get digits instead, it might be useful for 
upper() to raise.

name.upper(strict=True)

Would I write my own upper() to do this? No. Should it become a builtin? 
Probably not.

So bringing it back to zip... I don't think I ever denied that, in 
principle at least, somebody might need to raise on mismatched lengths. 
(If I did give that impression, I apologise.) I did say I never needed 
it myself, and my own zip_strict function in my personal toolbox remains 
unused after many years. But somebody needs it? Sure, I'll accept that.

But I question whether *enough* people need it *often enough* to make it 
a builtin, or to put a flag on plain zip. Rolling your own on top of 
zip_longest is easy. It's half a dozen lines. It could be a recipe in 
itertools, or a function.
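
For reference, a minimal sketch of such a function, assuming the name `zip_strict` and a ValueError on mismatch (neither is a settled API):

```python
from itertools import zip_longest


def zip_strict(*iterables):
    """Like zip(), but raise ValueError if the inputs differ in length."""
    missing = object()  # private fill value that no caller can supply
    for combo in zip_longest(*iterables, fillvalue=missing):
        # Identity comparison, so elements with an unusual __eq__ are safe.
        if any(item is missing for item in combo):
            raise ValueError("zip_strict() arguments have different lengths")
        yield combo
```

Because it is a generator, the error only surfaces once iteration reaches the end of the shortest input, and tuples produced before that point have already been handed to the caller.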

It has taken years for it to be added to more-itertools, suggesting that 
the real world need for this is small.

"Not every two line function needs to be a builtin" -- this is six 
lines, not two, which is in the proposal's favour, but the principle 
still applies. Before this becomes a builtin, there are a number of 
hurdles to pass:

- Is there a need for it? Granted.
- Is it complicated to get right? No.
- Is performance critical enough that it has to be written in C?
  Probably not.
- Is there agreement on the functionality? Somewhat.
- Could that need be met by your own personal toolbox?
- or a recipe in itertools?
- or by a third-party library?
- or a function in itertools?

We've heard from people who say that they would like a strict version 
of zip which raises on unequal inputs. How many of them like this enough 
to add a six line function to their code?

My personal opinion is that given that Brandt has found one concrete use 
for this in the stdlib, it is probably justifiable to add it to 
itertools. Whether it even needs a C accelerated version, or just a pure 
Python version, I don't care, but then I'm not doing the work :-)

> > The most common use for this I have seen in the discussion is:
> >
> > "I have generated two inputs which I expect are equal, and I'd like to
> > be notified if they aren't"
> >
> 
> If there's a different use case I'm not aware of it, can someone share?

Sorry for the confusion, I intended to distinguish between the two 
cases:

1. I have generated two inputs which I expect are equal, and I want to 
   assert that they are equal when I process them.

2. I consume data generated by someone else, and it is *their* 
   responsibility to ensure that they are equal in length.

Sorry that this was not clear.

In the second case it is (in my opinion) perfectly acceptable to put the 
responsibility on the producer of the data, and silently truncate any 
excess data, rather than raise. Just as converting strings to floats 
silently truncates any extra digits. Let the producer check the lengths, 
if they must.

[...]
> The problem is not that they have to look there, it's that they have to
> *think to look there*. itertools might not occur to them. They might not
> even know it exists.

Yes? Is it our responsibility to put everything in builtins because 
people might not think to look in math, or functools, or os, or sys?

> Note that adding a flag is essentially adding to the (empty) namespace that
> is zip's named arguments. Adding a new function is adding to a much larger
> namespace, probably itertools.

I don't agree with that description. A function signature is not a 
namespace.

-- 
Steven

[Python-ideas] Re: zip(x, y, z, strict=True)

2020-05-02 Thread Alex Hall

On Sat, May 2, 2020 at 3:50 AM Steven D'Aprano  wrote:

> On Thu, Apr 30, 2020 at 07:58:16AM -0700, Christopher Barker wrote:
>
> > Imagine someone that uses zip() in code that works for a while, and then
> > discovers a bug triggered by unequal length inputs.
> >
> > If it’s a flag, they look at the zip docstring, and find the flag, and
> > their problem is solved.
>
> Their problem is not solved. All they have is an exception. Now what are
> they going to do with it?
>

I *think* Christopher was saying they have a logical bug which existed
silently and led to some confusing debugging, and they'd like to be
notified of the unequal lengths in the future. So they want to find the
strict feature (whatever the API may be) which they've either guessed might
exist or vaguely remember seeing before. In that case the zip docstring is
likely the first place they'd look.

If what he meant was that the flag raised an exception, then to answer your
question "what are they going to do with it?", they should either fix the
bug that led to malformed inputs or remove the flag if they realise
unequal lengths aren't such a problem in this case.

> This is why I am still unconvinced that this functionality is anywhere
> near as useful as the proponents seem to think. Brandt has found one
> good example of a parsing bug in the ast library, but if he has shown
> how this zip_strict function will solve the bug, I haven't seen it.
>

[The bug](https://bugs.python.org/issue40355) is titled "The ast module
fails to reject certain malformed nodes". The function would cause the
nodes to be rejected with an exception.

> In any case, even giving Brandt the benefit of the doubt that this will
> solve the ast bug, it's hard for me to generalise from that. If I'm
> expecting equal length inputs, and don't get them, what am I supposed to
> do with the exception as the consumer of the inputs?
>
> As the consumer of the inputs, I can pass the buck to the producer, make
> it their responsibility, and merely promise to truncate the inputs if
> they're not the same length. Otherwise, what do I do once I've caught
> the exception?
>

I would say that in pretty much all cases you wouldn't catch the exception.
It's the producer's responsibility to produce correct inputs, and if they
don't, tell them that they failed in their responsibility.

The underlying core principle is that programs should fail loudly when
users make mistakes to help them find those mistakes. I'm strongly reminded
of when I was advocating for a warning/exception when iterating directly
over a string and some people here didn't understand what the point was. Do
some people not agree with this core principle?

> The most common use for this I have seen in the discussion is:
>
> "I have generated two inputs which I expect are equal, and I'd like to
> be notified if they aren't"
>

If there's a different use case I'm not aware of it, can someone share?

> which to me is an assertion about program correctness. So this ought to
> be an assert that gets disabled under -O, not a raise that the caller
> might catch.
>

That's a pretty decent idea. But are there any other examples in the
standard library of functions behaving differently under -O? I think if you
want that kind of balance between performance and robustness, your best
option is zip(x, y, strict=__debug__). Nice and explicit.

> So this suggests *two* new functions:
>
> - zip_equal for Brandt's parsing bug use-case, guaranteed to raise
>
> - zip_assert_equal for the more common use case of checking
>   program correctness, and disabled under -O
>

Again, I think Brandt's case is still just about checking program
correctness.

> > If it’s in itertools, they have to think to look there.
>
> And this is a problem, why? Should *everything* be a builtin?
>
> Heaven forbid that somebody has to read the docs and learn about
> modules, let's have one giant global namespace with everything in it!
> Because that's good for the beginners! (Not.)
>

The problem is not that they have to look there, it's that they have to
*think to look there*. itertools might not occur to them. They might not
even know it exists.

Note that adding a flag is essentially adding to the (empty) namespace that
is zip's named arguments. Adding a new function is adding to a much larger
namespace, probably itertools.
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/T64ICBKTMAJRMUVEGHM2IDW22HIP5RQK/

[Python-ideas] Re: zip(x, y, z, strict=True)

2020-05-01 Thread Steven D'Aprano

On Thu, Apr 30, 2020 at 07:58:16AM -0700, Christopher Barker wrote:

> Imagine someone that uses zip() in code that works for a while, and then
> discovers a bug triggered by unequal length inputs.
> 
> If it’s a flag, they look at the zip docstring, and find the flag, and
> their problem is solved.

Their problem is not solved. All they have is an exception. Now what are 
they going to do with it?

This is why I am still unconvinced that this functionality is anywhere 
near as useful as the proponents seem to think. Brandt has found one 
good example of a parsing bug in the ast library, but if he has shown 
how this zip_strict function will solve the bug, I haven't seen it.

In any case, even giving Brandt the benefit of the doubt that this will 
solve the ast bug, it's hard for me to generalise from that. If I'm 
expecting equal length inputs, and don't get them, what am I supposed to 
do with the exception as the consumer of the inputs?

As the consumer of the inputs, I can pass the buck to the producer, make 
it their responsibility, and merely promise to truncate the inputs if 
they're not the same length. Otherwise, what do I do once I've caught 
the exception?

The most common use for this I have seen in the discussion is:

"I have generated two inputs which I expect are equal, and I'd like to 
be notified if they aren't"

which to me is an assertion about program correctness. So this ought to 
be an assert that gets disabled under -O, not a raise that the caller 
might catch.

So this suggests *two* new functions:

- zip_equal for Brandt's parsing bug use-case, guaranteed to raise

- zip_assert_equal for the more common use case of checking 
  program correctness, and disabled under -O
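
To show the difference between the two, here is a rough sketch restricted to sized inputs (anything supporting len()). The function names come from the list above; the bodies are only an assumption about how they might behave.

```python
def zip_equal(*sequences):
    """Always raise on a length mismatch; callers may catch the error."""
    if len({len(s) for s in sequences}) > 1:
        raise ValueError("zip_equal() arguments have different lengths")
    return zip(*sequences)


def zip_assert_equal(*sequences):
    """Assert equal lengths; the check disappears when run with -O."""
    assert len({len(s) for s in sequences}) <= 1, "length mismatch"
    return zip(*sequences)


print(list(zip_equal([1, 2], "ab")))
print(list(zip_assert_equal([1, 2], "ab")))
```

Checking lengths up front fails before any pairs are produced, but it does not work for arbitrary iterators, which have no len().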

> If it’s in itertools, they have to think to look there.

And this is a problem, why? Should *everything* be a builtin?

Heaven forbid that somebody has to read the docs and learn about 
modules, let's have one giant global namespace with everything in it! 
Because that's good for the beginners! (Not.)

-- 
Steven
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/ODVT4PXERBBFAHBAMJPYF67DFRZULTKS/

[Python-ideas] Re: zip(x, y, z, strict=True)

2020-05-01 Thread Steven D'Aprano

On Tue, Apr 28, 2020 at 02:46:35PM -, Brandt Bucher wrote:

> Over the course of the last week, it has become surprisingly clear 
> that this change is controversial enough to require a PEP.

I cannot imagine why you were surprised about that. Did you already 
forget about the experience of dict union operators? :-)

Aside from the Python-Ideas community being quite conservative about 
change at the best of times[1], this is a change with few obvious 
*concrete* use-cases that I have seen, a significant disagreement over 
the intent of the check (is it an assertion that should never be caught, 
or an exception that the caller may want to catch and recover from?), 
plus it has much opportunity for bikeshedding:

- zip_strict or zip_equal or zip_exact or zip_same?
- builtin or itertools?
- function or recipe in itertools?
- zip.method or function?
- just use more-itertools?
- deprecate zip and call it zip_shortest?
- use a True/False flag or a string mode or an enumeration?
- pass a callback function to zip?


Unlike dict union operators, there is no long history of requests for 
this functionality (that I have seen). It wasn't added to itertools when 
zip_longest was added, not even as a recipe. And even more-itertools, 
which adds everything including the kitchen sink to their library, only 
added it within the last few weeks (give or take).

So I don't think you should be surprised by the pushback on this.


> With that in mind, I've started drafting one summarizing the 
> discussion that took place here, and arguing for the addition of a 
> boolean flag to the `zip` constructor. Antoine Pitrou has agreed to 
> sponsor, and I've chatted with another core developer who shares my 
> view that such a flag wouldn't violate Python's existing design 
> philosophies.

Maybe you should chat with another core developer who *disagrees* with 
your view about such a flag?



[1] Whether it is *excessively* conservative probably depends on how you 
feel about the change being proposed :-)


-- 
Steven
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/SI2MF434N5IARBOXAVJ62DGMFNYJFWZ3/

[Python-ideas] Re: zip(x, y, z, strict=True)

2020-05-01 Thread Andrew Barnert via Python-ideas

On May 1, 2020, at 08:08, Christopher Barker  wrote:
> 
> Also please keep in mind that the members of this list, and the python-dev 
> list, are not representative of most Python users. Certainly not beginners 
> but also many (most?) fairly active, but more "casual" users.
> 
> Folks on this list are very invested in the itertools module and iteration in 
> general. But many folks write a LOT of code without ever touching 
> itertools. Honestly, a lot of it is pretty esoteric (zip_longest is not) -- 
> I need to read the docs and think carefully before I know what they even do. 

So what? Most of the os module is pretty esoteric, but that doesn’t stop you—or 
even a novice who just asked “how do I get my files like dir”—from using 
os.listdir. For that matter, zip is in the same place as stuff like setattr and 
memoryview, which are a lot harder to grok than chain.

That novice will never guess to look in os. And if I told them “go look in os”, 
that would be useless and cruel. But I don’t, I tell them “that’s called 
os.listdir”, and they don’t have to learn about effective/real/saved user ids 
or the 11 different spawn functions to “get my files like dir” like they asked.

> Example: Here's the docstring for itertools.chain:
> 
> chain(*iterables) --> chain object
> 
> Return a chain object whose .__next__() method returns elements from the
> first iterable until it is exhausted, then elements from the next
> iterable, until all of the iterables are exhausted.
> 
> I can tell you that I have no idea what that means -- maybe folks with CS 
> training do, but that is NOT most people that use Python.

And here’s the docstring for zip:

> Return a zip object whose .__next__() method returns a tuple where
> the i-th element comes from the i-th iterable argument.  The .__next__()
> method continues until the shortest iterable in the argument sequence
> is exhausted and then it raises StopIteration

Most people have no idea what that means either.

In fact, chain is simpler to grok than zip (it just doesn’t come up as often, 
so it doesn’t need to be a builtin).
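
A two-line comparison makes the point better than either docstring; the inputs here are arbitrary:

```python
from itertools import chain

letters = "ABC"
numbers = [1, 2]

# chain: everything from the first iterable, then everything from the next.
chained = list(chain(letters, numbers))
# zip: one tuple per position, stopping at the shortest input.
zipped = list(zip(letters, numbers))

print(chained)  # ['A', 'B', 'C', 1, 2]
print(zipped)   # [('A', 1), ('B', 2)]
```

Note that zip silently drops the 'C', which is exactly the behaviour this thread is about.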

> Anyway, inscrutable docstrings are another issue, and one I keep hoping I'll 
> find the time to try to address one day,

Yes, many of Python’s docstrings tersely explain the details of how the 
function does what it does, rather than telling you why it’s useful or how to 
use it. And yes, that’s less than ideal.

But that isn’t an advantage to adding a flag to zip over adding a new function. 
Making zip more complicated certainly won’t magically fix its docstring, it’ll 
just make the docstring more complicated.

> but the point is :
> 
> "Folks will go look in itertools when zip() doesn't do what they want " just 
> does not apply to most people.

But nobody suggested that they will. That’s exactly why people keep saying it 
should be mentioned in the docstring and the docs page and maybe even the 
tutorial.

And you’re also right that it’s also not true that “folks will read the 
docstring for zip() when zip() doesn’t do what they want and figure it out from 
there”, but that’s equally a problem for both versions of the proposal.

In fact, most people, unless they learned it from a tutorial or class or book 
or blog post or from existing code before they needed it, are going to go to a 
coworker, StackOverflow, the TA for their class, a general web search, etc. to 
find out how to do what they want. There’s only so much Python can do about 
that—the docstring, docs page, and official tutorial (which isn’t the tutorial 
most people learn from) is about it.

We have to trust that if this really is something novices need, the people who 
teach classes and answer on StackOverflow and write tutorials and mentor 
interns and help out C# experts who only use Python twice a year and so on will 
teach it. There’s no way around that. But if those people can and do teach 
os.listdir and math.sin and so on, they can also teach zip_equal.

> Finally, yes, a pointer to itertools in the docstring would help a lot, but 
> yes, it's still a heavier lift than adding a flag, 'cause you have to then go 
> and import a new module, etc.

What’s the “etc.” here? What additional thing do they have to do besides import 
a new module?

People have to import a new module to get a list of their files. And lots of 
other things that are builtins in other languages. In JavaScript, I don’t have 
to import anything to decode JSON, to do basic math functions like sin or mean, 
to create a simple object (where I don’t have to worry about writing __init__ 
and __repr__ and __eq__ and so on), to make a basic web request, etc. In 
Python, I have to import a module to do any of those things (for the last one, 
I even have to install a third-party package first).

Namespaces are a honking great idea, but there is a cost to that idea, and that 
cost includes people having to learn import pretty early on.


[Python-ideas] Re: zip(x, y, z, strict=True)

2020-05-01 Thread Christopher Barker

It's all been said. There is a PEP being written, so we should all make
sure our arguments are well represented there, and let the decision be made.

But in all the discussion about usability and discoverability, etc, please
keep in mind that zip() is a builtin, and zip_longest() and any other
function written will be in the itertools module.

All the tab completion, etc in the world does not help when the functions
are in different namespaces.

Also please keep in mind that the members of this list, and the python-dev
list, are not representative of most Python users. Certainly not beginners
but also many (most?) fairly active, but more "casual" users.

Folks on this list are very invested in the itertools module and iteration
in general. But many folks write a LOT of code without ever touching
itertools. Honestly, a lot of it is pretty esoteric (zip_longest is not)
-- I need to read the docs and think carefully before I know what they even
do.

Example: Here's the docstring for itertools.chain:

chain(*iterables) --> chain object

Return a chain object whose .__next__() method returns elements from the
first iterable until it is exhausted, then elements from the next
iterable, until all of the iterables are exhausted.

I can tell you that I have no idea what that means -- maybe folks with CS
training do, but that is NOT most people that use Python.

And here's the full docs:

Make an iterator that returns elements from the first iterable until it is
exhausted, then proceeds to the next iterable, until all of the iterables
are exhausted. Used for treating consecutive sequences as a single
sequence. Roughly equivalent to:

def chain(*iterables):
    # chain('ABC', 'DEF') --> A B C D E F
    for it in iterables:
        for element in it:
            yield element


OK, that's better, though only because there's a nice simple example there,
and you have to go looking for them.

Anyway, inscrutable docstrings are another issue, and one I keep hoping
I'll find the time to try to address one day, but the point is :

"Folks will go look in itertools when zip() doesn't do what they want "
just does not apply to most people.

Finally, yes, a pointer to itertools in the docstring would help a lot, but
yes, it's still a heavier lift than adding a flag, 'cause you have to then
go and import a new module, etc.

-CHB



-- 
Christopher Barker, PhD

Python Language Consulting
  - Teaching
  - Scientific Software Development
  - Desktop GUI and Web Development
  - wxPython, numpy, scipy, Cython
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/LISCLZXA5RE5WHMWVBPF2CDXN73WYREG/

[Python-ideas] Re: zip(x, y, z, strict=True)

2020-04-30 Thread Andrew Barnert via Python-ideas

On Apr 29, 2020, at 22:50, Stephen J. Turnbull 
 wrote:
> Andrew Barnert via Python-ideas writes:
> 
>>> Also -1 on the flag.
> 
> Also -1 on the flag, for the same set of reasons.
> 
> I have to dissent somewhat from one of the complaints, though:
> 
>> auto-complete won’t help at all,

Thanks for pointing this out; I didn’t realize how misleadingly I stated this.

What I meant to say is that auto-complete won’t help at all with the problem 
that flags are less discoverable and harder to type than separate functions. 
Not that it won’t help at all with typing flags—it will actually help a little, 
it’ll just help a lot less than with separate functions, making the problem 
even starker rather than eliminating it.

It’s worth trying this out to see for yourself.

> Many (most?) people use IDEs that will catch up more or less quickly,
> though.  

In fact, most IDEs should just automatically work without needing to change 
anything, because they work off the signatures and/or typesheds in the first 
place. That’s not the issue; the issue is what they can actually do for you. 
And it’s not really any different from in your terminal.

In an iPython REPL in my terminal, I enter these definitions:

def spam(*args, equal=False): pass
def eggs(*args): pass
def eggs_equal(*args): pass

I can now type eggs_equal(x, y) with `e TAB TAB x, y` or `eggs_ TAB x, y`. And 
either way, a pop up is showing me exactly the options I want to see when I ask 
for completion, I’m not just typing that blind.

I can type spam(x, y, equal=True) with `s TAB x, y, e TAB T TAB`. That is 
better than typing out the whole thing, but notice that it requires three 
autocompletes rather than one, and they aren’t nearly as helpful. Why? Well, it 
has no idea that the third argument I want to pass is the equal keyword rather 
than anything at all, because *args takes anything at all. And, even after it 
knows I’m passing the equal argument, it has no idea what value I want for it, 
so the only way to get suggestions for what to pass as the value is to type T 
and complete all values in scope starting with T (and usually True will be the 
first one). And it’s not giving me much useful information at each step; I had 
to know that I was looking to type equal=True before it could help me type 
that. The popup signature that shows *args, equal=False does clue me in, but 
still not nearly as well as offering eggs_equal did.
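
For what it’s worth, this is roughly all that a completion tool has to go on for the flag version. The sketch uses inspect.signature on the same `spam` definition as above; what any particular IDE does with this information is an assumption on my part.

```python
import inspect


def spam(*args, equal=False):
    pass


sig = inspect.signature(spam)
# Map each parameter name to the kind of parameter it is.
kinds = {name: param.kind.name for name, param in sig.parameters.items()}

print(sig)    # (*args, equal=False)
print(kinds)  # {'args': 'VAR_POSITIONAL', 'equal': 'KEYWORD_ONLY'}
```

Nothing in the signature says that the next thing typed after `x, y,` should be the keyword rather than a third positional argument.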

Now repeat the same thing in a source file in PyCharm, and it’s basically the 
same. Sure, the popups are nicer, and PyCharm actually infers that equal is of 
type bool even though I didn’t annotate so it can show me True, False, and all 
bool variables in scope instead of showing me everything in scope, but 
otherwise, no difference. I still need to ask for help three times instead of 
once, and get less guidance when I do.

And that’s with a bool (or Enum) flag. Change it to end="shortest", and it’s 
even worse. Strings aren’t code, they’re data, so PyCharm suggests nothing at 
all for the argument value, while iPython suggests generally-interesting 
strings like the files in my cwd. (I suppose they could add a special case for 
this argument of this function, although they don’t do that for anything else, 
not even the mode argument of open—and, even if they did, at best that makes 
things only a little worse than a bool or Enum instead of a lot worse…)
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/ERRWSIQC5XQBMOY3WX2NR5HH426LYX5L/

[Python-ideas] Re: zip(x, y, z, strict=True)

2020-04-30 Thread Greg Ewing


On 1/05/20 2:58 am, Christopher Barker wrote:
> Imagine someone that uses zip() in code that works for a while, and then 
> discovers a bug triggered by unequal length inputs.
>
> If it’s a flag, they look at the zip docstring, and find the flag, and 
> their problem is solved.


Why would they look at the docs for zip? The bug wasn't
caused by incorrect use of zip. And using the flag isn't
going to fix it.

--
Greg
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/XRM2753MALHNQI7O3623BKPLSNQN6YBO/

[Python-ideas] Re: zip(x, y, z, strict=True)

2020-04-30 Thread Andrew Barnert via Python-ideas

On Apr 30, 2020, at 07:58, Christopher Barker  wrote:
> 
>> I think that the issue of searchability and signature are pretty
>> compelling reasons for such a simple feature to be part of the
>> function name.
> 
> I would absolutely agree with that if all three functions were in the same
> namespace (like the string methods referred to earlier), but in this case, 
> one is a built in and the others will not be — which makes a huge difference 
> in discoverability.
> 
> Imagine someone that uses zip() in code that works for a while, and then 
> discovers a bug triggered by unequal length inputs.
> 
> If it’s a flag, they look at the zip docstring, and find the flag, and their 
> problem is solved.
> 
> If it’s in itertools, they have to think to look there. Granted, some
> googling will probably lead them there, and the zip() docstring can point 
> them there, but it’s still a heavier lift.

I don’t understand. You’re arguing that being discoverable in the docstring is
sufficient for the flag, but that being discoverable in the docstring is a
heavier lift for the function. Why would this be true, unless you intentionally
write the docstring badly?

To make this more concrete, let’s say we want to just add on to the existing 
doc string (even though it seems aimed more at reminding experts of the exact 
details than at teaching novices) and stick to the same style. We’re then 
talking about something like this:

> Return a zip object whose .__next__() method returns a tuple where
> the i-th element comes from the i-th iterable argument.  The .__next__()
> method continues until the shortest iterable in the argument sequence
> is exhausted and then it raises StopIteration, or, if equal is true,
> it checks that the remaining iterables are exhausted and otherwise
> raises ValueError. 

… vs. this:

> Return a zip object whose .__next__() method returns a tuple where
> the i-th element comes from the i-th iterable argument.  The .__next__()
> method continues until the shortest iterable in the argument sequence
> is exhausted and then it raises StopIteration. If you need to check
> that all iterables are exhausted, use itertools.zip_equal,
> which raises ValueError if they aren’t.

If they can figure out that equal=True is what they’re looking for from the 
first one, it’ll be just as easy to figure out that zip_equal is what they’re 
looking for from the second.

Of course it might be better to rewrite the whole thing to be more 
novice-friendly and to describe what zip iterates at a higher level instead of 
describing how its __next__ method operates, but that applies to both versions.

Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/BGTNMWVD3THOYV2GILT7LNNYHMBGAW77/

[Python-ideas] Re: zip(x, y, z, strict=True)

2020-04-30 Thread Soni L.




On 2020-04-30 1:07 p.m., Ethan Furman wrote:
> On 04/30/2020 07:58 AM, Christopher Barker wrote:
>> On 04/29/2020 10:51 PM, Stephen J. Turnbull wrote:
>>> I think that the issue of searchability and signature are pretty
>>> compelling reasons for such a simple feature to be part of the
>>> function name.
>>
>> I would absolutely agree with that if all three functions were in the
>> same namespace (like the string methods referred to earlier), but in
>> this case, one is a built in and the others will not be -- which makes
>> a huge difference in discoverability.
>>
>> Imagine someone that uses zip() in code that works for a while, and
>> then discovers a bug triggered by unequal length inputs.
>>
>> If it’s a flag, they look at the zip docstring, and find the flag,
>> and their problem is solved.
>
> So update the `zip` docstring with a reference to `zip_longest`,
> `zip_equal`, and `zip_whatever`.
>
> -1 on using a flag.


what about letting `zip` take a `leftover_func` with arguments 
`partial_results` and `remaining_iterators`, and then provide 
`zip_longest`, `zip_equal` and `zip_shortest` (default) as functions you 
can use with it?


an iteration of `zip(a, b, c, leftover_func=foo)` would:

1. call next on the first iterator (internal iter(a))
2. if it fails, call leftover_func with the () tuple as first arg and 
the (internal iter(b), internal iter(c)) tuple as second arg

3. call next on the second iterator (internal iter(b))
4. if it fails, call leftover_func with the (result from a,) tuple as 
the first arg and the (internal iter(a), internal iter(c)) tuple as 
second arg

5. call next on the third iterator (internal iter(c))
6. if it fails, call leftover_func with the (result from a, result from 
b) tuple as the first arg and the (internal iter(a), internal iter(b)) 
tuple as second arg

7. yield the (result from a, result from b, result from c) tuple

the leftover_func should return an iterator that replaces the zip, or 
None. (zip_shortest would be the no_op function)
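
The steps above can be sketched in pure Python roughly as follows. This is only one reading of the proposal, not an agreed design: the wrapper name zip_with_leftover is made up for illustration, and zip_shortest and zip_equal are written here as hypothetical leftover functions. A zip_longest leftover function is left out because it is not obvious what it should return.

```python
_MISSING = object()  # private marker for "this iterator is exhausted"


def zip_with_leftover(*iterables, leftover_func=None):
    """Hypothetical sketch of zip(a, b, c, leftover_func=foo)."""
    iterators = [iter(it) for it in iterables]
    while iterators:
        partial = []
        for index, iterator in enumerate(iterators):
            item = next(iterator, _MISSING)
            if item is _MISSING:
                # Pass the partial results and every *other* iterator.
                remaining = tuple(iterators[:index] + iterators[index + 1:])
                replacement = None
                if leftover_func is not None:
                    replacement = leftover_func(tuple(partial), remaining)
                if replacement is not None:
                    # The returned iterator replaces the zip from here on.
                    yield from replacement
                return
            partial.append(item)
        yield tuple(partial)


def zip_shortest(partial_results, remaining_iterators):
    """The no-op leftover function: just stop."""
    return None


def zip_equal(partial_results, remaining_iterators):
    """Raise if any iterable had items left when another ran out."""
    if partial_results:
        raise ValueError("iterables have different lengths")
    for iterator in remaining_iterators:
        if next(iterator, _MISSING) is not _MISSING:
            raise ValueError("iterables have different lengths")
    return None
```

With this reading, zip_with_leftover(a, b, leftover_func=zip_equal) behaves like a strict zip, and omitting leftover_func gives the current shortest-wins behaviour.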





Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/KKHW4ZUMIFPUVPQVWQD7KAHGSGBCCE6H/

[Python-ideas] Re: zip(x, y, z, strict=True)

2020-04-30 Thread Ethan Furman


On 04/30/2020 07:58 AM, Christopher Barker wrote:
> On 04/29/2020 10:51 PM, Stephen J. Turnbull wrote:
>> I think that the issue of searchability and signature are pretty
>> compelling reasons for such a simple feature to be part of the
>> function name.
>
> I would absolutely agree with that if all three functions were in the same
> namespace (like the string methods referred to earlier), but in this case, one
> is a built in and the others will not be -- which makes a huge difference in
> discoverability.
>
> Imagine someone that uses zip() in code that works for a while, and then
> discovers a bug triggered by unequal length inputs.
>
> If it’s a flag, they look at the zip docstring, and find the flag, and their
> problem is solved.


So update the `zip` docstring with a reference to `zip_longest`, `zip_equal`, 
and `zip_whatever`.

-1 on using a flag.

--
~Ethan~
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/T7OGRPXEU2UHGGL6FW42DIK7ZVHCMDUS/

[Python-ideas] Re: zip(x, y, z, strict=True)

2020-04-30 Thread Christopher Barker

> I think that the issue of searchability and signature are pretty
> compelling reasons for such a simple feature to be part of the
> function name.


I would absolutely agree with that if all three functions were in the same
namespace (like the string methods referred to earlier), but in this case,
one is a built in and the others will not be — which makes a huge
difference in discoverability.

Imagine someone that uses zip() in code that works for a while, and then
discovers a bug triggered by unequal length inputs.

If it’s a flag, they look at the zip docstring, and find the flag, and
their problem is solved.

If it’s in itertools, they have to think to look there. Granted, some
googling will probably lead them there, and the zip() docstring can point
them there, but it’s still a heavier lift.

-CHB





-- 
Christopher Barker, PhD

Python Language Consulting
  - Teaching
  - Scientific Software Development
  - Desktop GUI and Web Development
  - wxPython, numpy, scipy, Cython
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/IZ3HYYEGGUUSCASTB2WMZMYD6QUP2AAU/

[Python-ideas] Re: zip(x, y, z, strict=True)

2020-04-30 Thread Stephen J. Turnbull

Andrew Barnert via Python-ideas writes:

 > > Also -1 on the flag.

Also -1 on the flag, for the same set of reasons.

I have to dissent somewhat from one of the complaints, though:

 > auto-complete won’t help at all,

Many (most?) people use IDEs that will catch up more or less quickly,
though.  Such catchup could be automated to some extent by using an
Enum, although folks who would use the flag might prefer the string
API.  You could handle both, but that would add even more complexity
to the function's initialization.

I think that the issue of searchability and signature are pretty
compelling reasons for such a simple feature to be part of the
function name.

Steve
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/YJ3PBENBNHXPQIEJVRTTXGQHHTSDY67B/

[Python-ideas] Re: zip(x, y, z, strict=True)

2020-04-29 Thread Andrew Barnert via Python-ideas

On Apr 29, 2020, at 07:08, Barry Scott  wrote:
> 
> 
>> On 28 Apr 2020, at 16:12, Rhodri James  wrote:
>> 
>>> On 28/04/2020 15:46, Brandt Bucher wrote:
>>> Thanks for weighing in, everybody.
>>> Over the course of the last week, it has become surprisingly clear that 
>>> this change is controversial enough to require a PEP.
>>> With that in mind, I've started drafting one summarizing the discussion 
>>> that took place here, and arguing for the addition of a boolean flag to the 
>>> `zip` constructor. Antoine Pitrou has agreed to sponsor, and I've chatted 
>>> with another core developer who shares my view that such a flag wouldn't 
>>> violate Python's existing design philosophies.
>>> I'll be watching this thread, and should have a draft posted to the list 
>>> for feedback this week.
>> 
>> -1 on the flag.  I'd be happy to have a separate zip_strict() (however you 
>> spell it), but behaviour switches just smell wrong.
> 
> Also -1 on the flag.
> 
> 1. A new name can be searched for.
> 2. You do not force an if on the flag for every single call to zip.

Agreed on both Rhodri’s and Barry’s reasons, and more below.

I also prefer the name zip_equal to zip_strict, because what we’re being strict 
about isn’t nearly as obvious as what’s different between shortest vs. equal 
vs. longest, but that’s just a mild preference, not a -1 like the flag.

In addition to the three points above:

Having one common zip variant spelled as a different function and the other as 
a flag seems really bad for learning and remembering the language. And 
zip_longest has a solidly established precedent. And I don’t think you want to 
add multiple bool flags to zip?

Also, just look at these:

zip_strict(xs, ys)
zip(xs, ys, strict=True)

The first one is easier to read because it doesn’t have the extra 5 characters 
to skim over that don’t really add anything to the meaning, and it puts the 
important distinction up front.

It’s also shorter, and a lot easier to type with auto-complete—which isn’t 
nearly as big of a deal, but if this is really meant to be used often it does 
add up.

And it’s obviously more extensible, if it really is at all possible that we 
might want to eventually deprecate shortest or add new end behaviors like 
yielding partial tuples or Soni’s thing of stashing the leftovers somehow (none 
of which I find very convincing, but others apparently do, and picking a design 
that rules them out means explicitly rejecting them).

A string or enum flag instead of a bool solves half of those problems (as long
as “longest” is one of the options), but it makes others even worse. The 
available strings aren’t even discoverable as part of the signature, 
auto-complete won’t help at all, and the result is even longer and even more 
deemphasizes the important thing.

Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/3JKAI25VFIGBO4HPWQ6S22PNKZ6ZOCCT/

[Python-ideas] Re: zip(x, y, z, strict=True)

2020-04-29 Thread Christopher Barker

On Tue, Apr 28, 2020 at 7:49 AM Brandt Bucher 
wrote:

> With that in mind, I've started drafting one summarizing the discussion
> that took place here, and arguing for the addition of a boolean flag to the
> `zip` constructor.

Since you've gotten a few -1s, I'll add a +1 -- for reasons posted here
before, a flag is far more likely to actually get used :-) -- but that's
why we need a PEP.

However, I urge you to consider a trinary switch instead:

"shortest" (default) |  "longest" | "equal"

Yes, we already have zip_longest, but if we're adding switching behavior to
zip(), it might as well handle all cases -- that seems a cleaner API to me.
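
A minimal sketch of what such a three-way switch could look like, written as a wrapper rather than a change to zip() itself. The function name zip_end and the keyword name end are assumptions made up for this example; only zip and itertools.zip_longest are real.

```python
from itertools import zip_longest

_MISSING = object()  # private fill value used to detect a short input


def _zip_equal(*iterables):
    """Yield tuples like zip, but raise if the lengths differ."""
    for row in zip_longest(*iterables, fillvalue=_MISSING):
        if any(item is _MISSING for item in row):
            raise ValueError("iterables have different lengths")
        yield row


def zip_end(*iterables, end="shortest", fillvalue=None):
    """Hypothetical zip with a trinary switch for end behaviour."""
    if end == "shortest":
        return zip(*iterables)
    if end == "longest":
        return zip_longest(*iterables, fillvalue=fillvalue)
    if end == "equal":
        return _zip_equal(*iterables)
    raise ValueError(f"unknown end mode: {end!r}")
```

Note that a misspelled mode such as end="equals" is only caught at call time, which is part of the discoverability concern raised against string flags elsewhere in the thread.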

-CHB

-- 
Christopher Barker, PhD

Python Language Consulting
  - Teaching
  - Scientific Software Development
  - Desktop GUI and Web Development
  - wxPython, numpy, scipy, Cython
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/TIJ6JBBX7GMRRVYTE7I7EXNLXMHTOIQY/

[Python-ideas] Re: zip(x, y, z, strict=True)

2020-04-29 Thread Barry Scott




> On 28 Apr 2020, at 16:12, Rhodri James  wrote:
> 
> On 28/04/2020 15:46, Brandt Bucher wrote:
>> Thanks for weighing in, everybody.
>> Over the course of the last week, it has become surprisingly clear that this 
>> change is controversial enough to require a PEP.
>> With that in mind, I've started drafting one summarizing the discussion that 
>> took place here, and arguing for the addition of a boolean flag to the `zip` 
>> constructor. Antoine Pitrou has agreed to sponsor, and I've chatted with 
>> another core developer who shares my view that such a flag wouldn't violate 
>> Python's existing design philosophies.
>> I'll be watching this thread, and should have a draft posted to the list for 
>> feedback this week.
> 
> -1 on the flag.  I'd be happy to have a separate zip_strict() (however you 
> spell it), but behaviour switches just smell wrong.


Also -1 on the flag.

1. A new name can be searched for.
2. You do not force an if on the flag for every single call to zip.

Barry


Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/XEQIZHKWNKFDWQCBK4FAEGP2TEMDTMMP/

[Python-ideas] Re: zip(x, y, z, strict=True)

2020-04-28 Thread Rhodri James


On 28/04/2020 15:46, Brandt Bucher wrote:
> Thanks for weighing in, everybody.
>
> Over the course of the last week, it has become surprisingly clear that this
> change is controversial enough to require a PEP.
>
> With that in mind, I've started drafting one summarizing the discussion that
> took place here, and arguing for the addition of a boolean flag to the `zip`
> constructor. Antoine Pitrou has agreed to sponsor, and I've chatted with
> another core developer who shares my view that such a flag wouldn't violate
> Python's existing design philosophies.
>
> I'll be watching this thread, and should have a draft posted to the list for
> feedback this week.


-1 on the flag.  I'd be happy to have a separate zip_strict() (however 
you spell it), but behaviour switches just smell wrong.


--
Rhodri James *-* Kynesim Ltd
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/BZUJUTAVOHJEUZ6QEIRZWZHKCRXE6AAS/

[Python-ideas] Re: zip(x, y, z, strict=True)

2020-04-28 Thread Brandt Bucher

Thanks for weighing in, everybody.

Over the course of the last week, it has become surprisingly clear that this 
change is controversial enough to require a PEP.

With that in mind, I've started drafting one summarizing the discussion that 
took place here, and arguing for the addition of a boolean flag to the `zip` 
constructor. Antoine Pitrou has agreed to sponsor, and I've chatted with 
another core developer who shares my view that such a flag wouldn't violate 
Python's existing design philosophies.

I'll be watching this thread, and should have a draft posted to the list for 
feedback this week.

Brandt
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/MLMZQZMQ3GRQYZVOXXNMXHQNTHODD4CQ/

[Python-ideas] Re: zip(x, y, z, strict=True)

2020-04-28 Thread Steven D'Aprano

On Mon, Apr 27, 2020 at 09:21:41AM -0700, Andrew Barnert wrote:

> But this doesn’t do what the OP suggested; it’s a completely different 
> proposal. They wanted to write this:
> 
> zipped = zip(xs, ys).skip()
> 
> … and you’re offering this:
> 
> zipped = zip.skip(xs, ys)
> 
> That’s a decent proposal—arguably better than the one being 
> discussed—but it’s definitely not the same one.

So he did. I misread his comment, sorry. Perhaps I read it as I would 
have written it rather than as he wrote it :-(


[...]
> Your design looks like a pretty good one at least at first glance, and 
> I think you should propose it seriously. You should be showing why 
> it’s better than adding methods to zip objects—and also better than 
> adding more functions to itertools or builtins, or flags to zip, or 
> doing nothing—not pretending it’s the same as one of those other 
> proposals and then trying to defend that other proposal by confusing 
> the problems with it.

Last time I got volunteered into writing a PEP I wasn't in favour of, 
and (initially at least) thought I was writing to have the rejection 
reason documented, it ended up getting approved :-)


-- 
Steven
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/YFU2XWVZDNBP3EBGEKE7DYT3DVEUPAQO/

[Python-ideas] Re: zip(x, y, z, strict=True)

2020-04-27 Thread David Mertz

On Mon, Apr 27, 2020 at 4:39 PM Christopher Barker 
wrote:

> Isn't there? There are many cases where you CANNOT (or don't want to, for
> performance reasons) "consume" the entirety of the input iterators, and
> many cases where it would be fine to do that. But are there many (any?)
> cases where you couldn't use the "sentinel approach"?
>

It depends what you mean by "cannot."  Algorithmically of course you can
use sentinel.  But the issue is computation cost and rollback of
side-effects. E.g. change my example code just slightly:

    for p, q, m in zip_longest(bigints1, bigints2, bigints3,
                               fillvalue=_sentinel):
        # A sentinel in any position means one input ran out early.
        if _sentinel in (p, q, m):
            raise UnequalLengthError("uh oh")
        result = p**q % m
        store_to_db(result)

If we have 1000-digit numbers, but only a couple thousand of them, we would
be vastly better off checking the lengths in advance (if that is possible,
and if generating the numbers in the first place is comparatively cheap).
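
A minimal sketch of the "check the lengths in advance" alternative, assuming the inputs are real sequences that support len(). The function name store_modpows and its store callback are placeholders invented for this example, standing in for the loop and store_to_db above; three-argument pow() is swapped in for p**q % m.

```python
def store_modpows(bases, exponents, moduli, store):
    """Pre-check lengths, then do the expensive work.

    Nothing is computed or stored if the lengths disagree, so there
    is no partial result to roll back.
    """
    lengths = {len(bases), len(exponents), len(moduli)}
    if len(lengths) != 1:
        raise ValueError(f"unequal lengths: {sorted(lengths)}")
    for p, q, m in zip(bases, exponents, moduli):
        # pow(p, q, m) computes p**q % m without building the huge power.
        store(pow(p, q, m))
```

This only works when every input can report its length up front; for lazy iterators the check necessarily happens after some work has been done.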

> Sure: but that is a distinction that is, as far as I know, never made in
> the standard library with all the "iterator related" code. There are some
> things that require proper sequences, but as far as I know, nothing that
> expects a "concretizable" iterator -- and frankly, I don't think there is
> a clear definition of that anyway
>

Oh... absolutely.  "Concretizable" is very task-specific.  Other than
infinite iterators, any iterator could be wrapped in list() to *eventually*
get a concrete sequence.  This isn't a Python language distinction but a
"what do you want to do?" distinction.

> If there were a zip_equal() in itertools, would you ever write the code to
> use zip_longest and check the sentinel? For my part, I wouldn't, and indeed
> once I had a second need for it, I'd write zip_equal for my own toolbox
> anyway :-)
>

I dunno.  I guess I might use zip_equal() in the case where I didn't want
to bother with an `except` and just let the program crash on mismatch (or
maybe catch with a generic "something went wrong" kind of status).
Whenever I really want specific remediation action, I think I'd still
prefer the sentinel.  Often enough I still can do *something* with the
non-exhausted elements from the other iterators that it feels like a more
general pattern.

FWIW, if it is added, I like the name zip_strict() better than
zip_equal().  But someone else is building the bikeshed, so I'm not that
worried about spelling.

-- 
Keeping medicines from the bloodstreams of the sick; food
from the bellies of the hungry; books from the hands of the
uneducated; technology from the underdeveloped; and putting
advocates of freedom in prisons.  Intellectual property is
to the 21st century what the slave trade was to the 16th.
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/OXSJXBQAMCUUXWYDIGTKBPC4RK5ZWE6J/

[Python-ideas] Re: zip(x, y, z, strict=True)

2020-04-27 Thread Rhodri James


On 27/04/2020 21:39, Christopher Barker wrote:
> To me, having a zip_equal that iterates through the inputs on demand, and
> checks when one is exhausted, rather than pre-determining the lengths ahead
> of time will solve almost all (or all? I can't think of an example where it
> wouldn't) use cases


Except for those cases where either the whole dataset needs to be 
processed or none of it, which is what people were thinking might be 
behind some of the desire for zip_equal().  That you can't do it in the 
general case would be a later "well, bugger" stage of the design :-)


--
Rhodri James *-* Kynesim Ltd
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/GWU3KENHAM6BAU3W6ARNM5UZQTX2JZLA/

[Python-ideas] Re: zip(x, y, z, strict=True)

2020-04-27 Thread Christopher Barker

On Sun, Apr 26, 2020 at 9:21 PM David Mertz  wrote:

> On Sun, Apr 26, 2020 at 11:56 PM Christopher Barker 
> wrote:
>
>> > If I have two or more "sequences" there are basically two cases of that.
>>
>> so you need to write different code, depending on which case? that seems
>> not very "there's only one way to do it" to me.
>>
>
> This difference is built into the problem itself.  There CANNOT be only
> one way to do these fundamentally different things.
>

Isn't there? There are many cases where you CANNOT (or don't want to, for
performance reasons) "consume" the entirety of the input iterators, and
many cases where it would be fine to do that. But are there many (any?)
cases where you couldn't use the "sentinel approach"?

To me, having a zip_equal that iterates through the inputs on demand, and
checks when one is exhausted, rather than pre-determining the lengths ahead
of time will solve almost all (or all? I can't think of an example where it
wouldn't) use cases, and it is completely consistent with all the other
things that are iterators in Py3 that were sequences in py2: zip, map,
dict.items() (and friends), and ...

There is a pretty consistent philosophy in py3 that anything that can be an
iterator, and be lazy-evaluated is done that way, and for the time when you
need an actual sequence, you can wrap list() around it.

So I see no downside to having a zip_equal that doesn't pre-compute the
lengths, when it could.
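
For concreteness, a lazy zip_equal along these lines might look like the sketch below. It is an assumption about how such a function could be written, not the implementation of any existing itertools function. It never calls len(), so it works on arbitrary iterators, and it only notices a mismatch when the first input runs out.

```python
def zip_equal(*iterables):
    """Lazily zip iterables, raising ValueError if lengths differ.

    Hypothetical sketch: tuples are yielded on demand, and the
    length check happens only when some iterator is exhausted.
    """
    iterators = [iter(it) for it in iterables]
    if not iterators:
        return
    sentinel = object()
    while True:
        row = [next(it, sentinel) for it in iterators]
        exhausted = [item is sentinel for item in row]
        if all(exhausted):
            return  # every input ended on the same step
        if any(exhausted):
            raise ValueError("iterables have different lengths")
        yield tuple(row)
```

Because the check is deferred, a caller may already have processed the earlier tuples by the time the ValueError arrives.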

>
> With iterators, there is at heart a difference between "sequences that one
> can (reasonably) concretize" and "sequences that must be lazy."  And that
> difference means that for some versions of a seemingly similar problem it
> is possible to ask len() before looping through them while for others that
> is not possible (and hence we may have done some work that we want to
> "roll-back" in some sense).
>

Sure: but that is a distinction that is, as far as I know, never made in
the standard library with all the "iterator related" code. There are some
things that require proper sequences, but as far as I know, nothing that
expects a "concretizable" iterator -- and frankly, I don't think there is
a clear definition of that anyway -- some things clearly aren't, but others
it would depend on how big they are, and the memory available to the
machine, etc. In fact, the reason we have as many iterator-related tools is
exactly so programmers DON'T have to make that decision.

Can you think of a single case where a zip_equal() (either pre-existing or
roll your own) would not work, but the concretizing version would?

There is one "downside" to this in that it potentially leaves the iterators
passed in in an undetermined state -- partially exhausted, and with a longer
one having had one more item removed than was used. But that exists with
"zip_shortest" behavior anyway. But it would be a minor reason to do the
concretizing approach -- at least then you'd know your iterators were fully
exhausted.
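
The "one more item removed than was used" effect is easy to see with the existing zip(). The short sketch below uses made-up variable names and shows that whether the extra item is lost depends on argument order.

```python
# Longer iterator first: zip pulls 3 from it before discovering
# that the shorter iterator is exhausted, so 3 is silently lost.
longer = iter([1, 2, 3])
shorter = iter([10, 20])
pairs = list(zip(longer, shorter))
leftover_longer_first = list(longer)

# Shorter iterator first: zip stops as soon as it is exhausted,
# so the longer iterator still holds its last item.
longer = iter([1, 2, 3])
shorter = iter([10, 20])
pairs_swapped = list(zip(shorter, longer))
leftover_shorter_first = list(longer)
```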

SIDE NOTE: this is reminding me that there have been calls in the past for
an optional __len__ protocol for iterators that are not proper sequences,
but DO know their length -- maybe one more place to use that if it existed.

> However, the mismatched length feels like such a small concern in what
> can go wrong.

Agreed -- but I think everyone agrees -- this is not a huge deal (or it
would have been done years ago), but it's a nice convenience, and minimally
disruptive.

> Sure.  That's fine. I'm +0 or even higher on adding
> itertools.zip_strict().  My taste prefers the other style I showed, but as
> I say, this version is perfectly fine.

If there were a zip_equal() in itertools, would you ever write the code to
use zip_longest and check the sentinel? For my part, I wouldn't, and indeed
once I had a second need for it, I'd write zip_equal for my own toolbox
anyway :-)

-CHB

-- 
Christopher Barker, PhD

Python Language Consulting
  - Teaching
  - Scientific Software Development
  - Desktop GUI and Web Development
  - wxPython, numpy, scipy, Cython
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/QI5A6EDQI72KE4ZW3OKSU6ZLYPBIYYDT/

[Python-ideas] Re: zip(x, y, z, strict=True)

2020-04-27 Thread Andrew Barnert via Python-ideas

On Apr 26, 2020, at 21:23, David Mertz  wrote:
> 
> 
>> On Sun, Apr 26, 2020 at 11:56 PM Christopher Barker  
>> wrote:
>> > If I have two or more "sequences" there are basically two cases of that.
>> 
>> so you need to write different code, depending on which case? that seems not 
>> very "there's only one way to do it" to me.
> 
> This difference is built into the problem itself.  There CANNOT be only one 
> way to do these fundamentally different things.
> 
> With iterators, there is at heart a difference between "sequences that one 
> can (reasonably) concretize" and "sequences that must be lazy."  And that 
> difference means that for some versions of a seemingly similar problem it is 
> possible to ask len() before looping through them while for others that is 
> not possible (and hence we may have done some work that we want to 
> "roll-back" in some sense).

Agreed. But here’s a different way to look at it:

The Python iteration protocol hides the difference between different kinds of 
iterables; every iterator is just a dumb next-only iterator. So any distinction 
between things you can pre-check and things you can post-check has to be made 
at a higher level, up wherever the code knows what’s being iterated (probably 
the application level). That isn’t inherent to the idea of iteration, as 
demonstrated by C++ (and later languages like Swift), where you can have 
reversible or random-accessible iterators and write tools that switch on those 
features, so you wouldn’t be forced to make the decision at the application 
level. You could write a generic C++ zip_equal function that pre-checks 
random-accessible iterators but post-checks other iterators.
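
Python cannot switch on iterator categories, but a rough analogue can switch on whether the input iterables support len(). The sketch below is an illustration under that assumption; the name zip_equal_generic is made up, and collections.abc.Sized is used as a stand-in for "random-accessible".

```python
from collections.abc import Sized
from itertools import zip_longest

_MISSING = object()  # private fill value used to detect a short input


def _post_checked(*iterables):
    """Yield tuples lazily; raise once a length mismatch is seen."""
    for row in zip_longest(*iterables, fillvalue=_MISSING):
        if any(item is _MISSING for item in row):
            raise ValueError("iterables have different lengths")
        yield row


def zip_equal_generic(*iterables):
    """Pre-check sized inputs, post-check everything else."""
    if all(isinstance(it, Sized) for it in iterables):
        # Every input knows its length: fail before any iteration.
        if len({len(it) for it in iterables}) > 1:
            raise ValueError("iterables have different lengths")
        return zip(*iterables)
    # At least one lazy input: the check has to happen at the end.
    return _post_checked(*iterables)
```

The same call therefore fails at different times depending on its arguments, which is arguably a reason to make the choice explicitly at the application level instead.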

But when would you want that generic function? When you’re writing that 
application code, you know whether you have sequences, inherently lazy 
iterators, or generic iterables as input, and you know whether you want no 
check, a pre-check, or a post-check on equal lengths, and those aren’t 
independent questions: when you want a pre-check, it’s because you’re thinking 
in sequence terms, not general iteration terms.

Pre-checking sequences is so trivial that you don’t need any helpers. The only 
piece Python is (arguably) missing is a way to do that post-check easily when 
you’ve decided you need it, and that’s what the proposals in this thread are 
trying to solve.

The fact that asking for post-checking on the zip iterator won’t look the same 
as manually pre-checking the input sequences isn’t a violation of TOOWTDI 
because the “it” you’re doing is a different thing, different in a way that’s 
meaningful to your code, and there doesn’t have to be one obvious way to do two 
different things. Just like slicing doesn’t have to look the same as islice, 
and a find method doesn’t have to look the same as a generic iterable find 
function, and so on; they only look the same when the distinction between 
thinking about sequences and thinking about lazy iterables is irrelevant to the 
problem.

___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/XXMKTQFT5JJGZS2QNFFT5JUCXLN3GV6J/
Code of Conduct: http://python.org/psf/codeofconduct/

[Python-ideas] Re: zip(x, y, z, strict=True)

2020-04-27 Thread Andrew Barnert via Python-ideas

On Apr 26, 2020, at 16:58, Steven D'Aprano  wrote:
> 
> On Sun, Apr 26, 2020 at 04:13:27PM -0700, Andrew Barnert via Python-ideas 
> wrote:
> 
>> But if we add methods on zip objects, and then we add a new skip() 
>> method in 3.10, how does the backport work? It can’t monkeypatch the 
>> zip type (unless we both make the type public and specifically design 
>> it to be monkeypatchable, which C builtins usually aren’t).
> 
> Depends on how you define monkey-patching.
> 
> I'm not saying this because I see the need for a plethora of methods on 
> zip (on the contrary); but I do like the methods-on-function API, like 
> itertools.chain has. Functions are namespaces, and we under-utilise 
> that fact in our APIs.
> 
>Namespaces are one honking great idea -- let's do more of those!
> 
> Here is a sketch of how you might do it:
> 
>     # Untested.
>     import builtins
>
>     class MyZipBackport():
>         real_zip = builtins.zip
>         def __call__(self, *args):
>             return self.real_zip(*args)
>         def __getattr__(self, name):
>             # Delegation is another under-utilised technique.
>             return getattr(self.real_zip, name)
>         def skip(self, *args):
>             # insert implementation here...
>             raise NotImplementedError
>
>     builtins.zip = MyZipBackport()

But this doesn’t do what the OP suggested; it’s a completely different 
proposal. They wanted to write this:

zipped = zip(xs, ys).skip()

… and you’re offering this:

zipped = zip.skip(xs, ys)

That’s a decent proposal—arguably better than the one being discussed—but it’s 
definitely not the same one.

> I don't know what "zip.skip" is supposed to do,

I quoted it in the email you’re responding to: it’s supposed to yield short 
tuples that skip the iterables that ran out early. But from the wording you 
quoted it should be obvious that isn’t an issue here anyway. As long as you 
understand their point that they want to leave things open for expansion to new 
forms of zipping in the future, you can understand my point that their design 
makes that harder rather than easier.
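
For reference, a minimal sketch of that "skip" behaviour written as a free function might look like the following. The name zip_skip and the sentinel are illustrative assumptions, not a settled spelling from any of the proposals.

```python
from itertools import zip_longest

_MISSING = object()  # private sentinel marking an exhausted iterable


def zip_skip(*iterables):
    """Hypothetical 'skip' zip: drop the slots whose iterable ran out early."""
    for combo in zip_longest(*iterables, fillvalue=_MISSING):
        yield tuple(item for item in combo if item is not _MISSING)


result = list(zip_skip("abc", [1], (True, False)))
```

The tuples get shorter as inputs run out, instead of being padded with a fill value.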

>> Also, what exactly do these methods return?
> 
> An iterator. What kind of iterator is an implementation detail.
> 
> The type of the zip objects is not part of the public API, only the 
> functional behaviour.

Now go back and do what the OP actually asked for, with the zip iterator type 
having shortest(), equal(), and longest() methods in 3.9 and a skip() method 
added in 3.10. It’s no longer just “some iterator type, doesn’t matter”, it has 
specific methods on it, documented as part of the public API, and you need to 
either subclass it or emulate it. That’s exactly the problem I’m pointing out. 
None of that is true in 3.8, none of it is required by the problem, and none of 
it is true of other designs proposed in this thread, like just having more 
separate functions in itertools; it’s specifically a flaw with this design.

So the fact that you can come up with a different design without that flaw 
isn’t an argument against my point, it’s just a probably-unnecessary further 
demonstration of my point.

Your design looks like a pretty good one at least at first glance, and I think 
you should propose it seriously. You should be showing why it’s better than 
adding methods to zip objects—and also better than adding more functions to 
itertools or builtins, or flags to zip, or doing nothing—not pretending it’s 
the same as one of those other proposals and then trying to defend that other 
proposal by confusing the problems with it.

Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/WINTXNJWN7THOKAWTCFK3GZICEFDJJIC/

[Python-ideas] Re: zip(x, y, z, strict=True)

2020-04-27 Thread Ethan Furman


On 04/26/2020 08:56 PM, Christopher Barker wrote:


> that seems not very "there's only one way to do it" to me.


The quote is "one obvious way".



> It almost feels like the proponents of the new mode/function are hoping to avoid the
> processing that might need to be "rolled back" in some manner if there is a
> synchronization problem.


There is no way to "roll back" an iterator, unless you are writing custom ones 
-- in which case you'll need a custom zip to do the rolling back.
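
A small illustration of why there is nothing to roll back with plain iterators: zip() has to pull an item from the longer iterator before it can notice that the shorter one is exhausted, and that item is then gone. The variable names are only for illustration.

```python
first = iter([1, 2, 3])
second = iter(["a", "b"])

pairs = list(zip(first, second))  # stops when `second` runs out
leftover = list(first)            # whatever is still left in `first`

print(pairs)     # the two complete pairs
print(leftover)  # empty: the 3 was consumed and then discarded by zip()
```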

--
~Ethan~
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/JZRPXCGRAYIPWGK264YBN7DBIWNIHCZM/

[Python-ideas] Re: zip(x, y, z, strict=True)

2020-04-26 Thread Ram Rachum

Here's an idea for combining zip_longest with zip strict. Define zip like
so:

def zip(*iterables, strict=False, fill=object())

With no extra arguments, it's the regular old zip. With `strict=True`, it
ensures the iterables have equal lengths. If you pass in an argument for
`fill`, it becomes zip_longest.
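
A rough sketch of how that signature could behave, written as a separate function so it does not shadow the builtin. The name zip_combined, the module-level sentinels, and the choice to reject `strict` together with `fill` are all assumptions made for this example, not part of the suggestion above.

```python
from itertools import zip_longest

_NO_FILL = object()  # marks "fill was not passed"
_MISSING = object()  # internal sentinel used by the strict check


def _strict(iterables):
    # Yield tuples lazily and raise as soon as one input runs out early.
    for combo in zip_longest(*iterables, fillvalue=_MISSING):
        if any(item is _MISSING for item in combo):
            raise ValueError("zip arguments have different lengths")
        yield combo


def zip_combined(*iterables, strict=False, fill=_NO_FILL):
    """Sketch of a zip with both a `strict` flag and a `fill` value."""
    if fill is not _NO_FILL:
        if strict:
            # Assumption: the two options contradict each other.
            raise TypeError("strict and fill cannot be combined")
        return zip_longest(*iterables, fillvalue=fill)
    if strict:
        return _strict(iterables)
    return zip(*iterables)
```

A shared sentinel is used as the default because a fresh `object()` in the signature could not be compared against inside the function without keeping a reference to it.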

On Sun, Apr 26, 2020 at 7:37 PM Christopher Barker 
wrote:

> On Sat, Apr 25, 2020 at 10:50 AM Kirill Balunov 
> wrote:
>   ...the mode switching approach in the current situation looks
> reasonable, because the question is how boundary conditions should be
> treated. I still prefer three cases switch like `zip(..., mode=('equal' |
> 'shortest' | 'longest'))`
>
> I like this -- it certainly makes a lot more sense than having zip(),
> zip(...,strict=True), and zip_longest()
>
> So I'm proposing that we have three options on the table:
>
> zip(..., mode=('equal' | 'shortest' | 'longest'))
>
> or
>
> zip()
> zip_longest()
> zip_equal()
>
> or, of course, keep it as it is.
>
> ... but also ok with `strict=True` variant.
>
> Chris Angelico wrote:
>
>> Separate functions mean you can easily and simply make a per-module
>> decision:
>>
>> from itertools import zip_strict as zip
>>
>> Tada! Now this module treats zip as strict mode.
>
>
> this is a nifty feature of multiple functions in general, but I'm having a
> really hard time coming up with a use case for these particular functions:
> you're using zip() multiple times in one module, and you want them all to
> be the same "version", but you want to be able to easily change that
> version on a module-wide basis?
>
> As for the string methods examples: one big difference is that the string
> methods are all in the same namespace. This is different because zip() is a
> built in, but zip_longest() and zip_equal() would not be. I don't think
> anyone is suggesting adding both of those to builtins. So adding a mode
> switch is the only way to "unify" them -- maybe not a huge deal, but I
> think a big enough deal that zip_equal wouldn't see much use.
>
> >and changing map and friends to iterators is a big part of why you can
> write all kinds of things naturally in Python 3.9 that were clumsy,
> complicated, or even impossible.
>
> Sure, and I think we're all happy about that, but I also find that we've
> lost a bit of the nice "sequence-oriented" behavior too. Not sure that's
> relevant to this discussion though. But it is in one way: back in the 1.5 days,
> you would always use zip() on sequences, so checking their length was
> trivial, if you really wanted to do that -- but while checking that your
> iterators are in fact the same length is possible, it's pretty klunky,
> and certainly not newbie-friendly.
>
> I've lost track of who said it, but I think someone proposed that most
> people really want zip_longest most often. (sorry if I'm misinterpreting
> that). I think this is kinda true, in the sense that if you only had one,
> then zip_longest would be able to cover the most use-cases (you can build
> a zip_shortest out of zip_longest, but not the other way around) but
> particularly as zip_longest() is going to fail with infinite iterators, it
> can't be the only option after all.
>
> One other comment about the modes vs multiple functions:
>
> It makes a difference with implementation -- with multiple functions, you
> have to re-implement the core functionality three times (DRY) -- or have a
> hidden common function underneath -- that seems like code-smell to me.
>
>
> -CHB
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/BQYG3NHQXAQBREIWULJFWNQQXQR6NQZE/

[Python-ideas] Re: zip(x, y, z, strict=True)

2020-04-26 Thread David Mertz

On Sun, Apr 26, 2020 at 11:56 PM Christopher Barker 
wrote:

> > If I have two or more "sequences" there are basically two cases of that.
>
> so you need to write different code, depending on which case? that seems
> not very "there's only one way to do it" to me.
>

This difference is built into the problem itself.  There CANNOT be only one
way to do these fundamentally different things.

With iterators, there is at heart a difference between "sequences that one
can (reasonably) concretize" and "sequences that must be lazy."  And that
difference means that for some versions of a seemingly similar problem it
is possible to ask len() before looping through them while for others that
is not possible (and hence we may have done some work that we want to
"roll-back" in some sense).

Exactly what not-reasonable means might vary by context.  Infinite
sequences are always a no-go.  But slow iterators could go either way
perhaps.  I.e. I'm waiting on a slow wire for data, but when it arrives it
will be moderate sized.  Should I wait? I dunno, it depends.  Or maybe it
is fast, but there are a billion items. Do I want to use the memory? Maybe.
Whatever decision I make determines whether bounds can be checked in
advance.

However, the mismatched length feels like such a small concern in what can
go wrong.  For example, I have some time series data I was working with
yesterday.  The same timestamps are meant to match up with several
different measurements.  However, *sometimes* a measurement is missing.  I
might therefore wind up with two or more sequences of the same length, but
with "teeth" of the zipper that don't actually match up always.  Neither
checking len() nor zip_strict() nor a zip_longest() sentinel are going to
catch this problem.
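
As a sketch of the kind of check that would catch it, assume each measurement arrives as (timestamp, value) pairs; that data layout and the helper name zip_aligned are assumptions for illustration only. The point is that the test is on the data itself, not on the lengths.

```python
def zip_aligned(*series):
    """Zip (timestamp, value) streams, checking that the timestamps line up."""
    for rows in zip(*series):
        stamps = {stamp for stamp, _ in rows}
        if len(stamps) != 1:
            raise ValueError(f"misaligned timestamps: {sorted(stamps)}")
        yield stamps.pop(), tuple(value for _, value in rows)


# Same length, so no length check of any kind would complain ...
temps = [(1, 20.5), (2, 20.7), (3, 21.0)]
humid = [(1, 40), (3, 42), (4, 43)]  # ... but the reading for t=2 is missing
```

Iterating over zip_aligned(temps, humid) yields the first row and then raises on the second, where the timestamps 2 and 3 disagree.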

> Or alternately, we have a new function/mode that instead formulates this
> as:
>>
>> try:
>>     for pair in zip_strict(stuff1, stuff2):
>>         process(pair)
>> except ZipLengthError:
>>     raise UnequalLengthError("uh oh")
>>
>> The hypothetical new style is fine.  To me it looks slightly less good,
>> but the difference is minimal.
>>
>
> To me it looks better than both of the other options -- and much better
> (particularly for beginners) than the _sentinel approach.
>

Sure.  That's fine. I'm +0 or even higher on adding
itertools.zip_strict().  My taste prefers the other style I showed, but as
I say, this version is perfectly fine.  De gustibus non disputandum est.

-- 
Keeping medicines from the bloodstreams of the sick; food
from the bellies of the hungry; books from the hands of the
uneducated; technology from the underdeveloped; and putting
advocates of freedom in prisons.  Intellectual property is
to the 21st century what the slave trade was to the 16th.
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/NZUMLDMSDBFCQGHEISEK3H3MUS2LQY26/

[Python-ideas] Re: zip(x, y, z, strict=True)

2020-04-26 Thread Christopher Barker

On Sun, Apr 26, 2020 at 10:52 AM David Mertz  wrote:

> Let me try to explain why I believe that people who think they want
> zip_strict() actually want zip_longest().
>

Thanks for laying it out so clearly. However, reading your post makes it
clear to me that I DO still want zip_strict() :-)

It comes down to this:

> If I have two or more "sequences" there are basically two cases of that.

so you need to write different code, depending on which case? that seems
not very "there's only one way to do it" to me.

Or alternately, we have a new function/mode that instead formulates this as:
>
> try:
>     for pair in zip_strict(stuff1, stuff2):
>         process(pair)
> except ZipLengthError:
>     raise UnequalLengthError("uh oh")
>
> The hypothetical new style is fine.  To me it looks slightly less good,
> but the difference is minimal.
>

To me it looks better than both of the other options -- and much better
(particularly for beginners) than the _sentinel approach.

If folks think that it really won't be used often, fine -- but I'm   that
you think that writing the extra, has-to-be-thought-out checking code is
actually just as good, or better, API. In fact, if I found myself writing
either of those more than once, I'd write a utility function that did it
(probably with the second version, as it is reasonable in all cases). And
if I, or others, are writing little utility functions for common uses,
maybe it DOES make sense to put it in the std library.

> It almost feels like the proponents of the new mode/function are hoping to
> avoid the processing that might need to be "rolled back" in some manner if
> there is a synchronization problem.

Not me for one, I think it's a good idea because it would prevent all of us
from writing those little utilities, and particularly for newbies, would
provide an easy and obvious way to do it.

-CHB

-- 
Christopher Barker, PhD

Python Language Consulting
  - Teaching
  - Scientific Software Development
  - Desktop GUI and Web Development
  - wxPython, numpy, scipy, Cython
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/V4TKU4QMKQNRKUOI7ZZMNS4N6Q5QAFKA/

[Python-ideas] Re: zip(x, y, z, strict=True)

2020-04-26 Thread Chris Angelico

On Mon, Apr 27, 2020 at 9:57 AM Steven D'Aprano  wrote:
> I don't especially want zip_whatever to be slow, but the stdlib has
> no obligation to provide a super-fast highly optimized C accelerated
> version of **everything**. Especially not backports. It is perfectly
> acceptable to say:
>
> "Here's a functionally equivalent version that works in Python 3.old, if
> you want speed then provide your own C version or upgrade to 3.new"
>

True, but if taking the backport causes ALL your zip() objects to
underperform, then that's a cost. It's not just "here's a slower
version that works on 3.old", it's "here's a more functional version
but it slows down other stuff".

Still, performance of backported code is a lower consideration than
getting the API right.

(For the record, I still prefer the separate-functions option, but
what Steven's described is a very reasonable zip-gets-methods option.)

ChrisA
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/MAYEVDUCQIYWUVUQPK5EN3NEM2N36JAM/

[Python-ideas] Re: zip(x, y, z, strict=True)

2020-04-26 Thread Steven D'Aprano

On Sun, Apr 26, 2020 at 04:13:27PM -0700, Andrew Barnert via Python-ideas wrote:

> But if we add methods on zip objects, and then we add a new skip() 
> method in 3.10, how does the backport work? It can’t monkeypatch the 
> zip type (unless we both make the type public and specifically design 
> it to be monkeypatchable, which C builtins usually aren’t).

Depends on how you define monkey-patching.

I'm not saying this because I see the need for a plethora of methods on 
zip (on the contrary); but I do like the methods-on-function API, like 
itertools.chain has. Functions are namespaces, and we under-utilise 
that fact in our APIs.

Namespaces are one honking great idea -- let's do more of those!

Here is a sketch of how you might do it:

# Untested.
import builtins

class MyZipBackport():
    real_zip = builtins.zip
    def __call__(self, *args):
        return self.real_zip(*args)
    def __getattr__(self, name):
        # Delegation is another under-utilised technique.
        return getattr(self.real_zip, name)
    def skip(self, *args):
        # insert implementation here...
        raise NotImplementedError

builtins.zip = MyZipBackport()

I don't know what "zip.skip" is supposed to do, but I predict that (like 
all the other variants we have discussed) it will end up being a small 
wrapper around zip_longest.

> So 
> more-itertools or zip310 or whatever has to provide a full 
> implementation of the zip type, with all of its methods, and probably 
> twice (in Python for other implementations plus a C accelerator for 
> CPython). Sure, maybe it could delegate to a real zip object for the 
> methods that are already there, but that’s still not trivial (and adds 
> a performance cost).

I dunno, a two-line method (including the `def` signature line) seems 
pretty trivial to me.

Nobody has established that *any* use of zip_whatever is performance 
critical. In what sort of real-world code is the bottleneck going to be 
the performance of zip_whatever *itself* rather than the work done on 
the zipped up tuples?

I don't especially want zip_whatever to be slow, but the stdlib has 
no obligation to provide a super-fast highly optimized C accelerated 
version of **everything**. Especially not backports. It is perfectly 
acceptable to say:

"Here's a functionally equivalent version that works in Python 3.old, if 
you want speed then provide your own C version or upgrade to 3.new"

> Also, what exactly do these methods return?

An iterator. What kind of iterator is an implementation detail.

The type of the zip objects is not part of the public API, only the 
functional behaviour.

-- 
Steven
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/NUBVEOZ3W5Z2YV2A7ZGDQI4FM4ZQQTB2/

[Python-ideas] Re: zip(x, y, z, strict=True)

2020-04-26 Thread Steven D'Aprano

On Sun, Apr 26, 2020 at 10:34:27PM +0100, Daniel Moisset wrote:

>- we could add methods to the zip() type that provide different
>behaviours. That way you could use zip(seq, seq2).shortest(), zip(seq1,
>seq2).equal(), zip(seq1, seq2).longer(filler="foo") ; zip(...).shortest()
>would be equivalent to zip(...).  Other names might work better with
>this API, I can think of zip(...).drop_tails(), zip(...).consume_all() and
>zip(...).fill(). This also allows adding other possible behaviours (I
>wouldn't say it's common, but at least once I've wanted to zip lists of
>different length, but get shorter tuples on the tails instead of fillers).

Each of those behaviours can be handled by a simple wrapper function 
around zip_longest.
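
As one example of such a wrapper, here is a sketch of the "shortest" behaviour rebuilt on top of zip_longest. The function name and the sentinel are assumptions for illustration; the builtin zip() already does this, so the sketch only demonstrates that the wrapper approach is possible.

```python
from itertools import zip_longest

_MISSING = object()  # private sentinel marking an exhausted iterable


def zip_shortest(*iterables):
    """Stop at the first tuple that would need a filler, like plain zip()."""
    for combo in zip_longest(*iterables, fillvalue=_MISSING):
        if any(item is _MISSING for item in combo):
            return
        yield combo
```

Because zip_longest is lazy, this also terminates when one of the inputs is infinite, as long as at least one input is finite.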



-- 
Steven
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/PXCD3GP3LLAAB3HXMINP6RYTJJA2XBFR/

[Python-ideas] Re: zip(x, y, z, strict=True)

2020-04-26 Thread Andrew Barnert via Python-ideas

On Apr 26, 2020, at 14:36, Daniel Moisset  wrote:
> 
> This idea is something I could have used many times. I agree with many people 
> here that the strict=True API is at least "unusual" in Python. I was thinking 
> of 2 different API approaches that could be used for this and I think no one 
> has mentioned:
> - we could add a callable filler_factory keyword argument to zip_longest. That 
>   would allow passing a function that raises an exception if I want "strict" 
>   behaviour, and also has some other uses (for example, if I want to use [] as 
>   a filler value, but not the *same* empty list for all fillers)

This could be useful, and doesn’t seem too bad.

I still think an itertools.zip_equal would be more discoverable and more easily 
understandable than something like itertools.zip_longest(fill_factory=lambda: 
throw(ValueError)), especially since you have to write that thrower function 
yourself. But if there really are other common uses like 
zip_longest(fill_factory=list), that might make up for it.
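
A sketch of what such a factory argument might do, written as a wrapper because zip_longest itself has no such parameter. The names zip_longest_factory and refuse are hypothetical, and the keyword is spelled filler_factory as in the quoted proposal.

```python
from itertools import zip_longest

_MISSING = object()  # private sentinel marking an exhausted iterable


def zip_longest_factory(*iterables, filler_factory):
    """Like zip_longest, but call filler_factory() for every missing slot."""
    for combo in zip_longest(*iterables, fillvalue=_MISSING):
        yield tuple(
            filler_factory() if item is _MISSING else item for item in combo
        )


def refuse():
    # The "thrower" function a user would have to write to get strict behaviour.
    raise ValueError("iterables have different lengths")


# Each missing slot gets its own fresh list rather than one shared object.
rows = list(zip_longest_factory("abc", [[1]], filler_factory=list))
```

Passing filler_factory=refuse turns the same wrapper into a strict zip that raises on the first missing slot.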

> - we could add methods to the zip() type that provide different behaviours. 
> That way you could use zip(seq, seq2).shortest(), zip(seq1, seq2).equal(), 
> zip(seq1, seq2).longer(filler="foo") ; zip(...).shortest() would be 
> equivalent to zip(...).  Other names might work better with this API, I can 
> think of zip(...).drop_tails(), zip(...).consume_all() and zip(...).fill(). 
> This also allows adding other possible behaviours (I wouldn't say it's 
> common, but at least once I've wanted to zip lists of different length, but 
> get shorter tuples on the tails instead of fillers).

This second one is a cool idea—but your argument for it seems to be an argument 
against it.

If we stick with separate functions in itertools, and then we add a new one for 
your zip_skip (or whatever you’d call it) in 3.10, the backport is trivial. 
Either more-itertools adds zip_skip, or someone writes an itertools310 library 
with the new functions in 3.10, and then people just do this:

try:
    from itertools import zip_skip
except ImportError:
    from more_itertools import zip_skip

But if we add methods on zip objects, and then we add a new skip() method in 
3.10, how does the backport work? It can’t monkeypatch the zip type (unless we 
both make the type public and specifically design it to be monkeypatchable, 
which C builtins usually aren’t). So more-itertools or zip310 or whatever has 
to provide a full implementation of the zip type, with all of its methods, and 
probably twice (in Python for other implementations plus a C accelerator for 
CPython). Sure, maybe it could delegate to a real zip object for the methods 
that are already there, but that’s still not trivial (and adds a performance 
cost).

Also, what exactly do these methods return? Do they set some flag and return 
self? If so, that goes against the usual Python rule that mutator methods 
return None rather than self. Plus, it opens the question of what zip(xs, 
ys).equal().shortest() should do. I think you’d want that to be an 
AttributeError, but the only sensible way to get that is if equal() actually 
returns a new object of a new zip_equal type rather than self. So, that solves 
both problems, but it means you have to implement four different builtin types. 
(Also, while the C implementation of those types, and constructing them from 
the zip type’s methods, seems trivial, I think the pure Python version would 
have to be pretty clunky.)
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/VF7VRHZPDJXOT3DKYNK3KWUS6HBW3OLX/

[Python-ideas] Re: zip(x, y, z, strict=True)

2020-04-26 Thread Daniel Moisset

This idea is something I could have used many times. I agree with many
people here that the strict=True API is at least "unusual" in Python. I was
thinking of 2 different API approaches that could be used for this and I
think no one has mentioned:


   - we could add a callable filler_factory keyword argument to zip_longest.
   That would allow passing a function that raises an exception if I want
   "strict" behaviour, and also has some other uses (for example, if I want to
   use [] as a filler value, but not the *same* empty list for all fillers)
   - we could add methods to the zip() type that provide different
   behaviours. That way you could use zip(seq, seq2).shortest(), zip(seq1,
   seq2).equal(), zip(seq1, seq2).longer(filler="foo") ; zip(...).shortest()
   would be equivalent to zip(...).  Other names might work better with
   this API, I can think of zip(...).drop_tails(), zip(...).consume_all() and
   zip(...).fill(). This also allows adding other possible behaviours (I
   wouldn't say it's common, but at least once I've wanted to zip lists of
   different length, but get shorter tuples on the tails instead of fillers).




On Mon, 20 Apr 2020 at 18:44, Ram Rachum  wrote:

> Here's something that would have saved me some debugging yesterday:
>
> >>> zipped = zip(x, y, z, strict=True)
>
> I suggest that `strict=True` would ensure that all the iterables have been
> exhausted, raising an exception otherwise.
>
> This is useful in cases where you're assuming that the iterables all have
> the same lengths. When your assumption is wrong, you currently just get a
> shorter result, and it could take you a while to figure out why it's
> happening.
>
> What do you think?
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/G7KMUNO6QQBYFHDPII4TW3LMCLFGWZOY/

[Python-ideas] Re: zip(x, y, z, strict=True)

2020-04-26 Thread David Mertz

Let me try to explain why I believe that people who think they want
zip_strict() actually want zip_longest().  I've already mentioned that I
myself usually want what zip() does now (i.e. zip_shortest()) ... but
indeed not always.

If I have two or more "sequences" there are basically two cases of that.

(1) The sequences are something like "options" where I expect a small
number of them (say 5, or 50).  If that is the case, code like this is
perfectly fine:

stuff1, stuff2 = map(list, (stuff1, stuff2))  # concretize iterators
if len(stuff1) == len(stuff2):
    for pair in zip(stuff1, stuff2):
        process(pair)
else:
    raise UnequalLengthError("uh oh")

(2) The sequences are either infinite or very large.  I.e. they are "data",
perhaps even streaming data that only arrives over time into the iterator
from some external source.  If this is the case, obviously we cannot
concretize them.  So here we either use the current tool:

for pair in zip_longest(stuff1, stuff2, fillvalue=_sentinel):
    if _sentinel in pair:
        raise UnequalLengthError("uh oh")
    process(pair)

Or alternately, we have a new function/mode that instead formulates this as:

try:
    for pair in zip_strict(stuff1, stuff2):
        process(pair)
except ZipLengthError:
    raise UnequalLengthError("uh oh")

The hypothetical new style is fine.  To me it looks slightly less good, but
the difference is minimal.  It almost feels like the proponents of the new
mode/function are hoping to avoid the processing that might need to be
"rolled back" in some manner if there is a synchronization problem.  But
that simply is not an option.  If we have a billion events, or indefinitely
many events that arrive over time, we simply cannot know before we get to
the end that synchronization messed up.  I mean, sure, if some
characteristic of the intermediate data can indicate the mismatch, that's
great... but it's not affected by which style is used, it's a separate test.

Approach (1) is nice where available because it avoids processing
altogether.  But it is only possible for "small data" (and "ready data") no
matter what.

On Sun, Apr 26, 2020 at 12:34 PM Christopher Barker 
wrote:

> On Sat, Apr 25, 2020 at 10:50 AM Kirill Balunov 
> wrote:
>   ...the mode switching approach in the current situation looks
> reasonable, because the question is how boundary conditions should be
> treated. I still prefer three cases switch like `zip(..., mode=('equal' |
> 'shortest' | 'longest'))`
>
> I like this -- it certainly makes a lot more sense than having zip(),
> zip(...,strict=True), and zip_longest()
>
> So I'm proposing that we have three options on the table:
>
> zip(..., mode=('equal' | 'shortest' | 'longest'))
>
> or
>
> zip()
> zip_longest()
> zip_equal()
>
> or, of course, keep it as it is.
>
> ... but also ok with `strict=True` variant.
>
> Chris Angelico wrote:
>
>> Separate functions mean you can easily and simply make a per-module
>> decision:
>>
>> from itertools import zip_strict as zip
>>
>> Tada! Now this module treats zip as strict mode.
>
>
> this is a nifty feature of multiple functions in general, but I'm having a
> really hard time coming up with a use case for these particular functions:
> you're using zip() multiple times in one module, and you want them all to
> be the same "version", but you want to be able to easily change that
> version on a module-wide basis?
>
> As for the string methods examples: one big difference is that the string
> methods are all in the same namespace. This is different because zip() is a
> built in, but zip_longest() and zip_equal() would not be. I don't think
> anyone is suggesting adding both of those to builtins. So adding a mode
> switch is the only way to "unify" them -- maybe not a huge deal, but I
> think a big enough deal that zip_equal wouldn't see much use.
>
> >and changing map and friends to iterators is a big part of why you can
> write all kinds of things naturally in Python 3.9 that were clumsy,
> complicated, or even impossible.
>
> Sure, and I think we're all happy about that, but I also find that we've
> lost a bit of the nice "sequence-oriented" behavior too. Not sure that's
> relevant to this discussion though. But it is in one way: back in the 1.5 days,
> you would always use zip() on sequences, so checking their length was
> trivial, if you really wanted to do that -- but while checking that your
> iterators are in fact the same length is possible, it's pretty klunky,
> and certainly not newbie-friendly.
>
> I've lost track of who said it, but I think someone proposed that most
> people really want zip_longest most often. (sorry if I'm misinterpreting
> that). I think this is kinda true, in the sense that if you only had one,
> then zip_longest would be able to cover the most use-cases (you can build
> a zip_shortest out of zip_longest, but not the other way around) but
> particularly as zip_longest() is going to fail with infinite iterators, it
> can't be the only option
