On Sat, Apr 25, 2020 at 7:43 AM Steven D'Aprano <st...@pearwood.info> wrote:

> I think that the "correct" (simplest, easiest, most obvious, most
> flexible) way is:
>
>     with open('a.txt') as a, open('b.txt') as b, open('c.txt') as c:
>         for lineA, lineB, lineC in zip_longest(a, b, c, fillvalue=''):
>             do_something_with(lineA, lineB, lineC)
>
> ...

> Especially if the files differ in how many newlines they end with. E.g.
> file a.txt and c.txt end with a newline, but b.txt ends without one, or
> ends with an extra blank line at the end.
>
> File handling code ought to be resilient in the face of such meaningless
> differences,


Sure. But what difference is "meaningless" depends on the use case. For
instance, comments or blank lines in the middle of a file may be a
meaningless difference. And you'd want to handle that before zipping
anyway. The way I've solved these types of issues in the past is to filter
the files first, maybe something like:

    with open('a.txt') as a, open('b.txt') as b, open('c.txt') as c:
        for lineA, lineB, lineC in zip(filtered(a),
                                       filtered(b),
                                       filtered(c), strict=True):
            do_something_with(lineA, lineB, lineC)
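
Here filtered() is a hypothetical helper, not anything in the standard library. A minimal sketch, assuming that "meaningless" means blank lines and lines starting with '#', might look like this (io.StringIO stands in for the real files):

```python
import io


def filtered(lines, comment_prefix="#"):
    """Yield stripped lines, skipping blank lines and comment lines.

    Hypothetical helper: what counts as "meaningless" is an assumption
    here and depends on the use case.
    """
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith(comment_prefix):
            continue
        yield stripped


# io.StringIO objects stand in for open files in this demo.
a = io.StringIO("one\n\n# a comment\ntwo\n")
b = io.StringIO("uno\ndos")  # no trailing newline

# After filtering, both inputs yield exactly two meaningful lines.
pairs = list(zip(filtered(a), filtered(b)))
```

Once the inputs are filtered like this, any remaining length mismatch is a real difference in the data, which is exactly the case a strict zip is meant to flag.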

> So my argument is that anything you want zip_strict for is better
> handled with zip_longest -- including the case of just raising.

That is quite the leap! You make a decent case about handling empty lines
in files, but extending that to "anything" is unwarranted.
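
For reference, "just raising" via zip_longest needs a unique sentinel and a check on every tuple. The following is only a sketch of that idea; the name zip_strict and the error message are illustrative assumptions, not an existing API:

```python
from itertools import zip_longest


def zip_strict(*iterables):
    """Like zip(), but raise ValueError if the iterables differ in length.

    Hypothetical name; a sketch built on zip_longest with a private
    sentinel object as the fill value.
    """
    sentinel = object()
    for items in zip_longest(*iterables, fillvalue=sentinel):
        # A sentinel in the tuple means at least one iterable ran out early.
        if any(item is sentinel for item in items):
            raise ValueError("iterables have different lengths")
        yield items


ok = list(zip_strict("ab", [1, 2]))

try:
    list(zip_strict("abc", [1, 2]))
    mismatch_detected = False
except ValueError:
    mismatch_detected = True
```

Note that this is a generator, so the error only surfaces once iteration reaches the mismatch; any tuples before that point have already been yielded.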

I honestly do not understand the resistance here. Yes, any change to the
standard library should be carefully considered, and any change IS a
disruption, and this proposed change may not be worth it. But arguing that
it wouldn't ever be useful, I just don't get.

Entirely anecdotal evidence here, but I think this is borne out by the
comments in this thread.

* Many people are surprised when they first discover that zip() stops at
the shortest, and silently ignores the rest -- I know I was.
* Many uses (most?) do expect the iterators to be of equal length.
  - The main exception to this may be when one of them is infinite, but how
common is that, really? Remember that when zip was first created (py2) it
was a list builder, not an iterator, and Python itself was much less
iterable-focused.
* However, many uses work fine without any length-checking -- that is often
taken care of elsewhere in the code -- this is kinda-sorta analogous to a
lack of type checking: sure, you COULD get errors, but you usually don't.
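
To make the first two points concrete, here is a small illustration with made-up data: zip() silently drops the unmatched item, while the infinite-iterator case is the one where stopping at the shortest is what you want.

```python
from itertools import count

names = ["ann", "bob", "cy"]
ages = [31, 45]  # one entry is missing

# zip() stops at the shortest input; "cy" is dropped without any error.
paired = list(zip(names, ages))

# The infinite-iterator case, where stopping at the shortest is the point.
numbered = list(zip(count(1), names))
```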

We've done fine for years with zip's current behavior, but that doesn't
mean it couldn't be a little better and safer for a lot of use cases, and a
number of folks on this thread have said that they would use it.

So: if this were added, it would get some use. How much? Hard to know. Is
it critically important? Absolutely not. But it's fully backward compatible
and not a language change, so the barrier to entry is not all that high.

However, I agree with (I think) Brandt that the lack of a critical need
means that a zip_strict() in itertools would get a LOT less use than a flag
on zip itself -- so I advocate for that. If folks think extending zip() is
not worth it, then I don't think it would be worth bothering with adding a
zip_strict to itertools at all.

-CHB

-- 
Christopher Barker, PhD

Python Language Consulting
  - Teaching
  - Scientific Software Development
  - Desktop GUI and Web Development
  - wxPython, numpy, scipy, Cython
_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/2X74JUYM3OF5LGEIWRMS4HTWPTKHX53D/
Code of Conduct: http://python.org/psf/codeofconduct/