On Sat, Apr 25, 2020 at 10:41 AM Steven D'Aprano <st...@pearwood.info>
wrote:

> On Thu, Apr 23, 2020 at 09:10:16PM -0400, Nathan Schneider wrote:
>
> > How, for example, to collate lines from 3 potentially large files while
> > ensuring they match in length (without an external dependency)? The best
> > I can think of is rather ugly:
> >
> > with open('a.txt') as a, open('b.txt') as b, open('c.txt') as c:
> >     for lineA, lineB, lineC in zip(a, b, c):
> >         do_something_with(lineA, lineB, lineC)
> >     assert next(a, None) is None
> >     assert next(b, None) is None
> >     assert next(c, None) is None
> >
> > Changing the zip() call to zip(a, b, c, strict=True) would remove the
> > necessity of the asserts.
>
> I think that the "correct" (simplest, easiest, most obvious, most
> flexible) way is:
>
>     with open('a.txt') as a, open('b.txt') as b, open('c.txt') as c:
>         for lineA, lineB, lineC in zip_longest(a, b, c, fillvalue=''):
>             do_something_with(lineA, lineB, lineC)
>
> and have `do_something_with` handle the empty string case, either by
> raising, or more likely, doing something sensible like treating it as a
> blank line rather than dying with an exception.
>
>
This is the sentinel pattern with zip_longest() rather than next(). Sure,
it works, but I'm not sure it's the most obvious. Conceptually, zip_longest()
says "I want as many items as the longest of the iterables", but then the
loop has to bail out as soon as the fillvalue shows up. It is more natural
to say "I expect these iterables to have the same length" from the beginning
(if that is what the application demands).
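
To make the overhead concrete, here is a minimal sketch of the zip_longest()
version with the check spelled out. The helper name collate_checked is
hypothetical, and io.StringIO objects stand in for the open files; the sketch
relies on the fact that a line read from a text file is never '' (a blank
line is '\n').

```python
import io
from itertools import zip_longest


def collate_checked(*streams):
    """Yield tuples of lines, raising if any stream runs out early.

    Hypothetical helper for illustration only.
    """
    for lines in zip_longest(*streams, fillvalue=''):
        # A real line is never '' (a blank line is '\n'), so '' can only
        # be the fill value supplied by zip_longest().
        if '' in lines:
            raise ValueError('inputs have different numbers of lines')
        yield lines


a = io.StringIO('a1\na2\n')
b = io.StringIO('b1\nb2\n')
for line_a, line_b in collate_checked(a, b):
    print(line_a.rstrip('\n'), line_b.rstrip('\n'))
```

The length check lives inside the loop, which is the extra step a strict
zip() would make unnecessary.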


> Especially if the files differ in how many newlines they end with. E.g.
> file a.txt and c.txt end with a newline, but b.txt ends without one, or
> ends with an extra blank line at the end.
>
>
Well, this depends on the application and the assumptions about where the
files come from.
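
For reference, a small sketch of how those endings show up when iterating
line by line. io.StringIO stands in for the files here, and the contents are
made up for illustration.

```python
import io

# Ends with a single newline: two lines.
with_newline = list(io.StringIO('x\ny\n'))

# No trailing newline: still two lines, but the last one lacks '\n'.
without_newline = list(io.StringIO('x\ny'))

# Extra blank line at the end: a third line consisting of just '\n'.
extra_blank = list(io.StringIO('x\ny\n\n'))

print(len(with_newline), len(without_newline), len(extra_blank))
```

So a missing final newline does not change the line count, while an extra
blank line does. Whether that extra line is an error is the application's
call.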

I can see that zip_longest() will technically work with the sentinel
pattern. If there is consensus that it should be a builtin, I might start
using this instead of zip() with separate checks. But to enforce
length-matching, it still requires an extra check, plus a decision about
what the sentinel value should be (for direct file reading '' is fine, but
not necessarily for other iterables like collections or file-loading
wrappers). IOW, the pattern has some conceptual and code overhead as a
solution to "make sure the number of items matches".
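
As a sketch of the sentinel decision for arbitrary iterables: a private
object() compared by identity avoids colliding with legitimate values such as
'' or None. The names _MISSING and zip_equal are hypothetical, not an
existing API.

```python
from itertools import zip_longest

# Hypothetical private sentinel: a fresh object() cannot be equal to, or
# identical with, anything the caller's iterables produce.
_MISSING = object()


def zip_equal(*iterables):
    """Like zip(), but raise ValueError if the iterables differ in length."""
    for items in zip_longest(*iterables, fillvalue=_MISSING):
        # Identity check, so values like '' or None pass through untouched.
        if any(item is _MISSING for item in items):
            raise ValueError('iterables have different lengths')
        yield items


print(list(zip_equal(['', None], [0, False])))
```

This works, but it is a helper every project has to write (or import) for
itself.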

Given that length-matching is a need that many of us frequently encounter,
adding strict=True to zip() seems like a very useful and intuitive option
to have, without breaking any existing code.

Nathan