On Sat, Apr 25, 2020 at 10:41 AM Steven D'Aprano <st...@pearwood.info> wrote:

> On Thu, Apr 23, 2020 at 09:10:16PM -0400, Nathan Schneider wrote:
>
> > How, for example, to collate lines from 3 potentially large files while
> > ensuring they match in length (without an external dependency)? The best I
> > can think of is rather ugly:
> >
> >     with open('a.txt') as a, open('b.txt') as b, open('c.txt') as c:
> >         for lineA, lineB, lineC in zip(a, b, c):
> >             do_something_with(lineA, lineB, lineC)
> >         assert next(a, None) is None
> >         assert next(b, None) is None
> >         assert next(c, None) is None
> >
> > Changing the zip() call to zip(aF, bF, cF, strict=True) would remove the
> > necessity of the asserts.
>
> I think that the "correct" (simplest, easiest, most obvious, most
> flexible) way is:
>
>     with open('a.txt') as a, open('b.txt') as b, open('c.txt') as c:
>         for lineA, lineB, lineC in zip_longest(a, b, c, fillvalue=''):
>             do_something_with(lineA, lineB, lineC)
>
> and have `do_something_with` handle the empty string case, either by
> raising, or more likely, doing something sensible like treating it as a
> blank line rather than dying with an exception.

This is the sentinel pattern with zip_longest() rather than next(). Sure, it
works, but I'm not sure it's the most obvious: conceptually zip_longest() is
saying "I want to have as many items as the max of the iterables", but then
the loop short-circuits if the fillvalue is used. More natural to say "I
expect these iterables to have the same length from the beginning" (if that
is what the application demands).

> Especially if the files differ in how many newlines they end with. E.g.
> file a.txt and c.txt end with a newline, but b.txt ends without one, or
> ends with an extra blank line at the end.

Well, this depends on the application and the assumptions about where the
files come from.

I can see that zip_longest() will technically work with the sentinel pattern.
If there is consensus that it should be a builtin, I might start using this
instead of zip() with separate checks.
But to enforce length-matching, it still requires an extra check, plus a
decision about what the sentinel value should be (for direct file reading ''
is fine, but not necessarily for other iterables like collections or
file-loading wrappers). In other words, the pattern has some conceptual and
code overhead as a solution to "make sure the number of items matches".

Given that length-matching is a need that many of us frequently encounter,
adding strict=True to zip() seems like a very useful and intuitive option to
have, without breaking any existing code.

Nathan
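
P.S. To make that overhead concrete, here is a minimal sketch of the sentinel
pattern wrapped in a helper. The name zip_strict, the module-level _MISSING
object, and the choice of ValueError are assumptions of mine for illustration
only, not an existing or proposed API. A fresh object() is used as the
fillvalue so that it cannot collide with a real item, which '' could do for
iterables other than files.

```python
from itertools import zip_longest

# Unique sentinel: no real item can be this exact object.
_MISSING = object()


def zip_strict(*iterables):
    """Hypothetical helper: like zip(), but raise on a length mismatch."""
    for items in zip_longest(*iterables, fillvalue=_MISSING):
        # Identity check, so items with unusual __eq__ cannot fool it.
        if any(item is _MISSING for item in items):
            raise ValueError("iterables have different lengths")
        yield items


# In-memory stand-ins for the lines of a.txt, b.txt and c.txt.
a = ["1\n", "2\n"]
b = ["x\n", "y\n"]
c = ["p\n", "q\n", "r\n"]  # one line too many

pairs = list(zip_strict(a, b))  # equal lengths: behaves like zip()

try:
    list(zip_strict(a, b, c))  # mismatch: raises once a shorter input runs out
except ValueError as exc:
    error = str(exc)
```

Note that the mismatch is only detected after the matching prefix has already
been yielded, so a caller that processes items as they arrive will have done
partial work by the time the exception is raised.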
Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/JFEHLFMZ4ENT4DOCELORSA5QRYQ3SSM5/