On Mon, Jun 17, 2013 at 5:02 PM, Benjamin Peterson <benja...@python.org> wrote:
> 2013/6/17 Guido van Rossum <gu...@python.org>:
>> On Mon, Jun 17, 2013 at 4:40 PM, Benjamin Peterson <benja...@python.org>
>> wrote:
>>> 2013/6/17 Greg Ewing <greg.ew...@canterbury.ac.nz>:
>>>> Guido van Rossum wrote:
>>>>>
>>>>> No. Executing a file containing those exact characters produces a
>>>>> string containing only '\n' and exec/eval is meant to behave the same
>>>>> way. The string may not have originated from a file, so the universal
>>>>> newlines behavior of the io module is irrelevant here -- the parser
>>>>> must implement its own equivalent processing, and it does.
>>>>
>>>>
>>>> I'm still not convinced that this is necessary or desirable
>>>> behaviour. I can understand the parser doing this as a
>>>> workaround before we had universal newlines, but now that
>>>> we do, I'd expect any Python string to already have newlines
>>>> converted to their canonical representation, and that any CRs
>>>> it contains are meant to be there. The parser shouldn't need
>>>> to do newline translation a second time.
>>>
>>> It used to be that way until 2.7. People like to do things like
>>>
>>> with open("myfile.py", "rb") as fp:
>>>     exec fp.read() in ns
>>>
>>> which used to fail with CRLF newlines because binary mode doesn't
>>> translate them. I think this is actually the correct way to execute
>>> Python sources because the parser then handles the somewhat
>>> complicated process of decoding Python source for you.
>>
>> What exactly does the parser handle better than the io module? Is it
>> just the coding cookies? I suppose that works as long as the file is
>> encoded using an ASCII superset like the Latin-N variants or UTF-8. It
>> would fail pretty badly if it was UTF-16 (and yes, that's an
>> abominable encoding for other reasons :-).
>
> The coding cookie is the main one. In fact, if you can't parse that,
> you don't really know what encoding to open the file with at all.
> There are also small things like BOM handling (you have to use the
> utf-8-sig encoding with TextIO to get it removed) and defaulting to
> UTF-8 (which the io module doesn't do), which are better left to the
> parser.
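
For reference, here is a minimal sketch of the parser-side processing
described above. It is written for Python 3 (the quoted snippet uses the
Python 2 exec statement), and the sample source bytes and variable names
are made up for illustration. It hands raw bytes with CRLF newlines and a
coding cookie to the compiler, then uses tokenize.detect_encoding() to
show the cookie, BOM, and default-to-UTF-8 rules.

```python
import io
import tokenize

# Hypothetical source bytes as they might sit on disk: CRLF line endings,
# a Latin-1 coding cookie, and a non-ASCII character encoded as Latin-1
# (0xE9 is "e with acute accent").
source = (
    b"# -*- coding: latin-1 -*-\r\n"
    b"name = 'caf\xe9'\r\n"
    b"text = '''a\r\nb'''\r\n"
)

# Hand the raw bytes to the compiler, as with a file opened in "rb" mode.
# The parser decodes them using the cookie and translates CRLF to '\n',
# including inside the triple-quoted string.
ns = {}
exec(compile(source, "<example>", "exec"), ns)

# tokenize.detect_encoding() exposes the same encoding rules: it takes a
# readline callable over bytes and returns (encoding, lines_consumed).
cookie_encoding, _ = tokenize.detect_encoding(io.BytesIO(source).readline)

# A UTF-8 BOM with no cookie is reported as 'utf-8-sig'.
bom_source = b"\xef\xbb\xbfx = 1\n"
bom_encoding, _ = tokenize.detect_encoding(io.BytesIO(bom_source).readline)

# With neither a BOM nor a cookie, the default is UTF-8.
plain_source = b"x = 1\n"
default_encoding, _ = tokenize.detect_encoding(io.BytesIO(plain_source).readline)

print(repr(ns["name"]), repr(ns["text"]))
print(cookie_encoding, bom_encoding, default_encoding)
```

When a text-mode file object is wanted instead of exec, tokenize.open()
applies the same detection and returns the file opened with the detected
encoding.
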

Maybe there are some lessons here that the TextIO module could learn?

--
--Guido van Rossum (python.org/~guido)
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev