On Wed, 6 Dec 2017 at 06:10 INADA Naoki <songofaca...@gmail.com> wrote:
> >> And I have one worrying point.
> >> With UTF-8 mode, open()'s default encoding/error handler is
> >> UTF-8/surrogateescape.
> >
> > The Strict UTF-8 Mode is for you if you prioritize correctness over
> > usability.
>
> Yes, but as I said, I care about inexperienced developers
> who don't know what UTF-8 mode is.
>
> > In the very first version of my PEP/idea, I wanted to use
> > UTF-8/strict. But then I started to play with the implementation and I
> > got many "practical" issues. Using UTF-8/strict, you quickly get
> > encoding errors. For example, you become unable to read undecodable
> > bytes from stdin. stdin.read() only gives you an error, without
> > letting you decide how to handle these "invalid" data. Same issue with
> > stdout.
>
> I don't care about stdio, because PEP 538 uses surrogateescape as the
> error handler for stdio:
>
> https://www.python.org/dev/peps/pep-0538/#changes-to-the-default-error-handling-on-the-standard-streams
>
> I care only about the builtin open()'s behavior.
> PEP 538 doesn't change the default error handler of open().
>
> I think PEP 538 and PEP 540 should behave almost identically, except for
> whether they change the locale. So I need a very strong reason if PEP 540
> changes the default error handler of open().

I don't have enough locale experience to weigh in as an expert, but I was
already leaning towards INADA-san's logic of not wanting to change open(),
and this makes me really not want to change it.

-Brett

> > In the old long version of the PEP, I tried to explain UTF-8/strict
> > issues with very concrete examples, the removed "Use Cases" section:
> > https://github.com/python/peps/blob/f92b5fbdc2bcd9b182c1541da5a0f4ce32195fb6/pep-0540.txt#L490
> >
> > Tell me if I should rephrase the rationale of the PEP 540 to better
> > justify the usage of surrogateescape.
>
> OK, the "List a directory into a text file" example demonstrates why
> surrogateescape is used for open().
> If os.listdir() returns surrogateescaped data, file.write() will fail.
> All other examples are about stdio.
>
> But we should achieve a good balance between correctness and usability
> in the default behavior.
>
> > Maybe the "UTF-8 Mode" should be renamed to "UTF-8 with
> > surrogateescape, or backslashreplace for stderr, or surrogatepass for
> > fsencode/fsdecode on Windows, or strict for Strict UTF-8 Mode"... But
> > the PEP title would be too long, no? :-)
>
> I feel the short name is enough.
>
> >> And opening a binary file without the "b" option is a very common
> >> mistake of new developers. If the default error handler is
> >> surrogateescape, they lose a chance to notice their bug.
> >
> > When open() is used in text mode to read "binary data", usually the
> > developer would only notice when getting the POSIX locale (ASCII
> > encoding). But the PEP 538 already changed that by using the C.UTF-8
> > locale (and so the UTF-8 encoding, instead of the ASCII encoding).
>
> With PEP 538 (C.UTF-8 locale), open() uses UTF-8/strict, not
> UTF-8/surrogateescape.
>
> For example, this code raises UnicodeDecodeError with PEP 538 if the
> file is a JPEG file.
>
>     with open(fn) as f:
>         f.read()
>
> > I'm not sure that locales are the best way to detect such class of
> > bugs. I suggest to use the -b or -bb option to detect such bugs without
> > having to care about the locale.
>
> But many new developers don't use or know the -b or -bb option.
>
> >> On the other hand, it helps some use cases where the user wants
> >> byte-transparent behavior, without modifying code to use
> >> "surrogateescape" explicitly.
> >>
> >> Which is the more important scenario? Does anyone have an opinion
> >> about it? Are there any rationales and use cases I am missing?
> >
> > Usually users expect that Python 3 "just works" and doesn't bother them
> > with the locale (that nobody understands).
> >
> > The old version of the PEP contains a long list of issues:
> > https://github.com/python/peps/blob/f92b5fbdc2bcd9b182c1541da5a0f4ce32195fb6/pep-0540.txt#L924-L986
> >
> > I already replaced the strict error handler with surrogateescape for
> > sys.stdin and sys.stdout on the POSIX locale in Python 3.5:
> > https://bugs.python.org/issue19977
> >
> > For the rationale, read for example these comments:
> >
> [snip]
>
> OK, I'll read them and think again about open()'s default behavior.
> But I still hope open()'s behavior is consistent with PEP 538 and PEP 540.
>
> Regards,
> _______________________________________________
> Python-Dev mailing list
> Python-Dev@python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe:
> https://mail.python.org/mailman/options/python-dev/brett%40python.org
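
For readers following the thread, the trade-off being debated can be
sketched in a few lines. This is only an illustration, not code from the
thread or from either PEP: the sample bytes, the file names, and the
explicit errors= arguments are assumptions chosen to imitate the two
candidate defaults for open() without enabling any special mode.

```python
import os
import tempfile

# Bytes that are not valid UTF-8 (a JPEG file starts with FF D8 FF).
# The payload is made up for this illustration.
data = b"\xff\xd8\xff\xe0 not text"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "image.jpg")  # hypothetical file name
    with open(path, "wb") as f:
        f.write(data)

    # UTF-8/strict: forgetting the "b" flag is reported immediately.
    try:
        with open(path, encoding="utf-8", errors="strict") as f:
            f.read()
        strict_failed = False
    except UnicodeDecodeError:
        strict_failed = True

    # UTF-8/surrogateescape: the read succeeds, and each undecodable
    # byte becomes a lone surrogate in the range U+DC80..U+DCFF.
    with open(path, encoding="utf-8", errors="surrogateescape") as f:
        text = f.read()

    # The escaped text round-trips back to the original bytes...
    round_trip = text.encode("utf-8", "surrogateescape")

    # ...but writing it to a UTF-8/strict text file fails, which is the
    # situation in the "List a directory into a text file" example.
    out = os.path.join(tmp, "out.txt")  # hypothetical file name
    try:
        with open(out, "w", encoding="utf-8", errors="strict") as f:
            f.write(text)
        write_failed = False
    except UnicodeEncodeError:
        write_failed = True

print(strict_failed, round_trip == data, write_failed)
```

With strict, the mistaken text-mode read is caught at once; with
surrogateescape, it passes silently and the data stays byte-transparent,
which is exactly the correctness versus usability tension discussed above.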