Hi Nick, and all core devs who are interested in this PEP.

I'm reviewing PEP 538 and I would like to accept it this month. It will relieve much of the UnicodeError pain that server-side operators face. Thank you, Nick, for working on this PEP.

If anything about this PEP worries you, please post a comment soon. If you don't have enough time to read the entire PEP, feel free to ask about whatever concerns you.

Here are my comments:

>
> Relationship with other PEPs
> ============================
>
> This PEP shares a common problem statement with PEP 540 (improving Python 3's
> behaviour in the default C locale), but diverges markedly in the proposed
> solution:
>
> * PEP 540 proposes to entirely decouple CPython's default text encoding from
>   the C locale system in that case, allowing text handling inconsistencies to
>   arise between CPython and other locale-aware components running in the same
>   process and in subprocesses. This approach aims to make CPython behave less
>   like a locale-aware application, and more like locale-independent language
>   runtimes like the JVM, .NET CLR, Go, Node.js, and Rust

https://docs.oracle.com/javase/7/docs/api/java/nio/charset/Charset.html says:

> Every instance of the Java virtual machine has a default charset, which may
> or may not be one of the standard charsets. The default charset is determined
> during virtual-machine startup and typically depends upon the locale and
> charset being used by the underlying operating system.

I don't know much about the .NET runtimes on Unix (Mono and .NET Core). "Go, Node.js, and Rust" seem to be enough examples.

> New build-time configuration options
> ------------------------------------
>
> While both of the above behaviours would be enabled by default, they would
> also have new associated configuration options and preprocessor definitions
> for the benefit of redistributors that want to override those default
> settings.
>
> The locale coercion behaviour would be controlled by the flag
> ``--with[out]-c-locale-coercion``, which would set the ``PY_COERCE_C_LOCALE``
> preprocessor definition.
>
> The locale warning behaviour would be controlled by the flag
> ``--with[out]-c-locale-warning``, which would set the ``PY_WARN_ON_C_LOCALE``
> preprocessor definition.

"Locale warning" means the warning printed when the C locale is used, am I right?

As I understand it, the "locale warning" is shown in these cases (all of them assume that the C locale is in use and PYTHONUTF8 is not enabled):

a. The C locale is used and locale coercion is disabled by the ``--without-c-locale-coercion`` configure option.
b. Locale coercion fails because none of the C.UTF-8, C.utf8, or UTF-8 locales is available.
c. Python is embedded. Locale coercion can't be used in this case.

In case (b), even though the warning about the C locale is not shown, the warning about coercion is still shown. So when people don't want to see a warning under the C locale and none of the (C.UTF-8, C.utf8, UTF-8) locales is available, there are three ways:

* Set PYTHONUTF8=1 (if PEP 540 is accepted).
* Set PYTHONCOERCECLOCALE=0.
* Use both the ``--without-c-locale-coercion`` and ``--without-c-locale-warning`` configure options.

Is my understanding right?

BTW, I would prefer that PEP 540 provide a ``--with-utf8mode`` option which enables UTF-8 mode by default. If it is added, there will be too few use cases left for ``--without-c-locale-warning``.

There are some use cases where people want to use UTF-8 by default system-wide (e.g. containers, web servers on CentOS, etc.). On the other hand, most C locale usage is on a "per application" basis rather than "system-wide." A configure option is not suitable for such per-application settings, of course.

But I don't propose removing the option from PEP 538. We can discuss reducing the configure options later.

>
> On platforms where they would have no effect (e.g. Mac OS X, iOS, Android,
> Windows) these preprocessor variables would always be undefined.
>

Why does ``--with[out]-c-locale-coercion`` have no effect on macOS, iOS, and Android? On Android, locale coercion fixes readline.
Do you mean that locale coercion always happens, regardless of this configuration option? On macOS, does ``LC_ALL=C python`` not make Python's stdio ``ascii:surrogateescape``? Even so, locale coercion may fix libraries like readline and curses. While the C locale is less common on macOS, I don't see any reason to disable locale coercion there.

I know almost nothing about iOS, but I expect it to be similar to Android or macOS.

> Improving the handling of the C locale
> --------------------------------------
> ...
> locale settings for locale-aware operations. Both the JVM and the .NET CLR
> use UTF-16-LE as their primary encoding for passing text between applications
> and the underlying platform.

The JVM and .NET examples are misleading again. They just use UTF-16-LE for system calls on Windows, as Python does. I don't know much about them, but I believe they don't use UTF-16 as the system encoding on Linux.

> Defaulting to "surrogateescape" error handling on the standard IO streams
> -------------------------------------------------------------------------
> By coercing the locale away from the legacy C default and its assumption of
> ASCII as the preferred text encoding, this PEP also disables the implicit use
> of the "surrogateescape" error handler on the standard IO streams that was
> introduced in Python 3.5 ([15_]), as well as the automatic use of
> ``surrogateescape`` when operating in PEP 540's UTF-8 mode.
>

I agree that this PEP shouldn't break the byte-transparent behavior of the C locale by coercing it. But I feel that the behavior difference between a coerced C.UTF-8 locale and a regular C.UTF-8 locale can be a pitfall.

I read the following part of the section, and I agree that there is no way to solve every issue. But how about using the surrogateescape handler in C.* locales, as in the C locale? It would at least solve the case of a Python 3.7 subprocess running under Python 3.7 with a coerced C.UTF-8 locale.

Anyway, I think https://bugs.python.org/issue15216 should be fixed in Python 3.7 too.
Python applications that require byte-transparent stdio can use `set_encoding(errors="surrogateescape")` explicitly.

Regards,
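
P.S. To make the last point concrete, here is a minimal sketch of what I mean by byte-transparent stdio. It does not use the `set_encoding()` API discussed in issue15216. Instead it assumes that wrapping a binary stream in `io.TextIOWrapper` with `errors="surrogateescape"` is an acceptable stand-in, and it uses `io.BytesIO` objects in place of the real stdin and stdout so that it runs anywhere. The `roundtrip` helper is a name I made up for illustration.

```python
import io


def roundtrip(data: bytes, errors: str) -> bytes:
    """Decode `data` as UTF-8 text and encode it back with the same error handler.

    Hypothetical helper: the BytesIO objects stand in for stdin/stdout.
    """
    # newline="" disables newline translation so bytes are not altered.
    reader = io.TextIOWrapper(
        io.BytesIO(data), encoding="utf-8", errors=errors, newline=""
    )
    text = reader.read()

    sink = io.BytesIO()
    writer = io.TextIOWrapper(sink, encoding="utf-8", errors=errors, newline="")
    writer.write(text)
    writer.flush()
    return sink.getvalue()


# 0xE9 is "é" in Latin-1, but it is not valid UTF-8 here.
raw = b"caf\xe9 au lait"

# With surrogateescape the undecodable byte survives the round trip.
assert roundtrip(raw, "surrogateescape") == raw

# With the strict handler the same input is rejected while decoding.
try:
    roundtrip(raw, "strict")
except UnicodeDecodeError:
    strict_failed = True
else:
    strict_failed = False
```

In a real application the same wrapper could presumably be applied to `sys.stdin.buffer` and `sys.stdout.buffer` instead of the in-memory streams.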