Re: [issue43552] Add locale.get_locale_encoding() and locale.get_current_locale_encoding()
On 19.03.2021 14:57, Inada Naoki wrote:
>
> Background: PEP 597 adds new `encoding="locale"` option to open() and
> TextIOWrapper(). It is same to `encoding=None` for now, but it means using
> "locale encoding" explicitly.
>
> But this is wrong in UTF-8 mode.

Please address UTF-8 mode explicitly in open() or elsewhere.

The locale module is about the state of the lib C, not what Python enforces via options in its own I/O layers.

As mentioned, both should ideally be synchronized, though, so UTF-8 mode in Python should trigger setting a UTF-8 encoding via setlocale().

--
Marc-Andre Lemburg
eGenix.com

___
Python-bugs-list mailing list
Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
Re: [issue43552] Add locale.get_locale_encoding() and locale.get_current_locale_encoding()
On 19.03.2021 14:47, STINNER Victor wrote:
>
> STINNER Victor added the comment:
>
>> - If you add "current", people will rightly ask: then what do all the
>> other APIs in the locale module return ? Of course, they all return
>> the current state of settings :-) So this is unnecessary as well.
>
> The problem is that there are two different "locale encodings", what I call:
>
> * "current locale encoding": nl_langinfo(CODESET) in short
> * "Python locale encoding": "UTF-8" in some cases, nl_langinfo(CODESET)
>   otherwise

The UTF-8 mode is a Python invention. It doesn't have anything to do with the lib C locale functions, which this module addresses and interfaces to. Please don't mix the two.

In fact, in order to avoid issues, Python should probably set the locale encoding to UTF-8 as well, when run in UTF-8 mode. It's dangerous to have Python and the lib C use different assumptions about the encoding, esp. in embedded applications.

> It is unfortunate that the Python UTF-8 Mode which "ignores the locale"
> changes the behavior of the locale module, of the
> locale.getpreferredencoding() function. But the ship has sailed.
>
> People are used to look into the "locale" module to get the "locale"
> encoding. So I prefer to put the function to get the "Python locale
> encoding" in the locale module.
>
> I propose to add "current" in the name since this encoding is not the one you
> are looking for usually.
>
> An alternative is to have a single function with an optional parameter.
> Example:
>
> * get_locale_encoding() or get_locale_encoding(True) returns the locale
>   encoding
> * get_locale_encoding(False) returns the current locale encoding

-1, both on the names and on the idea to again add parameters which change their meaning. We should have one function per meaning and really only need the interface getencoding(), since the UTF-8 mode doesn't fit into the locale module scope.
--
Marc-Andre Lemburg
eGenix.com
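
To make the two "locale encodings" discussed in this message easier to compare, here is a minimal sketch that reports both side by side. The helper name `describe_encodings()` and its dict keys are hypothetical and used only for illustration; the calls it wraps (`sys.flags.utf8_mode`, `locale.getpreferredencoding(False)` and `locale.nl_langinfo(locale.CODESET)`) are existing APIs.

```python
import locale
import sys


def describe_encodings():
    """Return Python's view and the lib C view of the locale encoding.

    Hypothetical helper for illustration only; it is not part of the
    locale module.
    """
    info = {
        # 1 when Python runs in UTF-8 mode (-X utf8 or PYTHONUTF8=1), else 0.
        "utf8_mode": sys.flags.utf8_mode,
        # What Python reports as the locale encoding; UTF-8 mode overrides
        # the lib C value here.
        "python_locale_encoding": locale.getpreferredencoding(False),
    }
    # nl_langinfo() is not available on every platform (for example Windows),
    # so report None there instead of guessing.
    if hasattr(locale, "nl_langinfo"):
        info["libc_locale_encoding"] = locale.nl_langinfo(locale.CODESET)
    else:
        info["libc_locale_encoding"] = None
    return info


if __name__ == "__main__":
    for key, value in describe_encodings().items():
        print(f"{key}: {value}")
```

Running the script once normally and once with `-X utf8` should show whether the two values diverge on a given system.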
Re: [issue43552] Add locale.get_locale_encoding() and locale.get_current_locale_encoding()
On 19.03.2021 12:35, Eryk Sun wrote:
>
> Eryk Sun added the comment:
>
>> Read the ANSI code page on Windows,
>
> I don't see why the Windows implementation is inconsistent with POSIX here.
> If it were changed to be consistent, the default encoding at startup would
> remain the same, since setlocale(LC_CTYPE, "") uses the process code page
> from GetACP().

I'm not sure I understand what you're saying (but then, I have little experience with locales on Windows).

My assumption is that nl_langinfo(CODESET) does not work on Windows or gives wrong results. Is that incorrect?

If it does work, getencoding() could just be a shim over nl_langinfo(CODESET) on all platforms.

--
Marc-Andre Lemburg
eGenix.com
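
A rough sketch of what such a shim might look like follows. The function name `getencoding_sketch()` is hypothetical, and the Windows branch (calling `GetACP()` through `ctypes` and formatting the result as `cpNNN`) is an assumption about one possible fallback, not the implementation discussed in the issue. The sketch deliberately never calls setlocale().

```python
import locale
import sys


def getencoding_sketch():
    """Hypothetical stand-in for the proposed locale.getencoding().

    Returns what the platform currently reports as the encoding,
    without calling setlocale().
    """
    if hasattr(locale, "nl_langinfo"):
        # POSIX: ask the lib C for the codeset of the current LC_CTYPE locale.
        return locale.nl_langinfo(locale.CODESET)
    if sys.platform == "win32":
        import ctypes

        # Windows (assumed fallback): GetACP() returns the number of the
        # process ANSI code page, e.g. 1252.
        return "cp%d" % ctypes.windll.kernel32.GetACP()
    raise NotImplementedError("no known way to query the locale encoding")


if __name__ == "__main__":
    print(getencoding_sketch())
```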
Re: [issue43552] Add locale.get_locale_encoding() and locale.get_current_locale_encoding()
On 19.03.2021 12:26, STINNER Victor wrote:
>
> STINNER Victor added the comment:
>
> Recently, I spent some days to document properly encodings used by Python.

Thanks for documenting this.

I would prefer to keep the locale module as really just an interface to the lib C locale logic and not add encoding details which are specific to Python's view on I/O (sys or io) or the file system (os).

Hopefully, in a few years, we can get rid of all this and standardize on UTF-8 everywhere.

--
Marc-Andre Lemburg
eGenix.com
Re: [issue43552] Add locale.get_locale_encoding() and locale.get_current_locale_encoding()
On 19.03.2021 12:05, STINNER Victor wrote:
> I'm not sure what to do with locale.getdefaultlocale(). Should we deprecate
> it? I never used this function. How is it used? For which purpose?
>
> I understand that in 2000, locale.getdefaultlocale() was interesting to avoid
> calling setlocale(LC_CTYPE, ""). But Python 3 calls setlocale(LC_CTYPE, "")
> by default at startup since the early versions, and it's now called on all
> platforms since Python 3.8. Moreover, its internal database seems to be
> outdated and is painful to maintain (especially if we consider all platforms
> supported by Python, not only Linux, there are many issues on macOS).

Yes, deprecate it as well. If Python calls setlocale() by default now, it has served its purpose.

The alias database is needed by the normalization engine. We may be able to drop the encoding part, but this would have to be checked.

--
Marc-Andre Lemburg
eGenix.com
Re: [issue43552] Add locale.get_locale_encoding() and locale.get_current_locale_encoding()
On 19.03.2021 11:36, STINNER Victor wrote:
>
> STINNER Victor added the comment:
>
>> locale.getencoding()
>>
>> which interfaces to nl_langinfo(CODESET) or the Windows code
>> page and does not try to do any magic, ie. does *not* call
>> setlocale(). It needs to return what the lib C currently
>> knows and uses as encoding.
>
> This is locale.get_current_locale_encoding(). I would like to put "current"
> in the name, because there is a lot of confusion between
> get_current_locale_encoding() encoding and locale.getpreferredencoding(False)
> encoding. In locale.getpreferredencoding(False), Python ignores the locale in
> some cases which is counter intuitive.

These attempts have resulted in much of the confusion around the locale module. It's better not to create more of it.

- "locale" in the name is unnecessary, since this is the locale module.

- If you add "current", people will rightly ask: then what do all the other APIs in the locale module return? Of course, they all return the current state of settings :-) So this is unnecessary as well.

locale.getencoding() works in the same way as locale.getlocale(). It interfaces to the lib C and returns the current encoding setting as known by the lib C. It's just a more intuitive name than locale.nl_langinfo(CODESET) and works on Windows as well.

And, again, locale.getpreferredencoding() should be deprecated. The API has been misused in too many ways and is completely broken by now. It was a good idea at the time, when Martin added it, even though I never liked the name.

--
Marc-Andre Lemburg
eGenix.com
Re: [issue43552] Add locale.get_locale_encoding() and locale.get_current_locale_encoding()
On 19.03.2021 10:17, STINNER Victor wrote:
>
> New submission from STINNER Victor:
>
> I propose to add two new functions:
>
> * locale.get_locale_encoding(): it's exactly the same than
>   locale.getpreferredencoding(False).
>
> * locale.get_current_locale_encoding(): always get the current locale
>   encoding. Read the ANSI code page on Windows, or nl_langinfo(CODESET) on
>   other platforms. Ignore the UTF-8 Mode. Don't always return "UTF-8" on macOS,
>   Android, VxWorks.

I'm not sure whether this would improve the situation much.

The problem is that the locale module is meant to expose the lib C locale settings, but many of the recent additions actually do something completely different: they look into the process and user environment and try to determine external settings, which are not reflected in the lib C locale settings.

I had added locale.getdefaultlocale() to give applications a chance to determine the locale setting defined by the process environment *without* calling setlocale(LC_ALL, '') and causing problems in other threads. I used the X11 database for locale encodings, which was the closest you could get to a standard for encodings at the time (around 2000). Part of the return value is the encoding, which would be set.

Martin later added locale.getpreferredencoding(), which tries to determine the encoding in a different way, based on nl_langinfo(CODESET). As you mentioned, this intention was broken on several platforms by forcing UTF-8 as output. And in many cases, the API had to call setlocale() as well, causing the thread problems.

However, the problem with nl_langinfo(CODESET) is the same as with setlocale(): it returns the current state of the lib C settings, which may well point to the 'C' locale, not the one the user has configured in the OS environment.

So while you get an encoding defined by lib C for the current locale settings (without guessing it as with locale.getdefaultlocale()), you still don't get what the user really wants to use.

Unfortunately, lib C does not provide a way to query the locale database without changing the locale settings at the same time. This is the main issue we're facing.

Now, the correct way in all this would be to just call setlocale(LC_ALL, '') at the start of the application and not try to apply all the magic to get around this. But this has to be done by the application and not Python (which may well be embedded into some other application).

I'd suggest adding a single new API:

locale.getencoding()

which interfaces to nl_langinfo(CODESET) or the Windows code page and does not try to do any magic, ie. does *not* call setlocale(). It needs to return what the lib C currently knows and uses as encoding.

locale.getpreferredencoding() should then be deprecated. It does not make sense to pretend to query information which is not really directly available from the lib C locale system.

And the documentation should point out that applications should call setlocale(LC_ALL, '') when they start up, if they want to get the lib C locale, and thus the Python locale module, set up to work based on what the user really wants -- instead of just guessing at this.

PS: The locale module normally does not use underscores in function names, so it's not a good idea to add more.

--
Marc-Andre Lemburg
eGenix.com
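
The point that nl_langinfo(CODESET) only reflects the current lib C state can be illustrated with a small sketch. It queries the codeset once under the 'C' locale and once after adopting the user's environment settings, then restores the previous state. The helper names `codeset()` and `codeset_c_versus_user()` are hypothetical. As an assumption for the sake of a smaller side effect, the sketch changes only LC_CTYPE rather than LC_ALL, since LC_CTYPE is the category that determines the codeset. It forces 'C' first because Python 3 already calls setlocale(LC_CTYPE, "") at startup, so the difference would otherwise not be visible.

```python
import locale


def codeset():
    """Return the lib C codeset, or None where nl_langinfo() is missing."""
    if hasattr(locale, "nl_langinfo"):
        return locale.nl_langinfo(locale.CODESET)
    return None


def codeset_c_versus_user():
    """Return (codeset under 'C', codeset under the user's locale).

    Hypothetical helper for illustration; it temporarily changes the
    process-wide LC_CTYPE setting, so it is not thread-safe.
    """
    saved = locale.setlocale(locale.LC_CTYPE)  # query only, no change
    try:
        locale.setlocale(locale.LC_CTYPE, "C")
        under_c = codeset()
        try:
            # An empty string means "use the settings from the environment".
            locale.setlocale(locale.LC_CTYPE, "")
        except locale.Error:
            # The environment names a locale the system does not know;
            # the 'C' locale set above stays active in that case.
            pass
        under_user = codeset()
    finally:
        locale.setlocale(locale.LC_CTYPE, saved)
    return under_c, under_user


if __name__ == "__main__":
    print(codeset_c_versus_user())
```

The temporary switch is exactly the kind of process-wide change that causes the thread problems described above, which is why the message argues for calling setlocale() once at application startup instead.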