Re: [Python-Dev] Add a new locale codec?
Victor Stinner writes: If this is needed, it should be spelled os.getlocaleencoding() (or sys.getlocaleencoding()?) There is already a locale.getpreferredencoding(False) function which gives you the current locale encoding. The problem is that the current locale encoding may change, so you have to get the new value each time you want to encode or decode data. How can that happen if the programmer (or a module she has imported) isn't messing with the locale? If the programmer is messing with the locale, really they need to be careful. A magic codec whose encoding changes *within* a process is an accident waiting to happen. Do you have a real use case for the 'locale' codec's "encoding changes with the locale within a process" feature? ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Add a new locale codec?
As And pointed out, this is already the behaviour of the mbcs codec under Windows. locale would be the moral (*) equivalent of that under Unix. Indeed, and that precedent should be enough reason *not* to include a locale encoding. The mbcs encoding has caused much user confusion over the years, and it is less useful than people typically think. For example, for some time, people thought that names in zip files ought to be encoded in mbcs, only to find out that this is incorrect years later. With a locale encoding, the risk for confusion and untestable code is too high (just consider the ongoing saga of the Turkish dotless i (ı)). Regards, Martin
Re: [Python-Dev] Add a new locale codec?
2012/2/10 Martin v. Löwis mar...@v.loewis.de: As And pointed out, this is already the behaviour of the mbcs codec under Windows. locale would be the moral (*) equivalent of that under Unix. Indeed, and that precedent should be enough reason *not* to include a locale encoding. The mbcs encoding has caused much user confusion over the years, and it is less useful than people typically think. For example, for some time, people thought that names in zip files ought to be encoded in mbcs, only to find out that this is incorrect years later. With a locale encoding, the risk for confusion and untestable code is too high (just consider the ongoing saga of the Turkish dotless i (ı)). Well, I expected this answer, and I agree that there are more drawbacks than advantages. I will close the issue as wontfix. The current locale encoding can already be read using locale.getpreferredencoding(False), and I already fixed the functions that use the current locale encoding. Victor
Re: [Python-Dev] Add a new locale codec?
I think there's a general expectation that if you encode something with one codec you will be able to decode it with the same codec. That's not necessarily true for the locale encoding. There is the same problem with the filesystem encoding (sys.getfilesystemencoding()), which is the user locale encoding (LC_ALL, LANG or LC_CTYPE) or the Windows ANSI code page. If you wrote a file using this encoding, you may not be able to read it if the filesystem encoding changes between two runs, or on another computer. I agree that it is more surprising because the current locale encoding can change at any time, not only between two runs or when you use another computer. Don't you think that this special behaviour can be documented? Victor
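The roundtrip concern can be sketched in Python 3. This is only an illustration, not something proposed in the thread: pack() and unpack() are hypothetical helper names, and the idea is simply to record the encoding name next to the bytes so that decoding does not depend on whatever the locale happens to be later.

```python
import locale


def pack(text, encoding=None):
    """Encode text and return (encoding_name, data).

    Hypothetical helper: if no encoding is given, the current locale
    encoding is looked up once and stored together with the data.
    """
    if encoding is None:
        encoding = locale.getpreferredencoding(False)
    return encoding, text.encode(encoding)


def unpack(record):
    """Decode with the encoding saved at write time, not the current one."""
    encoding, data = record
    return data.decode(encoding)


record = pack("caf\u00e9", "utf-8")
print(record)
```

With the encoding stored in the record, unpack(record) gives back the original text even if the locale (or the machine) has changed in between.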
Re: [Python-Dev] Add a new locale codec?
On Thu, 9 Feb 2012 08:43:02 +0200 Simon Cross hodgestar+python...@gmail.com wrote: On Thu, Feb 9, 2012 at 2:35 AM, Steven D'Aprano st...@pearwood.info wrote: Simon Cross wrote: I think I'm -1 on a locale encoding because it refers to different actual encodings depending on where and when it's run, which seems surprising Why is it surprising? Surely that's the whole point of a locale encoding: to use the locale encoding, whatever that happens to be at the time. I think there's a general expectation that if you encode something with one codec you will be able to decode it with the same codec. That's not necessarily true for the locale encoding. As And pointed out, this is already the behaviour of the mbcs codec under Windows. locale would be the moral (*) equivalent of that under Unix. (*) or perhaps immoral :-) Regards Antoine.
Re: [Python-Dev] Add a new locale codec?
2012/2/9 Antoine Pitrou solip...@pitrou.net I think there's a general expectation that if you encode something with one codec you will be able to decode it with the same codec. That's not necessarily true for the locale encoding. As And pointed out, this is already the behaviour of the mbcs codec under Windows. locale would be the moral (*) equivalent of that under Unix. With the difference that mbcs cannot change during execution. I don't even know if it is possible to change it at all, except by reinstalling Windows. -- Amaury Forgeot d'Arc
Re: [Python-Dev] Add a new locale codec?
With the difference that mbcs cannot change during execution. It is possible to change the thread ANSI code page (CP_THREAD_ACP) at runtime, but setting the system ANSI code page (CP_ACP) requires restarting Windows. I don't even know if it is possible to change it at all, except by reinstalling Windows. The system ANSI code page can be set in the regional dialog of the control panel. If I remember correctly, it is misleadingly called the language. Victor
Re: [Python-Dev] Add a new locale codec?
As And pointed out, this is already the behaviour of the mbcs codec under Windows. locale would be the moral (*) equivalent of that under Unix. On Windows, the ANSI code page codec will be accessible using 3 different names: locale, mbcs and the real encoding name (sys.getfilesystemencoding())! Victor
Re: [Python-Dev] Add a new locale codec?
Victor Stinner writes: There is the same problem [that encode-decode with the 'locale' codec doesn't roundtrip reliably] with the filesystem encoding (sys.getfilesystemencoding()), -1 on a query to the OS that pretends to be a constant. You see, it's not the same problem. The difference is that 'locale' is a constant and should correspond to a constant encoding, while 'sys.getfilesystemencoding()' is a library function that queries the system, and it's obvious from the syntax that this is expected to change in various circumstances, so if you want roundtripping you need to save the result. Having a nondeterministic locale codec is just begging application (and maybe a few middleware) programmers to use it everywhere they don't feel like thinking about I18N. Experience shows that that is everywhere! If this is needed, it should be spelled os.getlocaleencoding() (or sys.getlocaleencoding()?) Possibly there should be corresponding getlocalelanguage(), getlocaleregion(), and getlocalemodifier() functions, and they should take an optional string argument whose appropriate component is returned. Or maybe there should be a parselocalestring() function that returns a named tuple. Or maybe this three-line function doesn't need to be a builtin?
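For illustration, here is one way the "three-line function" could look. It is a sketch under assumptions: parse_locale_string() and LocaleParts are hypothetical names, and the input is assumed to follow the POSIX-style layout language[_region][.encoding][@modifier]. Nothing like this was added to the standard library as a result of this thread.

```python
from collections import namedtuple

# Hypothetical result type; the field names mirror the functions
# floated in the message above (language, region, encoding, modifier).
LocaleParts = namedtuple("LocaleParts", "language region encoding modifier")


def parse_locale_string(name):
    """Split a POSIX-style locale name into its four components."""
    name, _, modifier = name.partition("@")
    name, _, encoding = name.partition(".")
    language, _, region = name.partition("_")
    # Missing components are reported as None rather than "".
    return LocaleParts(language, region or None, encoding or None, modifier or None)


print(parse_locale_string("fr_FR.UTF-8"))
```

Because the function takes the locale string as an argument, it never queries the process locale itself, which keeps it deterministic and easy to test.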
Re: [Python-Dev] Add a new locale codec?
On Fri, Feb 10, 2012 at 12:59 AM, Stephen J. Turnbull step...@xemacs.org wrote: If this is needed, it should be spelled os.getlocaleencoding() (or sys.getlocaleencoding()?) Or locale.getpreferredencoding(), even ;) FWIW, I agree with Stephen on this one, but take that with the grain of salt that I could probably decode most of the strings I work with as ASCII without breaking anything. Cheers, Nick. -- Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
Re: [Python-Dev] Add a new locale codec?
If this is needed, it should be spelled os.getlocaleencoding() (or sys.getlocaleencoding()?) There is already a locale.getpreferredencoding(False) function which gives you the current locale encoding. The problem is that the current locale encoding may change, so you have to get the new value each time you want to encode or decode data. Victor
Re: [Python-Dev] Add a new locale codec?
2012/2/8 Simon Cross hodgestar+python...@gmail.com: Is the idea to have:

    b"foo".decode("locale")

be roughly equivalent to

    encoding = locale.getpreferredencoding(False)
    b"foo".decode(encoding)

?

Yes. Whereas:

    b"foo".decode(sys.getfilesystemencoding())

is equivalent to

    encoding = locale.getpreferredencoding(True)
    b"foo".decode(encoding)

Victor
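Since the codec was not added, the same effect can be had with two small helpers. This is a minimal sketch: decode_current_locale() and encode_current_locale() are hypothetical names, and the only library call used is locale.getpreferredencoding(False), which reads the encoding without calling setlocale().

```python
import locale


def decode_current_locale(data, errors="strict"):
    """Decode bytes with the locale encoding in effect right now.

    Hypothetical stand-in for the proposed "locale" codec: the encoding
    is looked up on every call instead of being cached.
    """
    return data.decode(locale.getpreferredencoding(False), errors)


def encode_current_locale(text, errors="strict"):
    """Encode text with the locale encoding in effect right now."""
    return text.encode(locale.getpreferredencoding(False), errors)


print(decode_current_locale(b"foo"))
```

Looking the encoding up per call is what distinguishes this from storing the encoding in a variable at startup.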
Re: [Python-Dev] Add a new locale codec?
I think I'm -1 on a locale encoding because it refers to different actual encodings depending on where and when it's run, which seems surprising, and there's already a more explicit way to achieve the same effect. The documentation on .getpreferredencoding() says some scary things about needing to call .setlocale() sometimes but doesn't really say when or why. Could any of those cases make locale do weird things because it doesn't call setlocale()?
Re: [Python-Dev] Add a new locale codec?
On 2012-02-08 09:28, Simon Cross wrote: I think I'm -1 on a locale encoding because it refers to different actual encodings depending on where and when it's run, which seems surprising, and there's already a more explicit way to achieve the same effect. I'd agree that this is undesirable, and I don't really want locale-specific behaviour to leak out in other places that accept an encoding name (e.g. <?xml encoding="locale"?>), but we already have this behaviour with the mbcs encoding on Windows, which refers to the locale-specific 'ANSI' code page. -- And Clover mailto:a...@doxdesk.com http://www.doxdesk.com/ gtalk:chat?jid=bobi...@doxdesk.com
Re: [Python-Dev] Add a new locale codec?
2012/2/8 Simon Cross hodgestar+python...@gmail.com: I think I'm -1 on a locale encoding because it refers to different actual encodings depending on where and when it's run, which seems surprising, and there's already a more explicit way to achieve the same effect.

The following code is just an example to explain how locale is supposed to work, but the implementation is completely different:

    encoding = locale.getpreferredencoding(False)
    ... execute some code ...
    text = data.decode(encoding)
    data = text.encode(encoding)

The current locale is process-wide: if a thread changes the locale, all threads are affected. Some functions have to use the current locale encoding, and not the locale encoding read at startup. Examples with C functions: strerror(), strftime(), tzname, etc. My codec implementation uses mbstowcs() and wcstombs(), which don't touch the current locale, but just use it. Said differently, the locale codec would just give access to these functions.

The documentation on .getpreferredencoding() says some scary things about needing to call .setlocale() sometimes but doesn't really say when or why.

locale.getpreferredencoding() always calls setlocale() by default. locale.getpreferredencoding(False) doesn't call setlocale(). setlocale() is not called on Windows or if locale.CODESET is not available (it is available on FreeBSD, Mac OS X, Linux, etc.).

Could any of those cases make locale do weird things because it doesn't call setlocale()?

Sorry, I don't understand what you mean by weird things. The locale codec doesn't touch the locale.
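The process-wide nature of the locale can be observed with a short experiment. This is a sketch, assuming only that the "C" locale is available (it is required by the C standard); switch_to_c_locale() is a hypothetical helper name.

```python
import locale
import threading


def switch_to_c_locale():
    # Runs in a worker thread, but setlocale() affects the whole process.
    locale.setlocale(locale.LC_CTYPE, "C")


saved = locale.setlocale(locale.LC_CTYPE)  # query only, nothing changes

worker = threading.Thread(target=switch_to_c_locale)
worker.start()
worker.join()

# The main thread now observes the locale chosen by the worker.
seen_by_main = locale.setlocale(locale.LC_CTYPE)
print(seen_by_main)

locale.setlocale(locale.LC_CTYPE, saved)  # restore the original setting
```

Any encoding cached by the main thread before the worker ran would now be out of date, which is the situation the message describes.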
Re: [Python-Dev] Add a new locale codec?
On Wed, Feb 8, 2012 at 3:25 PM, Victor Stinner victor.stin...@haypocalc.com wrote: Sorry, I don't understand what you mean by weird things. The locale codec doesn't touch the locale. Sorry for being unclear. My question was about the following lines from http://docs.python.org/library/locale.html#locale.getpreferredencoding: On some systems, it is necessary to invoke setlocale() to obtain the user preferences, so this function is not thread-safe. If invoking setlocale is not necessary or desired, do_setlocale should be set to False. So my question was: what happens on systems where invoking setlocale is necessary to obtain the user preferences? Schiavo Simon
Re: [Python-Dev] Add a new locale codec?
On Wed, Feb 8, 2012 at 3:25 PM, Victor Stinner victor.stin...@haypocalc.com wrote: The current locale is process-wide: if a thread changes the locale, all threads are affected. Some functions have to use the current locale encoding, and not the locale encoding read at startup. Examples with C functions: strerror(), strftime(), tzname, etc. Could a core part of Python break because of a sequence like: 1) Encode unicode to bytes using locale codec. 2) Silly third-party library code changes the locale codec. 3) Attempt to decode bytes back to unicode using the locale codec (which is now a different underlying codec). ? Schiavo Simon
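The three-step sequence can be simulated without touching the real locale. In this sketch the two encoding arguments stand in for the locale encoding before and after the hypothetical library call; roundtrip() is an illustrative name, and UTF-8 and Latin-1 are arbitrary choices.

```python
def roundtrip(text, encoding_at_encode, encoding_at_decode):
    """Encode under one encoding and decode under another.

    The two arguments stand in for the locale encoding before and
    after some other code changes the locale.
    """
    data = text.encode(encoding_at_encode)
    return data.decode(encoding_at_decode)


original = "syst\u00e8me"
same = roundtrip(original, "utf-8", "utf-8")
changed = roundtrip(original, "utf-8", "latin-1")

# The second roundtrip raises no error; it silently returns different text.
print(same == original, changed == original)
```

Note that this failure mode is silent: decoding as Latin-1 never raises, so the corrupted text can travel a long way before anyone notices.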
Re: [Python-Dev] Add a new locale codec?
The current locale is process-wide: if a thread changes the locale, all threads are affected. Some functions have to use the current locale encoding, and not the locale encoding read at startup. Examples with C functions: strerror(), strftime(), tzname, etc. Could a core part of Python break because of a sequence like: 1) Encode unicode to bytes using locale codec. 2) Silly third-party library code changes the locale codec. 3) Attempt to decode bytes back to unicode using the locale codec (which is now a different underlying codec).

When you decode data from the OS, you have to use the current locale encoding. If you use a variable to store the encoding and the locale is changed, you have to update your variable or you get mojibake. Example with Python 2:

    lisa$ python2.7
    Python 2.7.2+ (default, Oct 4 2011, 20:06:09)
    >>> import locale, os
    >>> encoding=locale.getpreferredencoding(False)
    >>> encoding
    'ANSI_X3.4-1968'
    >>> os.strerror(23).decode(encoding)
    u'Too many open files in system'
    >>> locale.setlocale(locale.LC_ALL, '') # set the locale
    'fr_FR.UTF-8'
    >>> os.strerror(23).decode(encoding)
    Traceback (most recent call last):
    ...
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 37: ordinal not in range(128)
    >>> encoding=locale.getpreferredencoding(False)
    >>> encoding
    'UTF-8'
    >>> os.strerror(23).decode(encoding)
    u'Trop de fichiers ouverts dans le syst\xe8me'

You have to update encoding manually because setlocale() changed the LC_MESSAGES locale category (message language) but also the LC_CTYPE locale category (encoding). Using the "locale" encoding, you always get the current locale encoding.

In some cases, you must use sys.getfilesystemencoding() (e.g. to write to the console or to encode/decode filenames); in other cases, you must use the current locale encoding (e.g. strerror() or strftime()). Python 3 does most of the work for you, so you don't have to care about the locale encoding (you just manipulate Unicode; it decodes bytes or encodes back to bytes for you). But in some cases, you have to decode or encode manually using the right encoding. In this case, the locale codec can help you.

The documentation will have to explain exactly what this new codec is, because as expected, it is confusing :-) Victor
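The failure in the Python 2 session can be replayed in Python 3 without changing the real locale. This is a sketch: the byte string is copied from the session rather than queried from the C library, and the two encoding names are hard-coded stand-ins for what locale.getpreferredencoding(False) reported before and after setlocale().

```python
# Bytes as returned for errno 23 under fr_FR.UTF-8 in the session above
# (copied from it, not queried live).
message = b"Trop de fichiers ouverts dans le syst\xc3\xa8me"

stale_encoding = "ascii"  # ANSI_X3.4-1968, cached before setlocale()
fresh_encoding = "utf-8"  # what a new lookup reported afterwards

try:
    message.decode(stale_encoding)
    stale_error = None
except UnicodeDecodeError as exc:
    stale_error = exc

decoded = message.decode(fresh_encoding)

print(stale_error.start)  # offset of the first undecodable byte
print(ascii(decoded))
```

The reported offset matches the "position 37" in the original traceback, which confirms that the cached ASCII encoding was the cause.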
Re: [Python-Dev] Add a new locale codec?
Simon Cross wrote: I think I'm -1 on a locale encoding because it refers to different actual encodings depending on where and when it's run, which seems surprising Why is it surprising? Surely that's the whole point of a locale encoding: to use the locale encoding, whatever that happens to be at the time. Perhaps I'm missing something, but I don't see how this proposal is any more surprising than the fact that (say) Decimal uses a global context if you don't specify one explicitly. Only this should be *less* surprising, because Decimal uses the global context by default, while this will use the global locale encoding only if you explicitly tell it to. -- Steven
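The Decimal comparison can be made concrete. In this small sketch the same expression gives two different results depending on the context in effect; note that in current Python the implicit context is per-thread rather than strictly global, and the precision of 5 is an arbitrary choice for the example.

```python
import decimal

# Evaluated under whatever context is current (default precision is 28).
one_third_default = decimal.Decimal(1) / decimal.Decimal(3)

with decimal.localcontext() as ctx:
    ctx.prec = 5  # changes the context used implicitly inside this block
    one_third_short = decimal.Decimal(1) / decimal.Decimal(3)

print(one_third_default)
print(one_third_short)
```

As with the locale encoding, the expression itself never names the context, so its result depends on state set elsewhere.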
Re: [Python-Dev] Add a new locale codec?
On Thu, Feb 9, 2012 at 2:35 AM, Steven D'Aprano st...@pearwood.info wrote: Simon Cross wrote: I think I'm -1 on a locale encoding because it refers to different actual encodings depending on where and when it's run, which seems surprising Why is it surprising? Surely that's the whole point of a locale encoding: to use the locale encoding, whatever that happens to be at the time. I think there's a general expectation that if you encode something with one codec you will be able to decode it with the same codec. That's not necessarily true for the locale encoding.
[Python-Dev] Add a new locale codec?
Hi, I added PyUnicode_DecodeLocale(), PyUnicode_DecodeLocaleAndSize() and PyUnicode_EncodeLocale() to Python 3.3 to fix bugs. I hesitate to expose this codec in Python: it can be useful in some cases, especially if you need to interact with C functions. The glib library has functions using the *current* locale encoding, g_locale_from_utf8() for example. Related issue with more information: http://bugs.python.org/issue13619 Victor
Re: [Python-Dev] Add a new locale codec?
Is the idea to have:

    b"foo".decode("locale")

be roughly equivalent to

    encoding = locale.getpreferredencoding(False)
    b"foo".decode(encoding)

?