Re: [Python-Dev] Add a new locale codec?

2012-02-10 Thread Stephen J. Turnbull
Victor Stinner writes:

   If this is needed, it should be spelled os.getlocaleencoding() (or
   sys.getlocaleencoding()?)
  
  There is already a locale.getpreferredencoding(False) function which
  gives you the current locale encoding. The problem is that the current
  locale encoding may change, so you have to get the new value each
  time you want to encode or decode data.

How can that happen if the programmer (or a module she has imported)
isn't messing with the locale?  If the programmer is messing with the
locale, really they need to be careful.  A magic codec whose encoding
changes *within* a process is an accident waiting to happen.

Do you have a real use case for having the 'locale' codec's encoding
change along with the locale within a single process?
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Add a new locale codec?

2012-02-10 Thread Martin v. Löwis
 As And pointed out, this is already the behaviour of the mbcs codec
 under Windows. locale would be the moral (*) equivalent of that under
 Unix.

Indeed, and that precedent should be enough reason *not* to include a
locale encoding. The mbcs encoding has caused much user confusion
over the years, and it is less useful than people typically think. For
example, for some time, people thought that names in zip files ought to
be encoded in mbcs, only to find out that this is incorrect years
later. With a locale encoding, the risk for confusion and untestable
code is too high (just consider the ongoing saga of the Turkish dotless
i (ı)).

Regards,
Martin


Re: [Python-Dev] Add a new locale codec?

2012-02-10 Thread Victor Stinner
2012/2/10 Martin v. Löwis mar...@v.loewis.de:
 As And pointed out, this is already the behaviour of the mbcs codec
 under Windows. locale would be the moral (*) equivalent of that under
 Unix.

 Indeed, and that precedent should be enough reason *not* to include a
 locale encoding. The mbcs encoding has caused much user confusion
 over the years, and it is less useful than people typically think. For
 example, for some time, people thought that names in zip files ought to
 be encoded in mbcs, only to find out that this is incorrect years
 later. With a locale encoding, the risk for confusion and untestable
 code is too high (just consider the ongoing saga of the Turkish dotless
 i (ı)).

Well, I expected this answer, and I agree that there are more drawbacks
than advantages. I will close the issue as wontfix. The current locale
encoding can already be read using locale.getpreferredencoding(False),
and I have already fixed the functions that use the current locale encoding.

Victor


Re: [Python-Dev] Add a new locale codec?

2012-02-09 Thread Victor Stinner
 I think there's a general expectation that if you encode something
 with one codec you will be able to decode it with the same codec.
 That's not necessarily true for the locale encoding.

There is the same problem with the filesystem encoding
(sys.getfilesystemencoding()), which is the user locale encoding
(LC_ALL, LANG or LC_CTYPE)  or the Windows ANSI code page. If you
wrote a file using this encoding, you may not be able to read it if
the filesystem encoding changes between two runs, or on another
computer.

I agree that it is more surprising because the current locale encoding
can change at any time, not only between two runs or when you use another
computer.

Don't you think that this special behaviour can be documented?

Victor


Re: [Python-Dev] Add a new locale codec?

2012-02-09 Thread Antoine Pitrou
On Thu, 9 Feb 2012 08:43:02 +0200
Simon Cross hodgestar+python...@gmail.com wrote:

 On Thu, Feb 9, 2012 at 2:35 AM, Steven D'Aprano st...@pearwood.info wrote:
  Simon Cross wrote:
 
  I think I'm -1 on a locale encoding because it refers to different
  actual encodings depending on where and when it's run, which seems
  surprising
 
 
  Why is it surprising? Surely that's the whole point of a locale encoding: to
  use the locale encoding, whatever that happens to be at the time.
 
 I think there's a general expectation that if you encode something
 with one codec you will be able to decode it with the same codec.
 That's not necessarily true for the locale encoding.

As And pointed out, this is already the behaviour of the mbcs codec
under Windows. locale would be the moral (*) equivalent of that under
Unix.

(*) or perhaps immoral :-)

Regards

Antoine.






Re: [Python-Dev] Add a new locale codec?

2012-02-09 Thread Amaury Forgeot d'Arc
2012/2/9 Antoine Pitrou solip...@pitrou.net

  I think there's a general expectation that if you encode something
  with one codec you will be able to decode it with the same codec.
  That's not necessarily true for the locale encoding.

 As And pointed out, this is already the behaviour of the mbcs codec
 under Windows. locale would be the moral (*) equivalent of that under
 Unix.


With the difference that mbcs cannot change during execution.
I don't even know if it is possible to change it at all, except by
reinstalling Windows.

-- 
Amaury Forgeot d'Arc


Re: [Python-Dev] Add a new locale codec?

2012-02-09 Thread Victor Stinner
 With the difference that mbcs cannot change during execution.

It is possible to change the thread ANSI code page (CP_THREAD_ACP)
at runtime, but changing the system ANSI code page (CP_ACP) requires
restarting Windows.

 I don't even know if it is possible to change it at all, except by
 reinstalling Windows.

The system ANSI code page can be set in the regional dialog of the
control panel. If I remember correctly, it is misleadingly called the
language.

Victor


Re: [Python-Dev] Add a new locale codec?

2012-02-09 Thread Victor Stinner
 As And pointed out, this is already the behaviour of the mbcs codec
 under Windows. locale would be the moral (*) equivalent of that under
 Unix.

On Windows, the ANSI code page codec will be accessible using 3
different names: locale, mbcs and the real encoding name
(sys.getfilesystemencoding())!

Victor


Re: [Python-Dev] Add a new locale codec?

2012-02-09 Thread Stephen J. Turnbull
Victor Stinner writes:

  There is the same problem [that encode-decode with the 'locale'
  codec doesn't roundtrip reliably] with the filesystem encoding
  (sys.getfilesystemencoding()),

-1 on a query to the OS that pretends to be a constant.

You see, it's not the same problem.  The difference is that 'locale'
is a constant and should correspond to a constant encoding, while
'sys.getfilesystemencoding()' is a library function that queries the
system, and it's obvious from the syntax that this is expected to
change in various circumstances, so if you want roundtripping you need
to save the result.
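
The "save the result" advice can be made concrete with a minimal Python 3
sketch. The helper names encode_tagged() and decode_tagged() are hypothetical
and not part of any proposal in this thread; the idea is only to record the
encoding name at encode time and reuse that recorded name at decode time,
instead of asking the system again.

```python
import locale


def encode_tagged(text):
    """Encode text and return (encoding_name, data).

    Hypothetical helper: the encoding is queried once, here, and travels
    with the data, so later locale changes cannot affect decoding.
    """
    encoding = locale.getpreferredencoding(False)
    return encoding, text.encode(encoding)


def decode_tagged(encoding, data):
    """Decode data with the encoding name saved by encode_tagged()."""
    return data.decode(encoding)


# ASCII-only sample so the sketch works with the usual ASCII-compatible
# locale encodings.
saved_encoding, payload = encode_tagged("hello")
roundtripped = decode_tagged(saved_encoding, payload)
```

The cost of this approach is that the encoding name has to be stored or
transmitted next to the bytes it describes.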

Having a nondeterministic locale codec is just begging application
(and maybe a few middleware) programmers to use it everywhere they
don't feel like thinking about I18N.  Experience shows that that is
everywhere!

If this is needed, it should be spelled os.getlocaleencoding() (or
sys.getlocaleencoding()?)  Possibly there should be corresponding
getlocalelanguage(), getlocaleregion(), and getlocalemodifier()
functions, and they should take an optional string argument whose
appropriate component is returned.

Or maybe there should be a parselocalestring() function that returns
a named tuple.

Or maybe this three-line function doesn't need to be a builtin?
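
For illustration, a parser of that kind might look like the Python sketch
below. Everything here is an assumption made for the example: the name
parse_locale_string(), the LocaleParts tuple, and the regular expression,
which only handles the common POSIX-style form
language[_territory][.codeset][@modifier] and treats names such as "C" as a
bare language.

```python
import re
from collections import namedtuple

# Hypothetical result type; the field names follow the message above.
LocaleParts = namedtuple("LocaleParts", "language region encoding modifier")

_LOCALE_RE = re.compile(
    r"^(?P<language>[A-Za-z]+)"
    r"(?:_(?P<region>[A-Za-z0-9]+))?"
    r"(?:\.(?P<encoding>[^@]+))?"
    r"(?:@(?P<modifier>.+))?$"
)


def parse_locale_string(name):
    """Split a POSIX-style locale name into its components.

    Components that are absent from the name are returned as None.
    """
    match = _LOCALE_RE.match(name)
    if match is None:
        raise ValueError("unrecognised locale string: %r" % (name,))
    return LocaleParts(**match.groupdict())


parts = parse_locale_string("fr_FR.UTF-8@euro")
# parts.language == "fr", parts.region == "FR",
# parts.encoding == "UTF-8", parts.modifier == "euro"
```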


Re: [Python-Dev] Add a new locale codec?

2012-02-09 Thread Nick Coghlan
On Fri, Feb 10, 2012 at 12:59 AM, Stephen J. Turnbull
step...@xemacs.org wrote:
 If this is needed, it should be spelled os.getlocaleencoding() (or
 sys.getlocaleencoding()?)

Or locale.getpreferredencoding(), even ;)

FWIW, I agree with Stephen on this one, but take that with the grain
of salt that I could probably decode most of the strings I work with
as ASCII without breaking anything.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] Add a new locale codec?

2012-02-09 Thread Victor Stinner
 If this is needed, it should be spelled os.getlocaleencoding() (or
 sys.getlocaleencoding()?)

There is already a locale.getpreferredencoding(False) function which
gives you the current locale encoding. The problem is that the current
locale encoding may change, so you have to get the new value each
time you want to encode or decode data.

Victor


Re: [Python-Dev] Add a new locale codec?

2012-02-08 Thread Victor Stinner
2012/2/8 Simon Cross hodgestar+python...@gmail.com:
 Is the idea to have:

  b"foo".decode("locale")

 be roughly equivalent to

  encoding = locale.getpreferredencoding(False)
  b"foo".decode(encoding)

 ?

Yes. Whereas:

b"foo".decode(sys.getfilesystemencoding())

is equivalent to

encoding = locale.getpreferredencoding(True)
b"foo".decode(encoding)

Victor


Re: [Python-Dev] Add a new locale codec?

2012-02-08 Thread Simon Cross
I think I'm -1 on a locale encoding because it refers to different
actual encodings depending on where and when it's run, which seems
surprising, and there's already a more explicit way to achieve the
same effect.

The documentation on .getpreferredencoding() says some scary things
about needing to call .setlocale() sometimes but doesn't really say
when or why. Could any of those cases make locale do weird things
because it doesn't call setlocale()?


Re: [Python-Dev] Add a new locale codec?

2012-02-08 Thread And Clover

On 2012-02-08 09:28, Simon Cross wrote:

I think I'm -1 on a locale encoding because it refers to different
actual encodings depending on where and when it's run, which seems
surprising, and there's already a more explicit way to achieve the
same effect.


I'd agree that this is undesirable, and I don't really want
locale-specific behaviour to leak out in other places that accept an
encoding name (e.g. <?xml encoding="locale"?>), but we already have this
behaviour with the mbcs encoding on Windows, which refers to the
locale-specific 'ANSI' code page.


--
And Clover
mailto:a...@doxdesk.com
http://www.doxdesk.com/
gtalk:chat?jid=bobi...@doxdesk.com


Re: [Python-Dev] Add a new locale codec?

2012-02-08 Thread Victor Stinner
2012/2/8 Simon Cross hodgestar+python...@gmail.com:
 I think I'm -1 on a locale encoding because it refers to different
 actual encodings depending on where and when it's run, which seems
 surprising, and there's already a more explicit way to achieve the
 same effect.

The following code is just an example to explain how locale is
supposed to work, but the implementation is completely different:

encoding = locale.getpreferredencoding(False)
... execute some code ...
text = bytes.decode(encoding)
bytes = text.encode(encoding)

The current locale is process-wide: if a thread changes the locale,
all threads are affected. Some functions have to use the current
locale encoding, and not the locale encoding read at startup. Examples
with C functions: strerror(), strftime(), tzname, etc.

My codec implementation uses mbstowcs() and wcstombs(), which don't
touch the current locale, but just use it. Put differently, the
locale codec would just give access to these functions.
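
The intended behaviour can be approximated in pure Python 3. This is only a
sketch under stated assumptions: the helper names decode_locale() and
encode_locale() are hypothetical, and unlike the C implementation (which
relies on mbstowcs() and wcstombs()), it looks up a regular Python codec by
name on every call.

```python
import locale


def decode_locale(data, errors="strict"):
    """Decode bytes with whatever the locale encoding is right now."""
    # Queried on every call, never cached, and without calling setlocale().
    return data.decode(locale.getpreferredencoding(False), errors)


def encode_locale(text, errors="strict"):
    """Encode str with whatever the locale encoding is right now."""
    return text.encode(locale.getpreferredencoding(False), errors)


# ASCII-only sample so the sketch works with the usual ASCII-compatible
# locale encodings.
sample = encode_locale("abc")
```

Because nothing is cached, two calls separated by a setlocale() call may use
two different encodings, which is exactly the property being debated in this
thread.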

 The documentation on .getpreferredencoding() says some scary things
 about needing to call .setlocale() sometimes but doesn't really say
 when or why.

locale.getpreferredencoding() calls setlocale() by default.
locale.getpreferredencoding(False) doesn't call setlocale().
setlocale() is not called on Windows or if locale.CODESET is not
available (it is available on FreeBSD, Mac OS X, Linux, etc.).
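
As a rough illustration of the two call forms in Python 3 (the variable
names are chosen for this example, and the values printed depend entirely on
the platform, the environment, and the Python version):

```python
import locale
import sys

# Does not call setlocale(): reports the encoding of the locale that is
# currently active in this process.
current = locale.getpreferredencoding(False)

# Default form (do_setlocale=True): on platforms where it applies, this may
# temporarily call setlocale() to read the user's preferences, which is why
# the documentation warns that it is not thread-safe.
preferred = locale.getpreferredencoding()

# Shown for comparison only; it need not match either value above.
filesystem = sys.getfilesystemencoding()

print(current, preferred, filesystem)
```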

 Could any of those cases make locale do weird things because it doesn't 
 call setlocale()?

Sorry, I don't understand what you mean by weird things. The
locale codec doesn't touch the locale.


Re: [Python-Dev] Add a new locale codec?

2012-02-08 Thread Simon Cross
On Wed, Feb 8, 2012 at 3:25 PM, Victor Stinner
victor.stin...@haypocalc.com wrote:
 Sorry, I don't understand what you mean by weird things. The
 locale codec doesn't touch the locale.

Sorry for being unclear. My question was about the following lines
from http://docs.python.org/library/locale.html#locale.getpreferredencoding:

On some systems, it is necessary to invoke setlocale() to obtain
the user preferences, so this function is not thread-safe. If invoking
setlocale is not necessary or desired, do_setlocale should be set to
False.

So my question was about what happens on such systems where invoking
setlocale is necessary to obtain the user preferences?

Schiavo
Simon


Re: [Python-Dev] Add a new locale codec?

2012-02-08 Thread Simon Cross
On Wed, Feb 8, 2012 at 3:25 PM, Victor Stinner
victor.stin...@haypocalc.com wrote:
 The current locale is process-wide: if a thread changes the locale,
 all threads are affected. Some functions have to use the current
 locale encoding, and not the locale encoding read at startup. Examples
 with C functions: strerror(), strftime(), tzname, etc.

Could a core part of Python break because of a sequence like:

1) Encode unicode to bytes using locale codec.
2) Silly third-party library code changes the locale codec.
3) Attempt to decode bytes back to unicode using the locale codec
(which is now a different underlying codec).

?
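
The sequence above can be simulated without touching the real locale. In
this Python 3 sketch, the class MovingCodec and its current attribute are
hypothetical stand-ins for "whatever the locale encoding is right now"; the
two concrete encodings are chosen only to make the failure visible.

```python
class MovingCodec:
    """Stand-in for a codec whose underlying encoding can change."""

    def __init__(self, current):
        self.current = current  # plays the role of the locale encoding

    def encode(self, text):
        return text.encode(self.current)

    def decode(self, data):
        return data.decode(self.current)


codec = MovingCodec("utf-8")

# 1) Encode str to bytes using the "locale" stand-in.
#    "\u00e8" is e with grave accent, so the text is "systeme" with an accent.
data = codec.encode("syst\u00e8me")

# 2) Some other code changes the underlying encoding.
codec.current = "ascii"

# 3) Decoding with the "same" codec now fails on the non-ASCII bytes.
try:
    codec.decode(data)
    roundtrip_ok = True
except UnicodeDecodeError:
    roundtrip_ok = False
```

With these two encodings the failure is loud. With two single-byte
encodings it would instead silently produce the wrong characters.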

Schiavo
Simon


Re: [Python-Dev] Add a new locale codec?

2012-02-08 Thread Victor Stinner
 The current locale is process-wide: if a thread changes the locale,
 all threads are affected. Some functions have to use the current
 locale encoding, and not the locale encoding read at startup. Examples
 with C functions: strerror(), strftime(), tzname, etc.

 Could a core part of Python break because of a sequence like:

 1) Encode unicode to bytes using locale codec.
 2) Silly third-party library code changes the locale codec.
 3) Attempt to decode bytes back to unicode using the locale codec
 (which is now a different underlying codec).

When you decode data from the OS, you have to use the current locale
encoding. If you use a variable to store the encoding and the locale
is changed, you have to update your variable or you get mojibake.

Example with Python 2:

lisa$ python2.7
Python 2.7.2+ (default, Oct  4 2011, 20:06:09)
>>> import locale, os
>>> encoding = locale.getpreferredencoding(False)
>>> encoding
'ANSI_X3.4-1968'
>>> os.strerror(23).decode(encoding)
u'Too many open files in system'
>>> locale.setlocale(locale.LC_ALL, '')  # set the locale
'fr_FR.UTF-8'
>>> os.strerror(23).decode(encoding)
Traceback (most recent call last):
  ...
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 37: ordinal not in range(128)
>>> encoding = locale.getpreferredencoding(False)
>>> encoding
'UTF-8'
>>> os.strerror(23).decode(encoding)
u'Trop de fichiers ouverts dans le syst\xe8me'

You have to update encoding manually because setlocale() changed the
LC_MESSAGES locale category (message language) but also the LC_CTYPE
locale category (encoding).

Using the locale codec, you always get the current locale encoding.

In some cases, you must use sys.getfilesystemencoding() (e.g. to write
to the console or to encode/decode filenames); in other cases, you must
use the current locale encoding (e.g. for strerror() or strftime()).
Python 3 does most of the work for you, so you don't have to care about
the locale encoding (you just manipulate Unicode; it decodes bytes and
encodes back to bytes for you). But in some cases, you have to decode
or encode manually using the right encoding. In this case, the
locale codec can help you.

The documentation will have to explain exactly what this new codec is,
because as expected, it is confusing :-)

Victor


Re: [Python-Dev] Add a new locale codec?

2012-02-08 Thread Steven D'Aprano

Simon Cross wrote:

I think I'm -1 on a locale encoding because it refers to different
actual encodings depending on where and when it's run, which seems
surprising


Why is it surprising? Surely that's the whole point of a locale encoding: to 
use the locale encoding, whatever that happens to be at the time.


Perhaps I'm missing something, but I don't see how this proposal is any more 
surprising than the fact that (say) Decimal uses a global context if you don't 
specify one explicitly. Only this should be *less* surprising, because Decimal 
uses the global context by default, while this will use the global locale 
encoding only if you explicitly tell it to.
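
The Decimal comparison can be shown with a short Python 3 sketch. The
precisions used are arbitrary, and localcontext() is used only so that the
example does not permanently alter the ambient context (which in current
Python is per-thread rather than strictly global).

```python
from decimal import Context, Decimal, localcontext

# Implicit: the result depends on the ambient context, which any code
# running earlier in the same thread may have changed.
with localcontext() as ctx:
    ctx.prec = 5
    implicit = Decimal(1) / Decimal(3)

# Explicit: the context is named at the call site.
explicit = Context(prec=10).divide(Decimal(1), Decimal(3))

print(implicit)  # 0.33333
print(explicit)  # 0.3333333333
```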




--
Steven



Re: [Python-Dev] Add a new locale codec?

2012-02-08 Thread Simon Cross
On Thu, Feb 9, 2012 at 2:35 AM, Steven D'Aprano st...@pearwood.info wrote:
 Simon Cross wrote:

 I think I'm -1 on a locale encoding because it refers to different
 actual encodings depending on where and when it's run, which seems
 surprising


 Why is it surprising? Surely that's the whole point of a locale encoding: to
 use the locale encoding, whatever that happens to be at the time.

I think there's a general expectation that if you encode something
with one codec you will be able to decode it with the same codec.
That's not necessarily true for the locale encoding.


[Python-Dev] Add a new locale codec?

2012-02-07 Thread Victor Stinner
Hi,

I added PyUnicode_DecodeLocale(), PyUnicode_DecodeLocaleAndSize() and
PyUnicode_EncodeLocale() to Python 3.3 to fix bugs. I hesitate to
expose this codec in Python: it can be useful in some cases,
especially if you need to interact with C functions.

The glib library has functions using the *current* locale encoding,
g_locale_from_utf8() for example.

Related issue with more information:
http://bugs.python.org/issue13619

Victor


Re: [Python-Dev] Add a new locale codec?

2012-02-07 Thread Simon Cross
Is the idea to have:

  b"foo".decode("locale")

be roughly equivalent to

  encoding = locale.getpreferredencoding(False)
  b"foo".decode(encoding)

?