Re: [Python-Dev] Generalised String Coercion

2005-08-09 Thread Nick Coghlan
James Y Knight wrote: Hum, actually, it somewhat makes sense for the open builtin to become what is now codecs.open, for convenience's sake, although it does blur the distinction between a byte stream and a character stream somewhat. If that happens, I suppose it does actually make
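
Python 3 eventually went this way: the built-in open() accepts an encoding argument in text mode, much as codecs.open() did, while binary mode ("rb") still yields raw bytes. A minimal sketch, assuming Python 3 and a writable temporary directory (the helper name roundtrip_text is hypothetical):

```python
import os
import tempfile


def roundtrip_text(text, encoding):
    """Write text through a text-mode file, then read it back as bytes and as text."""
    fd, path = tempfile.mkstemp()
    os.close(fd)  # the file is reopened by name below
    try:
        # Text mode: str in, encoded bytes on disk (what codecs.open() used to provide).
        with open(path, "w", encoding=encoding) as f:
            f.write(text)
        # Binary mode: the raw byte stream, no decoding.
        with open(path, "rb") as f:
            raw = f.read()
        # Text mode again: bytes on disk decoded back to str.
        with open(path, "r", encoding=encoding) as f:
            decoded = f.read()
    finally:
        os.remove(path)
    return raw, decoded


raw, decoded = roundtrip_text("café", "utf-8")
print(raw)                # b'caf\xc3\xa9'
print(decoded == "café")  # True
```

The byte-stream/character-stream distinction survives as the "b" in the mode string rather than as a separate function.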

Re: [Python-Dev] Generalised String Coercion

2005-08-08 Thread Martin v. Löwis
Bob Ippolito wrote: It's UTF-8 by default, I highly doubt many people bother to change it. I think your doubts are unfounded. Many Japanese people change it to EUC-JP (I believe), as UTF-8 support doesn't work well for them (or at least didn't use to). Regards, Martin

Re: [Python-Dev] Generalised String Coercion

2005-08-08 Thread Martin v. Löwis
Guido van Rossum wrote: We might be able to get there halfway in Python 2.x: we could introduce the bytes type now, and provide separate APIs to read and write them. (In fact, the array module and the f.readinto() method make this possible today, but it's too klunky so nobody uses it.
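
For comparison, this is roughly what the "klunky" route looks like in modern Python: read into a preallocated, mutable array with readinto(), then copy out the part that was filled. A minimal sketch; io.BytesIO stands in for a real file, and read_fixed is a hypothetical helper name:

```python
import io
from array import array


def read_fixed(stream, n):
    """Read up to n bytes from a binary stream into a preallocated buffer."""
    buf = array("B", bytes(n))  # n zeroed unsigned bytes, mutable
    got = stream.readinto(buf)  # fills buf in place, returns the count actually read
    return buf[:got].tobytes()  # keep only the bytes that were filled


stream = io.BytesIO(b"hello world")
print(read_fixed(stream, 5))   # b'hello'
print(read_fixed(stream, 50))  # b' world'
```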

Re: [Python-Dev] Generalised String Coercion

2005-08-08 Thread Martin v. Löwis
Phillip J. Eby wrote: Hm. What would be the use case for using %s with binary, non-text data? Well, I could see using it to write things like netstrings, i.e. sock.send("%d:%s," % (len(data),data)) seems like the One Obvious Way to write a netstring in today's Python at least. But perhaps
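
A netstring is the decimal length, a colon, the payload, and a trailing comma, so b"hello world!" becomes b"12:hello world!,". In Python 3.5 and later, %-formatting also works directly on bytes (PEP 461), which avoids decoding the payload at all. A minimal sketch; the function name netstring is hypothetical:

```python
def netstring(data: bytes) -> bytes:
    """Encode a payload as a netstring: b"<length>:<payload>,"."""
    # bytes % supports %d for integers and %s for bytes-like objects (Python 3.5+).
    return b"%d:%s," % (len(data), data)


print(netstring(b"hello world!"))  # b'12:hello world!,'
```

On a real socket the result would be passed to sock.sendall() rather than sock.send(), since send() may write only part of the buffer.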

Re: [Python-Dev] Generalised String Coercion

2005-08-08 Thread Stephen J. Turnbull
Martin v Löwis [EMAIL PROTECTED] writes: I think your doubts are unfounded. Many Japanese people change it to EUC-JP (I believe), as UTF-8 support doesn't work well for them (or at least didn't use to). If you mean the UTF-8 support in Terminal, it's no

Re: [Python-Dev] Generalised String Coercion

2005-08-08 Thread Martin v. Löwis
Stephen J. Turnbull wrote: If you mean the UTF-8 support in Terminal, it's no better or worse than the EUC-JP support. The problem is that most Japanese Unix systems continue to default to EUC-JP, and many Windows hosts (including Samba file systems) default to Shift JIS. So people using
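
The mismatch matters because the same text has different byte representations under each of these encodings, so bytes written under one default cannot simply be decoded under another. A small illustration using the standard euc_jp, shift_jis and utf-8 codecs:

```python
text = "日本語"  # "Japanese (language)"

encoded = {name: text.encode(name) for name in ("euc_jp", "shift_jis", "utf-8")}

for name, data in encoded.items():
    print(name, data.hex())

# Each encoding round-trips its own bytes...
assert all(data.decode(name) == text for name, data in encoded.items())
# ...but the byte sequences differ, so the reader must know which one was used.
assert encoded["euc_jp"] != encoded["shift_jis"] != encoded["utf-8"]
# Decoding EUC-JP bytes as Shift JIS does not give the original text back.
assert encoded["euc_jp"].decode("shift_jis", errors="replace") != text
```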

Re: [Python-Dev] Generalised String Coercion

2005-08-08 Thread Michael Hudson
M.-A. Lemburg [EMAIL PROTECTED] writes: Set the external encoding for stdin, stdout, stderr: (also an example for adding encoding support to an existing file object): def set_sys_std_encoding(encoding): # Load encoding support
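
The body of M.-A. Lemburg's function is cut off in this archive. As a separate, minimal sketch of the same idea (not his code), codecs.getwriter() wraps an existing binary stream so that text written to it is encoded on the way out; on Python 3.7+ the supported way to change the encoding of sys.stdout itself is sys.stdout.reconfigure(encoding=...). The helper name wrap_with_encoding is hypothetical, and io.BytesIO stands in for a real byte stream such as sys.stdout.buffer:

```python
import codecs
import io


def wrap_with_encoding(binary_stream, encoding):
    """Return a StreamWriter that encodes str to `encoding` before writing bytes."""
    writer_class = codecs.getwriter(encoding)  # raises LookupError for unknown encodings
    return writer_class(binary_stream)


raw = io.BytesIO()
out = wrap_with_encoding(raw, "euc_jp")
out.write("日本")
print(raw.getvalue() == "日本".encode("euc_jp"))  # True
```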

Re: [Python-Dev] Generalised String Coercion

2005-08-08 Thread Nick Coghlan
Martin v. Löwis wrote: Guido van Rossum wrote: The bytes type could just be a very thin wrapper around array('b'). That answers an important question: so you want the bytes type to be mutable (and, consequently, unsuitable as a dictionary key). I would suggest a bytes/frozenbytes pair,
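
Python 3 ended up with exactly such a pair, though with the mutable member named bytearray and the immutable one named bytes. A short sketch of the trade-off Martin describes: only the immutable type is hashable, so only it can serve as a dictionary key (is_hashable is a hypothetical helper):

```python
def is_hashable(obj):
    """True if obj can be used as a dict key or set member."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


buf = bytearray(b"abc")  # mutable, like the proposed bytes type
buf[0] = ord("x")        # in-place update is allowed
frozen = bytes(buf)      # immutable snapshot, like the proposed frozenbytes

print(buf, frozen)                            # bytearray(b'xbc') b'xbc'
print(is_hashable(buf), is_hashable(frozen))  # False True

cache = {frozen: "ok"}  # fine; {buf: "ok"} would raise TypeError
```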

Re: [Python-Dev] Generalised String Coercion

2005-08-08 Thread M.-A. Lemburg
Guido van Rossum wrote: [Guido] My first response to the PEP, however, is that instead of a new built-in function, I'd rather relax the requirement that str() return an 8-bit string -- after all, int() is allowed to return a long, so why couldn't str() be allowed to return a Unicode string?

Re: [Python-Dev] Generalised String Coercion

2005-08-08 Thread M.-A. Lemburg
Michael Hudson wrote: M.-A. Lemburg [EMAIL PROTECTED] writes: Set the external encoding for stdin, stdout, stderr: (also an example for adding encoding support to an existing file object): def set_sys_std_encoding(encoding): # Load

Re: [Python-Dev] Generalised String Coercion

2005-08-08 Thread Phillip J. Eby
At 10:07 AM 8/8/2005 +0200, Martin v. Löwis wrote: Phillip J. Eby wrote: Hm. What would be the use case for using %s with binary, non-text data? Well, I could see using it to write things like netstrings, i.e. sock.send("%d:%s," % (len(data),data)) seems like the One Obvious Way to

Re: [Python-Dev] Generalised String Coercion

2005-08-08 Thread Aahz
On Sun, Aug 07, 2005, Neil Schemenauer wrote: On Sat, Aug 06, 2005 at 06:56:39PM -0700, Guido van Rossum wrote: My first response to the PEP, however, is that instead of a new built-in function, I'd rather relax the requirement that str() return an 8-bit string Do you have any thoughts on

Re: [Python-Dev] Generalised String Coercion

2005-08-08 Thread Guido van Rossum
Ouch. Too much discussion to respond to it all. Please remember that in Jython and IronPython, str and unicode are already synonyms. That's how Python 3.0 will do it, except unicode will disappear as being redundant. I like the bytes/frozenbytes pair idea. Streams could grow a getpos()/setpos()

Re: [Python-Dev] Generalised String Coercion

2005-08-08 Thread Phillip J. Eby
At 09:14 AM 8/8/2005 -0700, Guido van Rossum wrote: I'm not going to change my mind on text() unless someone explains what's so attractive about it. 1. It's obvious to non-programmers what it's for (str and unicode aren't) 2. It's more obvious to programmers that it's a *text* string rather than

Re: [Python-Dev] Generalised String Coercion

2005-08-08 Thread Martin v. Löwis
Phillip J. Eby wrote: Actually, thinking about it some more, it seems to me it's actually more like this: sock.send( ("%d:%s," % (len(data),data.decode('latin1'))).encode('latin1') ) While this would work, it would still feel wrong: the binary data are *not* latin1 (most likely), so
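
Phillip's round trip works because Latin-1 maps each of the 256 byte values to the code point with the same number, so decoding and re-encoding is lossless for arbitrary binary data; Martin's objection is about what the code declares, not about correctness. A sketch comparing it with direct bytes formatting (Python 3.5+); both function names are hypothetical:

```python
def netstring_via_latin1(data: bytes) -> bytes:
    # Phillip's approach: treat the bytes as Latin-1 text, format, then encode back.
    text = "%d:%s," % (len(data), data.decode("latin-1"))
    return text.encode("latin-1")


def netstring_bytes(data: bytes) -> bytes:
    # Formatting bytes directly; no text encoding is claimed for the payload.
    return b"%d:%s," % (len(data), data)


every_byte = bytes(range(256))
# Latin-1 maps byte N to code point N, so the round trip is lossless.
assert every_byte.decode("latin-1").encode("latin-1") == every_byte
assert netstring_via_latin1(every_byte) == netstring_bytes(every_byte)
```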

Re: [Python-Dev] Generalised String Coercion

2005-08-08 Thread Neil Schemenauer
On Sat, Aug 06, 2005 at 06:56:39PM -0700, Guido van Rossum wrote: My first response to the PEP, however, is that instead of a new built-in function, I'd rather relax the requirement that str() return an 8-bit string -- after all, int() is allowed to return a long, so why couldn't str() be

Re: [Python-Dev] Generalised String Coercion

2005-08-08 Thread François Pinard
[Phillip J. Eby] At 09:14 AM 8/8/2005 -0700, Guido van Rossum wrote: I'm not going to change my mind on text() unless someone explains what's so attractive about it. 2. It's more obvious to programmers that it's a *text* string rather than a string of bytes I've no opinion on the

Re: [Python-Dev] Generalised String Coercion

2005-08-08 Thread Stephen J. Turnbull
Martin v Löwis [EMAIL PROTECTED] writes: While this would work, it would still feel wrong: the binary data are *not* latin1 (most likely), so declaring them to be latin1 would be confusing. Perhaps a synonym '8bit' for latin1 could be
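
The '8bit' name was only a suggestion in this thread and is not a codec that ships with Python. As a minimal sketch of how such a synonym could be provided, a search function registered with codecs.register() can answer lookups for a hypothetical '8bit' name with the existing Latin-1 codec:

```python
import codecs


def _eight_bit_search(name):
    # Called with the normalized encoding name; return a CodecInfo or None.
    if name == "8bit":  # hypothetical alias, not a standard codec name
        return codecs.lookup("latin-1")
    return None


codecs.register(_eight_bit_search)

print("é".encode("8bit"))                # b'\xe9'
print(b"\xff".decode("8bit") == "\xff")  # True
```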

Re: [Python-Dev] Generalised String Coercion

2005-08-07 Thread Reinhold Birkenfeld
Guido van Rossum wrote: The main problem for a smooth Unicode transition remains I/O, in my opinion; I'd like to see a PEP describing a way to attach an encoding to text files, and a way to decide on a default encoding for stdin, stdout, stderr. FWIW, I've already drafted a patch for the

Re: [Python-Dev] Generalised String Coercion

2005-08-07 Thread M.-A. Lemburg
Guido van Rossum wrote: My first response to the PEP, however, is that instead of a new built-in function, I'd rather relax the requirement that str() return an 8-bit string -- after all, int() is allowed to return a long, so why couldn't str() be allowed to return a Unicode string? The

Re: [Python-Dev] Generalised String Coercion

2005-08-07 Thread Guido van Rossum
[me] a way to decide on a default encoding for stdin, stdout, stderr. [Martin] If stdin, stdout and stderr go to a terminal, there already is a default encoding (actually, there always is a default encoding on these, as it falls back to the system encoding if it's not a terminal, or if the

Re: [Python-Dev] Generalised String Coercion

2005-08-07 Thread Guido van Rossum
[Guido] My first response to the PEP, however, is that instead of a new built-in function, I'd rather relax the requirement that str() return an 8-bit string -- after all, int() is allowed to return a long, so why couldn't str() be allowed to return a Unicode string? [MAL] The problem

Re: [Python-Dev] Generalised String Coercion

2005-08-07 Thread Neil Schemenauer
On Sat, Aug 06, 2005 at 06:56:39PM -0700, Guido van Rossum wrote: My first response to the PEP, however, is that instead of a new built-in function, I'd rather relax the requirement that str() return an 8-bit string Do you have any thoughts on what the C API would be? It seems to me that

Re: [Python-Dev] Generalised String Coercion

2005-08-07 Thread Phillip J. Eby
At 05:24 PM 8/7/2005 -0700, Guido van Rossum wrote: Hm. What would be the use case for using %s with binary, non-text data? Well, I could see using it to write things like netstrings, i.e. sock.send("%d:%s," % (len(data),data)) seems like the One Obvious Way to write a netstring in today's

Re: [Python-Dev] Generalised String Coercion

2005-08-07 Thread Martin v. Löwis
Guido van Rossum wrote: If stdin, stdout and stderr go to a terminal, there already is a default encoding (actually, there always is a default encoding on these, as it falls back to the system encoding if it's not a terminal, or if the terminal's encoding is not supported or cannot be determined).
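
Which encoding a given interpreter actually picked can be inspected at run time. A small sketch, assuming Python 3: each standard stream reports its encoding, and locale.getpreferredencoding(False) is used here only as a rough stand-in for the "system encoding" fallback; CPython's real selection logic (which also honours PYTHONIOENCODING and UTF-8 mode) is more involved. The helper name std_stream_encodings is hypothetical:

```python
import locale
import sys


def std_stream_encodings():
    """Map each standard stream name to the encoding it reports."""
    fallback = locale.getpreferredencoding(False)  # locale-derived default
    result = {}
    for name in ("stdin", "stdout", "stderr"):
        stream = getattr(sys, name, None)  # may be None, e.g. under pythonw
        # Replacement streams such as io.StringIO report encoding None.
        result[name] = getattr(stream, "encoding", None) or fallback
    return result


for name, enc in std_stream_encodings().items():
    print(name, enc)
```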

Re: [Python-Dev] Generalised String Coercion

2005-08-07 Thread Bob Ippolito
On Aug 7, 2005, at 7:37 PM, Martin v. Löwis wrote: Guido van Rossum wrote: If stdin, stdout and stderr go to a terminal, there already is a default encoding (actually, there always is a default encoding on these, as it falls back to the system encoding if it's not a terminal, or if the

Re: [Python-Dev] Generalised String Coercion

2005-08-07 Thread Martin v. Löwis
Guido van Rossum wrote: I'm not sure if it works for all encodings, but if possible I'd like to extend the seeking semantics on text files: seek positions are byte counts, and the application should consider them as magic cookies. If the seek position is merely a number, it won't work for all
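
Python 3's text files follow the magic-cookie design described here: TextIOWrapper.tell() returns an opaque number that is only meaningful when passed back to seek(), not a character count. A minimal sketch, using io.BytesIO as the underlying byte stream (reread_from_cookie is a hypothetical helper):

```python
import io


def reread_from_cookie(text, encoding, nchars):
    """Read nchars characters, save the position, then read the rest twice via that position."""
    f = io.TextIOWrapper(io.BytesIO(), encoding=encoding)
    f.write(text)
    f.seek(0)
    head = f.read(nchars)
    cookie = f.tell()  # opaque: not a character count, not necessarily a byte count
    rest = f.read()
    f.seek(cookie)     # only values from tell() (or 0) are valid here
    again = f.read()
    return head, rest, again


head, rest, again = reread_from_cookie("héllo wörld", "utf-8", 3)
print(rest == again)  # True
```

Here "hél" is 3 characters but 4 bytes in UTF-8, which is why a plain character count could not serve as the seek position.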

Re: [Python-Dev] Generalised String Coercion

2005-08-06 Thread Terry Reedy
PEP: 349 Title: Generalised String Coercion ... Rationale Python has had a Unicode string type for some time now but use of it is not yet widespread. There is a large amount of Python code that assumes that string data is represented as str instances. The long term plan for

Re: [Python-Dev] Generalised String Coercion

2005-08-06 Thread Guido van Rossum
[Removed python-list CC] On 8/6/05, Terry Reedy [EMAIL PROTECTED] wrote: PEP: 349 Title: Generalised String Coercion ... Rationale Python has had a Unicode string type for some time now but use of it is not yet widespread. There is a large amount of Python code that assumes