Re: [Python-3000] Unicode and OS strings

2007-09-14 Thread Martin v. Löwis
> What I'm suggesting is to provide a way for processes to record and > communicate that information without needing to provide a "source > encoding" slot for strings, and which is able to handle strings > containing unrecognized (including corrupt) characters from multiple > source encodings. Can

Re: [Python-3000] ordered dict for p3k collections?

2007-09-14 Thread Mark Summerfield
On 2007-09-14, Martin v. Löwis wrote: > >> That's a sorted dict. PEP 3115 wants an insertion-ordered dict. > >> You're not the first to confuse them. ;) > > > > Hmmm, I'd not come across that terminology distinction before. > > I guess I'll have to rename mine then. > > I think "insertion-ordered"

Re: [Python-3000] Unicode and OS strings

2007-09-14 Thread Stephen J. Turnbull
Greg Ewing writes: > Stephen J. Turnbull wrote: > > You chose the context of round-tripping *across > > encodings*, not me. Please stick with your context. > > Maybe we have different ideas of what the problem is. I thought > the problem is to take arbitrary byte sequences coming in as >

Re: [Python-3000] Unicode and OS strings

2007-09-14 Thread Stephen J. Turnbull
Hagen Fürstenau writes: > And what if we skillfully conserve unknown bytes in a private use or > surrogate area and the application author actually knows the encoding > and wants correctly decoded strings? This is what my proposal would do, but my proposal would return a string, not by

Re: [Python-3000] Unicode and OS strings

2007-09-14 Thread Stephen J. Turnbull
"Marcin 'Qrczak' Kowalczyk" <[EMAIL PROTECTED]> writes: >> And it *is* needed, because these characters by assumption >> are not present in Unicode at all. (More precisely, they may be >> present, but the tables we happen to have don't have mappings for >> them.) > They are present! For UTF

Re: [Python-3000] Unicode and OS strings

2007-09-14 Thread Guido van Rossum
On 9/14/07, Greg Ewing <[EMAIL PROTECTED]> wrote: > Guido van Rossum wrote: > > Great idea, but sys.argv doesn't need to be magic for this approach to work. > > Are you sure? I thought part of the problem was that > if an argv entry couldn't be decoded, you got an error > too soon to do anything ab

Re: [Python-3000] Unicode and OS strings

2007-09-14 Thread Greg Ewing
Guido van Rossum wrote: > Great idea, but sys.argv doesn't need to be magic for this approach to work. Are you sure? I thought part of the problem was that if an argv entry couldn't be decoded, you got an error too soon to do anything about it. Making sys.argv lazy would avoid that. -- Greg _

Re: [Python-3000] Unicode and OS strings

2007-09-14 Thread Guido van Rossum
On 9/14/07, Greg Ewing <[EMAIL PROTECTED]> wrote: > It would be pretty disruptive to ask everyone to change > their habit of thinking of sys.argv as a list of strings. Indeed. > I would suggest doing it the other way around -- have > sys.argv be an object that automatically converts to > unicode

Re: [Python-3000] Unicode and OS strings

2007-09-14 Thread Greg Ewing
Hagen Fürstenau wrote: > sys.argv could be of type bytes and sys.arguments (or whatever) could be > a function taking an encoding parameter (which defaults to UTF-8) and > returning strings. > > Of course that's backwards incompatible and I'm not sure if it's too > late for something like this

Re: [Python-3000] Unicode and OS strings

2007-09-14 Thread Greg Ewing
Stephen J. Turnbull wrote: > You chose the context of round-tripping *across > encodings*, not me. Please stick with your context. Maybe we have different ideas of what the problem is. I thought the problem is to take arbitrary byte sequences coming in as command-line args and represent them as u

Re: [Python-3000] ordered dict for p3k collections?

2007-09-14 Thread Martin v. Löwis
>> That's a sorted dict. PEP 3115 wants an insertion-ordered dict. >> You're not the first to confuse them. ;) > > Hmmm, I'd not come across that terminology distinction before. > I guess I'll have to rename mine then. I think "insertion-ordered" is over-specification, just to make the distincti
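The distinction drawn above, between a sorted dict and an insertion-ordered dict, can be sketched as follows. This is only an illustration: it assumes a current Python 3 (3.7 or later), where the built-in dict preserves insertion order, which was not the case when this thread was written. The variable names are made up for the example.

```python
# Insertion-ordered: iteration follows the order in which keys were added.
inserted = {}
for key in ["pear", "apple", "fig"]:
    inserted[key] = len(key)

insertion_order = list(inserted)   # keys in the order they were inserted

# Sorted: iteration follows key order, regardless of insertion history.
sorted_order = sorted(inserted)    # keys in ascending key order

print(insertion_order)
print(sorted_order)
```

A sorted dict keeps the second ordering up to date as keys are added and removed; an insertion-ordered dict only remembers the first.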

Re: [Python-3000] ordered dict for p3k collections?

2007-09-14 Thread Larry Hastings
Mark Summerfield wrote: > (Personally I've never needed an insertion-ordered dict.) Then you've never programmed in PHP I take it. PHP's one-size-fits-all data structure is an insertion-ordered dict; PHP programmers use it everywhere a Python programmer might use a dict /or/ a list. I've had

Re: [Python-3000] ordered dict for p3k collections?

2007-09-14 Thread Mark Summerfield
On 2007-09-14, Adam Olsen wrote: > On 9/14/07, Mark Summerfield <[EMAIL PROTECTED]> wrote: > > On 2007-09-14, Nicko van Someren wrote: > > > On 11 Sep 2007, at 15:06, Mark Summerfield wrote: > > > > Is there any chance that an ordered dict will be added to Python 3's > > > > library? > > > > > > It

Re: [Python-3000] ordered dict for p3k collections?

2007-09-14 Thread Adam Olsen
On 9/14/07, Mark Summerfield <[EMAIL PROTECTED]> wrote: > On 2007-09-14, Nicko van Someren wrote: > > On 11 Sep 2007, at 15:06, Mark Summerfield wrote: > > > Is there any chance that an ordered dict will be added to Python 3's > > > library? > > > > It would make sense, since one of the primary jus

Re: [Python-3000] ordered dict for p3k collections?

2007-09-14 Thread Mark Summerfield
On 2007-09-14, Nicko van Someren wrote: > On 11 Sep 2007, at 15:06, Mark Summerfield wrote: > > Is there any chance that an ordered dict will be added to Python 3's > > library? > > It would make sense, since one of the primary justifications for the > new metaclass system (PEP 3115) is to allow th

Re: [Python-3000] ordered dict for p3k collections?

2007-09-14 Thread Nicko van Someren
On 11 Sep 2007, at 15:06, Mark Summerfield wrote: > Is there any chance that an ordered dict will be added to Python 3's > library? It would make sense, since one of the primary justifications for the new metaclass system (PEP 3115) is to allow the metaclass to provide order-preserving diction

Re: [Python-3000] Unicode and OS strings

2007-09-14 Thread Jim Jewett
On 9/14/07, Hagen Fürstenau <[EMAIL PROTECTED]> wrote: > Is it too unreasonable to keep the byte strings we get from the OS as > byte strings in Python (since we're not sure about their encoding) and > offer functions for getting strings? > sys.argv could be of type bytes and sys.arguments (or wha

Re: [Python-3000] Unicode and OS strings

2007-09-14 Thread Hagen Fürstenau
> They can easily roundtrip that then to the encoding that it should have: > > good_string = sys.argv[bad_string_index].\ >encode(sys.argv_encoding, "pua-replace").decode(real_encoding) To me this doesn't look easier than sys.arguments() in the standard case and sys.arguments(encoding="whate

Re: [Python-3000] Unicode and OS strings

2007-09-14 Thread Martin v. Löwis
> Are you sure that "strings in an unknown encoding" are conceptually > strings and not rather bytes? For file names, most definitely. For command line arguments, I am fairly sure: the argc/argv calling convention does not allow for arbitrary bytes. > And what if we skillfully conserve unknown by

Re: [Python-3000] Unicode and OS strings

2007-09-14 Thread Hagen Fürstenau
> That is not a concern. However, it is fundamentally the wrong thing to > do. Most people rightfully view command line arguments and file names > as strings, as they use the keyboard to enter them, and the computer > uses letters from a font to display them. They are not bytes > conceptually - the

Re: [Python-3000] Unicode and OS strings

2007-09-14 Thread Martin v. Löwis
> Is it too unreasonable to keep the byte strings we get from the OS as > byte strings in Python (since we're not sure about their encoding) and > offer functions for getting strings? I think people will complain if command line arguments aren't strings, and they will complain even more so if fi

Re: [Python-3000] Unicode and OS strings

2007-09-14 Thread Barry Warsaw
On Sep 14, 2007, at 5:15 AM, Hagen Fürstenau wrote: > Is it too unreasonable to keep the byte strings we get from the OS as > byte strings in Python (since we're not sure about their encoding) and > offer functions for getting strings? > > sys.argv co

Re: [Python-3000] Unicode and OS strings

2007-09-14 Thread Barry Warsaw
On Sep 14, 2007, at 1:08 AM, Greg Ewing wrote: > Stephen J. Turnbull wrote: >> You can't win that, because Unicode is the only encoding that >> attempts >> to guarantee even the possibility of round-tripping. > > Rubbish -- I can do print [ord(c) fo

Re: [Python-3000] __format__ and datetime

2007-09-14 Thread Nick Coghlan
Greg Ewing wrote: > [EMAIL PROTECTED] wrote: >> I was just thinking about the folks at places like FermiLab and CERN. ;-) > > Those guys probably need picoseconds... With the suggested %f format character and the mention of Fermilab and CERN, I started thinking about femtoseconds :) Cheers, Nic

Re: [Python-3000] Unicode and OS strings

2007-09-14 Thread Hagen Fürstenau
Is it too unreasonable to keep the byte strings we get from the OS as byte strings in Python (since we're not sure about their encoding) and offer functions for getting strings? sys.argv could be of type bytes and sys.arguments (or whatever) could be a function taking an encoding parameter (whi
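A rough sketch of the idea in this message: keep the raw arguments as bytes and decode them on request. The function name arguments, its raw_argv parameter, and the sample data are hypothetical and stand in for the proposed sys.arguments; nothing here is an actual Python API.

```python
def arguments(raw_argv, encoding="utf-8"):
    """Hypothetical helper: decode a list of bytes arguments to str.

    Raises UnicodeDecodeError if an argument is not valid in `encoding`.
    """
    return [arg.decode(encoding) for arg in raw_argv]


# Stand-in for a bytes-typed argv; the second entry is Latin-1, not UTF-8.
raw_argv = [b"prog", "caf\u00e9".encode("latin-1")]

# A caller who knows the real encoding can ask for it explicitly.
latin1_args = arguments(raw_argv, encoding="latin-1")

# With the UTF-8 default, the same bytes fail to decode.
try:
    arguments(raw_argv)
    default_failed = False
except UnicodeDecodeError:
    default_failed = True
```

The point of the design is that the failure happens only when the caller asks for strings, so the program still has the original bytes available to retry with another encoding.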

Re: [Python-3000] Unicode and OS strings

2007-09-14 Thread Stephen J. Turnbull
Greg Ewing writes: > Stephen J. Turnbull wrote: > > You can't win that, because Unicode is the only encoding that attempts > > to guarantee even the possibility of round-tripping. > > Rubbish -- I can do print [ord(c) for c in my_unicode_string] > and get perfect round-trippability if I wan

Re: [Python-3000] Unicode and OS strings

2007-09-14 Thread Marcin 'Qrczak' Kowalczyk
On 13-09-2007, Thu at 23:41 -0400, James Y Knight wrote: > Here's a suggestion I made on the SBCL dev list a while back, in > response to the same issues. After a second thought, this (escaping undecodable UTF-8 bytes by unpaired low surrogates) might be a good idea. (I don't rem
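Escaping undecodable bytes as unpaired low surrogates can be illustrated with the "surrogateescape" error handler that later Python 3 releases provide. This is a sketch of the general technique, not necessarily the exact scheme proposed in this thread; the sample file name is made up.

```python
# A byte string that is not valid UTF-8: 0xFF can never appear in UTF-8.
raw = b"report-\xff.txt"

# Each undecodable byte 0xNN becomes the lone low surrogate U+DCNN.
text = raw.decode("utf-8", errors="surrogateescape")
escaped = text[7]                 # the character standing in for byte 0xFF

# Encoding with the same handler turns the surrogate back into the byte.
restored = text.encode("utf-8", errors="surrogateescape")

print(hex(ord(escaped)))
print(restored == raw)
```

Because lone surrogates cannot result from decoding well-formed UTF-8, the escaped characters do not collide with legitimately decoded text, which is what makes the round trip lossless.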

Re: [Python-3000] Unicode and OS strings

2007-09-14 Thread Marcin 'Qrczak' Kowalczyk
On 14-09-2007, Fri at 15:02 +0900, Stephen J. Turnbull wrote: > > PUA already has a representation in UTF-8, so this is more incompatible > > with UTF-8 than needed, > > Hm? It's not incompatible at all, and we're not interested in a > representation in UTF-8, but rather in UTF-1