Re: [Python-3000] Unicode and OS strings

2007-09-13 Thread Stephen J. Turnbull
"Marcin 'Qrczak' Kowalczyk" <[EMAIL PROTECTED]> writes: >> This means that a way of handling such points is very useful, and >> as long as there's enough PUA space, the approach I suggested can >> handle all of these various issues. > PUA already has a representation in UTF-8, so this is more

Re: [Python-3000] Unicode and OS strings

2007-09-13 Thread Greg Ewing
Stephen J. Turnbull wrote: > You can't win that, because Unicode is the only encoding that attempts > to guarantee even the possibility of round-tripping. Rubbish -- I can do print [ord(c) for c in my_unicode_string] and get perfect round-trippability if I want. You can ask people to use pre-exis
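
As an illustration of the round-trip Greg describes, here is a minimal sketch in Python 3 syntax (the one-liner in the message uses the Python 2 print statement). The helper names `to_code_points` and `from_code_points` and the sample string are assumptions made for this example and do not come from the thread.

```python
def to_code_points(text):
    """Return the integer code point of every character in text."""
    return [ord(c) for c in text]


def from_code_points(points):
    """Rebuild a string from a list of integer code points."""
    return "".join(chr(p) for p in points)


# Sample containing an accented letter and one private-use character (U+E000).
sample = "caf\u00e9 \ue000"
points = to_code_points(sample)
print(points)  # [99, 97, 102, 233, 32, 57344]

# The list of integers carries everything needed to rebuild the string.
assert from_code_points(points) == sample
```

The round trip holds for any string, because every character, private-use or not, has exactly one code point.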

Re: [Python-3000] Unicode and OS strings

2007-09-13 Thread Stephen J. Turnbull
Greg Ewing writes: > Stephen J. Turnbull wrote: > > What should happen internally is that all undecodable characters > > (which PUA characters are by definition for standard codecs) are > > mapped to unused codepoints in the PUA, chosen by Python. > > You mean chosen dynamically? Yes. >

Re: [Python-3000] patch: bytes object PyBUF_LOCKDATA read-only and immutable support

2007-09-13 Thread Greg Ewing
Travis E. Oliphant wrote: > I would want to encourage people not to use the LOCK_FOR_READ unless > there is an important benefit or need to use it. If you mean that LOCK_FOR_READ would unilaterally deny anyone else read access, my proposal avoids this by not having such a mode at all. So you can

Re: [Python-3000] Unicode and OS strings

2007-09-13 Thread Greg Ewing
Stephen J. Turnbull wrote: > What should > happen internally is that all undecodable characters (which PUA > characters are by definition for standard codecs) are mapped to unused > codepoints in the PUA, chosen by Python. You mean chosen dynamically? What happens if these PUA characters get encod

Re: [Python-3000] Unicode and OS strings

2007-09-13 Thread James Y Knight
On Sep 13, 2007, at 12:22 PM, Marcin 'Qrczak' Kowalczyk wrote: > What should happen when a command line argument or an environment > variable is not decodable using the system encoding (on Unix where > from the OS point of view it is an array of bytes)? Here's a suggestion I made on the SBCL dev l

Re: [Python-3000] __format__ and datetime

2007-09-13 Thread Greg Ewing
[EMAIL PROTECTED] wrote: > I was just thinking about the folks at places like FermiLab and CERN. ;-) Those guys probably need picoseconds... -- Greg

Re: [Python-3000] patch: bytes object PyBUF_LOCKDATA read-only and immutable support

2007-09-13 Thread Greg Ewing
Gregory P. Smith wrote: > When I read the plain term EXCLUSIVE I read that to mean nobody else can > read -or- write, i.e. not shared in any sense. You're right, it's not the best term. > Let's extend these base > concepts to SHARED_READ, SHARED_WRITE, EXCLUSIVE_READ, EXCLUSIVE_WRITE EXCLUDE_WRI

Re: [Python-3000] Unicode and OS strings

2007-09-13 Thread Marcin 'Qrczak' Kowalczyk
Dnia 14-09-2007, Pt o godzinie 06:12 +0900, Stephen J. Turnbull napisał(a): > This means that a way of handling such points > is very useful, and as long as there's enough PUA space, the approach > I suggested can handle all of these various issues. PUA already has a representation in UTF-8, so t

Re: [Python-3000] Unicode and OS strings

2007-09-13 Thread Stephen J. Turnbull
"Marcin 'Qrczak' Kowalczyk" <[EMAIL PROTECTED]> writes: >> Of course, if the input data already contains PUA characters, >> there would be an ambiguity. We can rule this out for most codecs, >> as they don't support PUA characters. The major exception would >> be UTF-8, > Most codecs other t

Re: [Python-3000] patch: bytes object PyBUF_LOCKDATA read-only and immutable support

2007-09-13 Thread Travis E. Oliphant
Guido van Rossum wrote: > On 9/11/07, Travis E. Oliphant <[EMAIL PROTECTED]> wrote: > >> I'm not sure I understand the difference between a classic read lock and >> the exclusive write lock concept. Does the classic read-lock just >> prevent writing to the memory area? In my mind that is a re

Re: [Python-3000] Unicode and OS strings

2007-09-13 Thread Marcin 'Qrczak' Kowalczyk
Dnia 13-09-2007, Cz o godzinie 19:08 +0200, "Martin v. Löwis" napisał(a): > Of course, if the input data already contains PUA characters, > there would be an ambiguity. We can rule this out for most codecs, > as they don't support PUA characters. The major exception would > be UTF-8, Most codecs

Re: [Python-3000] Unicode and OS strings

2007-09-13 Thread Martin v. Löwis
> > We would make a list of all interfaces that use the PUA error > > handler: file names, environment variables, command line > > arguments. > > In general, I don't consider this an error. I don't, either. However, given the current codec design, this is the least intrusive way to enhance "al

Re: [Python-3000] Unicode and OS strings

2007-09-13 Thread Stephen J. Turnbull
"Martin v. Löwis" writes: > One "universal" solution is to use Unicode private-use-area > characters. +1 > Of course, if the input data already contains PUA characters, > there would be an ambiguity. That may be true in the implementation, but it shouldn't. What should happen internally i
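
One way to picture the dynamic mapping Stephen describes is sketched below. This is only an interpretation of the truncated message, not his design: the class `PuaAllocator`, its method names, and the choice of the BMP private-use range U+E000..U+F8FF are all assumptions made for illustration. Undecodable bytes, and private-use characters already present in the input, each receive the next unused private-use code point, and the table is kept so the original bytes can be restored.

```python
PUA_FIRST = 0xE000  # start of the BMP private-use area
PUA_LAST = 0xF8FF   # end of the BMP private-use area (6400 code points)


class PuaAllocator:
    """Hypothetical helper: hands out unused PUA code points on demand."""

    def __init__(self):
        self._next = PUA_FIRST
        self._char_for_token = {}  # original bytes -> substitute character
        self._token_for_char = {}  # substitute character -> original bytes

    def char_for(self, token):
        """Return the substitute for token, allocating one if needed."""
        if token not in self._char_for_token:
            if self._next > PUA_LAST:
                raise RuntimeError("private-use area exhausted")
            ch = chr(self._next)
            self._next += 1
            self._char_for_token[token] = ch
            self._token_for_char[ch] = token
        return self._char_for_token[token]

    def decode(self, raw, encoding="utf-8"):
        pieces = []
        pos = 0
        while pos < len(raw):
            try:
                text = raw[pos:].decode(encoding)
                bad = b""
                step = len(raw) - pos
            except UnicodeDecodeError as exc:
                # Decode the valid prefix, then set aside the bad bytes.
                text = raw[pos:pos + exc.start].decode(encoding)
                bad = raw[pos + exc.start:pos + exc.end]
                step = exc.end
            for ch in text:
                if PUA_FIRST <= ord(ch) <= PUA_LAST:
                    # PUA characters in the input are remapped as well.
                    pieces.append(self.char_for(ch.encode(encoding)))
                else:
                    pieces.append(ch)
            for b in bad:
                pieces.append(self.char_for(bytes([b])))
            pos += step
        return "".join(pieces)

    def encode(self, text, encoding="utf-8"):
        out = bytearray()
        for ch in text:
            if ch in self._token_for_char:
                out.extend(self._token_for_char[ch])
            else:
                out.extend(ch.encode(encoding))
        return bytes(out)


alloc = PuaAllocator()
raw = b"ok\xff" + "\ue000".encode("utf-8") + b"\xff"
text = alloc.decode(raw)
assert alloc.encode(text) == raw
```

In this sketch the table lives inside one allocator object, so a decoded string only has meaning together with the allocator that produced it.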

Re: [Python-3000] Unicode and OS strings

2007-09-13 Thread Martin v. Löwis
> Yes, I have noticed this too. Environment variables, command line > arguments, locale properties, TZ names, and so on, are often given as > 8-bit strings in who knows what encoding. I'm not sure what the > solution is, but we need one. One "universal" solution is to use Unicode private-use-area
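
The private-use-area idea can be sketched with a custom codec error handler. The fixed offset `PUA_BASE`, the handler name `"pua-escape-sketch"`, and the two helper functions are assumptions chosen for this example. They are not taken from the message, which is truncated here, and a fixed offset is only one of several possible mappings.

```python
import codecs

PUA_BASE = 0xE000  # assumed offset: byte 0xNN becomes U+E0NN


def pua_escape(exc):
    """Error handler: replace each undecodable byte with a PUA character."""
    if isinstance(exc, UnicodeDecodeError):
        bad = exc.object[exc.start:exc.end]
        return "".join(chr(PUA_BASE + b) for b in bad), exc.end
    raise exc


codecs.register_error("pua-escape-sketch", pua_escape)


def decode_os_bytes(raw, encoding="utf-8"):
    """Decode OS-supplied bytes, escaping anything undecodable."""
    return raw.decode(encoding, errors="pua-escape-sketch")


def encode_os_string(text, encoding="utf-8"):
    """Reverse decode_os_bytes: escaped characters become raw bytes again."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if PUA_BASE <= cp <= PUA_BASE + 0xFF:
            out.append(cp - PUA_BASE)
        else:
            out.extend(ch.encode(encoding))
    return bytes(out)


# Undecodable bytes survive the round trip.
raw = b"abc\xff\xfe"
assert encode_os_string(decode_os_bytes(raw)) == raw

# A genuine U+E0FF in the input does not: it comes back as the single
# byte 0xFF rather than its three-byte UTF-8 form.
genuine = "\ue0ff".encode("utf-8")
assert encode_os_string(decode_os_bytes(genuine)) == b"\xff"
```

The last assertion shows the ambiguity raised elsewhere in the thread: with a fixed mapping, private-use characters already present in valid input cannot be told apart from escaped bytes.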

Re: [Python-3000] Unicode and OS strings

2007-09-13 Thread Guido van Rossum
Yes, I have noticed this too. Environment variables, command line arguments, locale properties, TZ names, and so on, are often given as 8-bit strings in who knows what encoding. I'm not sure what the solution is, but we need one. I'm guessing one thing we need to do is research how various systems

[Python-3000] Unicode and OS strings

2007-09-13 Thread Marcin 'Qrczak' Kowalczyk
What should happen when a command line argument or an environment variable is not decodable using the system encoding (on Unix where from the OS point of view it is an array of bytes)? This is an unfortunate side effect of switching to Unicode. It's unfortunate because often the data is only passe
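
The situation in question can be reproduced with a short sketch. The helper `try_decode` and the sample value are hypothetical: the example assumes a file name written in ISO-8859-2 that reaches a program whose system encoding is UTF-8.

```python
def try_decode(raw, encoding="utf-8"):
    """Return the decoded string, or None if raw is not valid in encoding."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return None


# Bytes as the OS would hand them over: valid ISO-8859-2, invalid UTF-8.
argument = "za\u017c\u00f3\u0142\u0107".encode("iso-8859-2")

assert try_decode(argument) is None                # strict decoding fails
assert try_decode(argument, "iso-8859-2") is not None

# Kept as bytes, the value can still be handed back to the OS unchanged.
passed_through = bytes(argument)
assert passed_through == argument
```

With strict decoding the program cannot even represent the argument as a string, although the bytes themselves are perfectly usable.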