On Tue, 18-09-2007 at 13:08 +0900, Stephen J. Turnbull
wrote:

>  > This is wrong: UTF-8 is specified for PUA. PUA is not special from the
>  > point of view of UTF-8.
> 
> It is from the point of view of the Unicode standard, specifically v5.
> Please see section 16.5, especially about the "corporate use subarea".

It is not. Section 16.5 says nothing about UTF-8, and UTF-8 is already
specified for the PUA.

>  > UTF-8 is defined for all Unicode scalar values,
> 
> Sure, and what I propose is entirely compatible with the specification
> of UTF-8 as a UTF,

It is not. In UTF-8, '\ue650' is b'\xEE\x99\x90'; in your proposal it
might be encoded as a single byte.
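
To make the three-byte claim concrete, here is a small Python 3 sketch. It
encodes U+E650 with the standard codec and also builds the same bytes by
hand from the three-byte UTF-8 bit layout. The variable names are only
illustrative.

```python
# Encode the private-use code point U+E650 with the standard UTF-8 codec.
ch = "\ue650"
encoded = ch.encode("utf-8")

# Build the same bytes by hand from the three-byte UTF-8 layout:
# 1110xxxx 10xxxxxx 10xxxxxx
cp = ord(ch)
manual = bytes([
    0xE0 | (cp >> 12),          # leading byte carries the top 4 bits
    0x80 | ((cp >> 6) & 0x3F),  # continuation byte, middle 6 bits
    0x80 | (cp & 0x3F),         # continuation byte, low 6 bits
])

print(encoded)            # b'\xee\x99\x90'
print(manual == encoded)  # True
```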

>  > "C10. When a process interprets a code unit sequence which purports to
>  > be in a Unicode character encoding form, it shall treat ill-formed code
>  > unit sequences as an error condition and shall not interpret such
>  > sequences as characters."
> 
> Yeah, that's the one.
> 
> While I'm uncomfortable advocating the position that my proposal is
> entirely compatible with C10,

It is not. Elements of the PUA are characters.

> it is arguable that "mapping code units to
> characters in private space" is not the same as "interpreting them as
> characters".

It's not the same, but interpreting them as characters in the PUA is
obviously interpreting them as characters.
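
A short Python 3 sketch of both halves of this point: the Unicode
character database assigns U+E650 the general category "Co" (private use),
and a strict UTF-8 decoder treats an ill-formed sequence as an error rather
than as a character. The helper name strict_decode_fails is made up for
this example.

```python
import unicodedata

ch = "\ue650"

# Private-use code points carry general category "Co" in the
# Unicode character database.
category = unicodedata.category(ch)

# The BMP private use area spans U+E000..U+F8FF.
in_bmp_pua = 0xE000 <= ord(ch) <= 0xF8FF


def strict_decode_fails(data):
    """Return True if strict UTF-8 decoding rejects the byte string."""
    try:
        data.decode("utf-8")  # errors="strict" is the default
    except UnicodeDecodeError:
        return True
    return False


print(category, in_bmp_pua)                  # Co True
print(strict_decode_fails(b"\xff"))          # True: ill-formed input
print(strict_decode_fails(b"\xee\x99\x90"))  # False: well-formed U+E650
```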

> chibi:MacPorts steve$ python -c 'import sys; print("%x" % ord(sys.argv[1]))' $(printf "\ue650")
> Traceback (most recent call last):
>   File "<string>", line 1, in ?
> TypeError: ord() expected a character, but string of length 6 found

I meant Python 3, where sys.argv is a list of Unicode strings. It should
work out of the box.

Why length 6? "\ue650" encoded in UTF-8 has length 3.
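
The two lengths can be compared directly in Python 3. One possible reading
of the 6, which nothing in this thread confirms, is that the printf in that
shell did not interpret the \u escape and passed the six literal characters
through; the sketch below only shows that the numbers fit that assumption.

```python
# The single character U+E650 and its UTF-8 encoding.
text = "\ue650"
utf8_len = len(text.encode("utf-8"))

# Assumption: the escape reached Python uninterpreted, i.e. as a
# backslash followed by the five characters "ue650".
literal = r"\ue650"
literal_len = len(literal)

print(len(text), utf8_len, literal_len)  # 1 3 6
print("%x" % ord(text))                  # e650
```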

For an old discussion about using PUA to represent bytes undecodable
as UTF-8, see http://www.mail-archive.com/[EMAIL PROTECTED]/ and
subthreads with "roundtripping" in the subject.
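
As a rough illustration of the kind of scheme under discussion, here is a
hypothetical Python 3 sketch that maps undecodable bytes into the PUA. It is
not necessarily the scheme proposed in this thread: the base U+E600, the
error handler name "pua_escape" and both function names are assumptions
made up for the example. It round-trips invalid bytes, but a genuine PUA
character that was valid UTF-8 on input comes back out as a single byte.

```python
import codecs

PUA_BASE = 0xE600  # hypothetical base; byte 0xNN maps to U+E6NN


def _to_pua(exc):
    """Decode error handler: map each undecodable byte to a PUA character."""
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    bad = exc.object[exc.start:exc.end]
    return "".join(chr(PUA_BASE + b) for b in bad), exc.end


codecs.register_error("pua_escape", _to_pua)


def decode_lossless(data):
    """Decode UTF-8, escaping undecodable bytes into U+E680..U+E6FF."""
    return data.decode("utf-8", errors="pua_escape")


def encode_lossless(text):
    """Encode to UTF-8, turning U+E680..U+E6FF back into single bytes."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if PUA_BASE + 0x80 <= cp <= PUA_BASE + 0xFF:
            out.append(cp - PUA_BASE)
        else:
            out += ch.encode("utf-8")
    return bytes(out)


# Invalid input survives a round trip ...
print(encode_lossless(decode_lossless(b"a\xffb")))  # b'a\xffb'

# ... but a real U+E6FF, valid UTF-8 on input, is not preserved.
valid = "\ue6ff".encode("utf-8")                    # b'\xee\x9b\xbf'
print(encode_lossless(decode_lossless(valid)))      # b'\xff'
```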

-- 
   __("<         Marcin Kowalczyk
   \__/       [EMAIL PROTECTED]
    ^^     http://qrnik.knm.org.pl/~qrczak/

_______________________________________________
Python-3000 mailing list
Python-3000@python.org
http://mail.python.org/mailman/listinfo/python-3000