Re: [Python-3000] Unicode and OS strings

Marcin 'Qrczak' Kowalczyk Mon, 17 Sep 2007 12:13:43 -0700

On Sat, 15-09-2007 at 09:13 +0900, Stephen J. Turnbull wrote:


>  > Well, for any scheme which attempts to modify UTF-8 by accepting
>  > arbitrary byte strings is used, *something* must be interpreted
>  > differently than in real UTF-8.
> 
> Wrong.  In my scheme everything ends up in the PUA, on which real
> UTF-8 imposes no interpretation by definition.

This is wrong: UTF-8 is specified for the PUA. The PUA is not special
from the point of view of UTF-8. UTF-8 is defined for all Unicode scalar
values, i.e. all code points in the ranges U+0000..U+D7FF and
U+E000..U+10FFFF, i.e. all code points excluding surrogates. This
includes the PUA.
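
To make the ranges concrete, here is a small sketch in present-day
Python 3. The helper names is_scalar_value and utf8_or_none are made up
for this illustration, and it relies on the behaviour of CPython's
strict "utf-8" codec, not on anything in the proposal under discussion.

```python
# Sketch: UTF-8 is defined for every Unicode scalar value, PUA included.
# Helper names are hypothetical, chosen only for this illustration.

def is_scalar_value(cp):
    """True for U+0000..U+D7FF and U+E000..U+10FFFF (surrogates excluded)."""
    return 0 <= cp <= 0xD7FF or 0xE000 <= cp <= 0x10FFFF

def utf8_or_none(cp):
    """Encode one code point as UTF-8; return None if the codec refuses."""
    try:
        return chr(cp).encode("utf-8")
    except UnicodeEncodeError:
        return None

pua = utf8_or_none(0xE000)        # first private-use code point in the BMP
surrogate = utf8_or_none(0xD800)  # first surrogate code point

print(is_scalar_value(0xE000), pua)        # PUA: an ordinary 3-byte sequence
print(is_scalar_value(0xD800), surrogate)  # surrogate: no UTF-8 form
```

The private-use code point gets a perfectly ordinary three-byte encoding,
so a scheme that stores data there is still reusing sequences that UTF-8
already assigns a meaning to.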

> I haven't gone back to check yet, but it's possible that a "real UTF-8
> conforming process" is required to stop processing and issue an error
> or something like that in the cases we're trying to handle.

"C10. When a process interprets a code unit sequence which purports to
be in a Unicode character encoding form, it shall treat ill-formed code
unit sequences as an error condition and shall not interpret such
sequences as characters."
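
As an illustration of what C10 asks for, the strict UTF-8 decoder in
present-day Python 3 can serve as an example of a process that treats
ill-formed input as an error. This is only a sketch: decode_strict is a
hypothetical wrapper, and the sample byte strings are my own choices.

```python
# Sketch: a strict decoder reports ill-formed UTF-8 instead of
# interpreting it as characters. decode_strict is a hypothetical helper.

def decode_strict(data):
    """Return the decoded text, or None if data is not well-formed UTF-8."""
    try:
        return data.decode("utf-8")  # errors="strict" is the default
    except UnicodeDecodeError:
        return None

samples = [
    b"\xee\x80\x80",  # well-formed: U+E000, a private-use code point
    b"\xff",          # ill-formed: 0xFF never occurs in UTF-8
    b"\xc0\x80",      # ill-formed: overlong encoding of U+0000
    b"\xed\xa0\x80",  # ill-formed: would encode the surrogate U+D800
]

for raw in samples:
    print(raw, decode_strict(raw) is not None)
```

Only the first sample decodes; the other three are rejected rather than
being mapped to some character.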

-- 
   __("<         Marcin Kowalczyk
   \__/       [EMAIL PROTECTED]
    ^^     http://qrnik.knm.org.pl/~qrczak/

_______________________________________________
Python-3000 mailing list
Python-3000@python.org
http://mail.python.org/mailman/listinfo/python-3000
