On 6/9/07, Stephen J. Turnbull <[EMAIL PROTECTED]> wrote:
> Rauli Ruohonen writes:
>  > The ones it absolutely prohibits in interchange are surrogates.
>
> Excuse me?  Surrogates are code points with a specific interpretation
> if it is "purported that the stream is in UTF-16".  Otherwise, Unicode
> 4.0 explicitly says that there is nothing illegal about an isolated
> surrogate (p.75, where an example is given of how such a surrogate
> might occur).

I meant interchange, not strings. Anything is allowed in strings.

Chapter 2 (not normative, but clear) explains on page 26:

 Restricted interchange. [...]
  - Surrogate code points cannot be conformantly interchanged using
    Unicode encoding forms. [...]
  - Noncharacter code points are reserved for internal use, such as for
    sentinel values. They should never be interchanged. [...]
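
To make the string/interchange distinction concrete, here is a minimal
sketch. It assumes the behavior of a current CPython 3 release, which may
differ from whatever Python 3000 ends up doing; the helper name
can_interchange is made up for illustration and is not part of any API.

```python
# Assumption: CPython 3 semantics, where str is a sequence of code points.
# can_interchange() is a hypothetical helper, not a standard library function.

def can_interchange(s: str, encoding: str = "utf-8") -> bool:
    """Return True if s can be encoded with the given Unicode encoding form."""
    try:
        s.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True

lone_surrogate = "\ud800"   # isolated high surrogate: fine inside a str
noncharacter = "\uffff"     # noncharacter code point

print(len(lone_surrogate))              # the string itself is legal
print(can_interchange(lone_surrogate))  # the UTF-8 codec refuses surrogates
print(can_interchange(noncharacter))    # the codec does not police noncharacters
```

Note the asymmetry: the codec enforces the "cannot" for surrogates, while
the "should never" for noncharacters is left to the application.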

> My point was precisely that I don't object to this implementation.  I
> want Unicode-ly-correct behavior to be a goal of the language, the
> community disagrees, and Guido disagrees.  That's that.

My understanding is that it is a goal, but practicality beats purity.
I think the only disagreement is on what's practical.
_______________________________________________
Python-3000 mailing list
Python-3000@python.org
http://mail.python.org/mailman/listinfo/python-3000