On 6/9/07, Stephen J. Turnbull <[EMAIL PROTECTED]> wrote:
> Rauli Ruohonen writes:
>
>  > The ones it absolutely prohibits in interchange are surrogates.
>
> Excuse me?  Surrogates are code points with a specific interpretation
> if it is "purported that the stream is in UTF-16".  Otherwise, Unicode
> 4.0 explicitly says that there is nothing illegal about an isolated
> surrogate (p.75, where an example is given of how such a surrogate
> might occur).

I meant interchange instead of strings. Anything is allowed in strings.
Chapter 2 (not normative, but clear) explains on page 26:

  Restricted interchange. [...]
  - Surrogate code points cannot be conformantly interchanged using
    Unicode encoding forms. [...]
  - Noncharacter code points are reserved for internal use, such as for
    sentinel values. They should never be interchanged. [...]

> My point was precisely that I don't object to this implementation.  I
> want Unicode-ly-correct behavior to be a goal of the language, the
> community disagrees, and Guido disagrees.  That's that.

My understanding is that it is a goal, but practicality beats purity.
I think the only disagreement is on what's practical.
_______________________________________________
Python-3000 mailing list
Python-3000@python.org
http://mail.python.org/mailman/listinfo/python-3000
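
To illustrate the distinction drawn in this message between what a string may hold and what may be interchanged, here is a minimal sketch. It assumes a current CPython 3 interpreter, which is not necessarily how the py3k branch behaved when this thread was written, and the helper name `can_interchange` is hypothetical. An isolated surrogate is accepted inside a `str`, but the strict UTF-8 codec refuses to encode it. A noncharacter such as U+FFFE is encoded without complaint, so the codec enforces only the first of the two restrictions quoted from Chapter 2.

```python
def can_interchange(text, encoding="utf-8"):
    """Return True if `text` encodes under the codec's default 'strict' handler.

    Hypothetical helper for illustration only; not part of any library.
    """
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


lone_surrogate = "\ud800"    # isolated high surrogate: legal inside a str
noncharacter = "\ufffe"      # noncharacter code point: also legal inside a str
supplementary = "\U00010000"  # an ordinary character outside the BMP

# Anything is allowed in strings: each of these is one code point long.
print(len(lone_surrogate), len(noncharacter), len(supplementary))

# Interchange is where the restriction shows up.
print(can_interchange(lone_surrogate))  # strict UTF-8 rejects the surrogate
print(can_interchange(noncharacter))    # the codec does not police noncharacters
print(can_interchange(supplementary))
```

Code that really does need to carry an isolated surrogate through bytes can pass `errors="surrogatepass"` to `str.encode`, at the cost of producing output that is no longer well-formed UTF-8.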