On Sat, 15-09-2007 at 09:13 +0900, Stephen J. Turnbull wrote:
> > Well, for any scheme which attempts to modify UTF-8 by accepting
> > arbitrary byte strings is used, *something* must be interpreted
> > differently than in real UTF-8.
>
> Wrong. In my scheme everything ends up in the PUA, on which real
> UTF-8 imposes no interpretation by definition.

This is wrong: UTF-8 is specified for the PUA. The PUA is not special
from the point of view of UTF-8.

UTF-8 is defined for all Unicode scalar values, i.e. all code points
in the ranges U+0000..U+D7FF and U+E000..U+10FFFF, i.e. all code points
excluding surrogates. This includes the PUA.

> I haven't gone back to check yet, but it's possible that a "real UTF-8
> conforming process" is required to stop processing and issue an error
> or something like that in the cases we're trying to handle.

"C10. When a process interprets a code unit sequence which purports
to be in a Unicode character encoding form, it shall treat ill-formed
code unit sequences as an error condition and shall not interpret such
sequences as characters."

-- 
   __("<         Marcin Kowalczyk
   \__/       [EMAIL PROTECTED]
    ^^     http://qrnik.knm.org.pl/~qrczak/
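
To make the two points above concrete, here is a minimal sketch, assuming a current Python 3 interpreter and its built-in strict "utf-8" codec. The helper names is_scalar_value and strict_decode are made up for illustration and come from neither the thread nor any library. It shows that a PUA code point encodes and decodes like any other scalar value, that a surrogate does not, and that ill-formed input is reported as an error instead of being interpreted as characters.

```python
# Illustrative sketch only; helper names are hypothetical.

def is_scalar_value(cp: int) -> bool:
    """True for Unicode scalar values: U+0000..U+D7FF and U+E000..U+10FFFF."""
    return 0 <= cp <= 0xD7FF or 0xE000 <= cp <= 0x10FFFF


def strict_decode(data: bytes):
    """Decode UTF-8 strictly; return None for ill-formed input (cf. C10)."""
    try:
        return data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return None


# U+E000 is the first code point of the BMP Private Use Area.
pua = "\ue000"
pua_bytes = pua.encode("utf-8")

# A lone surrogate is not a scalar value, so the strict codec refuses it.
try:
    "\ud800".encode("utf-8")
    surrogate_encodable = True
except UnicodeEncodeError:
    surrogate_encodable = False

# 0xFF never occurs in well-formed UTF-8; ED A0 80 would be an encoded surrogate.
ill_formed = [b"\xff", b"\xed\xa0\x80"]
results = [strict_decode(b) for b in ill_formed]
```

So the PUA round-trips unchanged through ordinary UTF-8, which is why a scheme that maps arbitrary bytes into the PUA can collide with legitimate PUA characters in the input.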