On 2/27/07, Adam Olsen <[EMAIL PROTECTED]> wrote:
> On 2/26/07, Mike Verdone <[EMAIL PROTECTED]> wrote:
> > Text I/O
> > ... operate on a per-character basis instead of a per-byte basis.
> "per-character" needs some clarification. I'm guessing this will only
> return entire code points, but the unicode type will expose them as
> code units, so it could be seen as both per-code-point and
> per-code-unit.

Does this just mean that you assume:
(1) UTF-32;
(2) surrogate pairs will show up as two characters;
(3) diacritics may (or may not) show up separately from their base characters?

This does suggest that error handling should be specified (or at least
explicitly left unspecified). If the underlying input byte stream
contains an invalid sequence, will the TextIO raise a UnicodeDecodeError?
Or will its error/replace/delete behavior be settable? Does the Text
class promise to catch things like an invalid combination of surrogates?

-jJ
_______________________________________________
Python-3000 mailing list
Python-3000@python.org
http://mail.python.org/mailman/listinfo/python-3000
Unsubscribe: http://mail.python.org/mailman/options/python-3000/archive%40mail-archive.com
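To make the distinctions above concrete, here is a minimal sketch in present-day Python 3, not the API that was under discussion in 2007. It measures one string as code points, UTF-16 code units and UTF-8 bytes, and shows the "strict", "replace" and "ignore" codec error handlers through the errors argument of io.TextIOWrapper. The helper names count_units and read_text are hypothetical, and the len() results assume Python 3.3+ (PEP 393); older narrow builds reported a non-BMP character as two units.

```python
import io


def count_units(text: str) -> dict:
    """Measure one string three ways (hypothetical helper for illustration)."""
    return {
        # On Python 3.3+ len() of a str counts code points.
        "code_points": len(text),
        # UTF-16-LE has no BOM and 2 bytes per code unit; a non-BMP code
        # point is stored as a surrogate pair, i.e. two code units.
        "utf16_code_units": len(text.encode("utf-16-le")) // 2,
        "utf8_bytes": len(text.encode("utf-8")),
    }


def read_text(data: bytes, errors: str) -> str:
    """Decode a UTF-8 byte stream through a text layer (hypothetical helper)."""
    wrapper = io.TextIOWrapper(io.BytesIO(data), encoding="utf-8", errors=errors)
    return wrapper.read()


if __name__ == "__main__":
    # U+1F600 lies outside the BMP: 1 code point, but 2 UTF-16 code units.
    print(count_units("\U0001F600"))  # {'code_points': 1, 'utf16_code_units': 2, 'utf8_bytes': 4}
    # "e" + COMBINING ACUTE ACCENT: one visible glyph, but 2 code points.
    print(count_units("e\u0301"))  # {'code_points': 2, 'utf16_code_units': 2, 'utf8_bytes': 3}

    bad = b"abc\xffdef"  # 0xFF never occurs in well-formed UTF-8
    print(ascii(read_text(bad, "replace")))  # 'abc\ufffddef'
    print(ascii(read_text(bad, "ignore")))  # 'abcdef'
    try:
        read_text(bad, "strict")
    except UnicodeDecodeError as exc:
        print("strict:", exc.reason)  # strict: invalid start byte

    # A lone (unpaired) surrogate is rejected when encoding to UTF-8.
    try:
        "\ud800".encode("utf-8")
    except UnicodeEncodeError as exc:
        print("lone surrogate:", exc.reason)  # lone surrogate: surrogates not allowed
```

In this sketch the error policy is a per-stream setting chosen when the text layer is constructed, with "strict" (raise UnicodeDecodeError) being only one of the available choices.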