The encoding/decoding behavior should be no different from that of the encode() and decode() methods on unicode strings and byte arrays.
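As a minimal sketch of that equivalence, assuming the `io.TextIOWrapper` class that eventually shipped in Python 3 (it postdates this thread), reading and writing through a text layer can be compared directly against `bytes.decode()` and `str.encode()`. The helper names `decode_via_textio` and `encode_via_textio` are hypothetical, chosen for illustration only.

```python
import io


def decode_via_textio(data: bytes, encoding: str) -> str:
    """Decode bytes by reading them through a text I/O wrapper (hypothetical helper)."""
    # newline="" disables newline translation, so only decoding is compared.
    wrapper = io.TextIOWrapper(io.BytesIO(data), encoding=encoding, newline="")
    return wrapper.read()


def encode_via_textio(text: str, encoding: str) -> bytes:
    """Encode text by writing it through a text I/O wrapper (hypothetical helper)."""
    buf = io.BytesIO()
    wrapper = io.TextIOWrapper(buf, encoding=encoding, newline="")
    wrapper.write(text)
    wrapper.flush()  # push the encoded bytes down into buf
    return buf.getvalue()


sample = "naïve café\r\n"
for enc in ("utf-8", "latin-1"):
    raw = sample.encode(enc)
    # The text layer should agree with the plain methods in both directions.
    assert decode_via_textio(raw, enc) == raw.decode(enc)
    assert encode_via_textio(sample, enc) == raw
```

Newline translation is a separate feature of the text layer, which is why the sketch turns it off before comparing.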
Certainly no normalization of diacritics will be done; surrogate handling depends on the encoding and whether the unicode string implementation uses 16 or 32 bits per character.

I agree that we need to be able to specify the error handling as well. UnicodeErrors may be raised.

--Guido

On 2/27/07, Jim Jewett <[EMAIL PROTECTED]> wrote:
> On 2/27/07, Adam Olsen <[EMAIL PROTECTED]> wrote:
> > On 2/26/07, Mike Verdone <[EMAIL PROTECTED]> wrote:
> > > Text I/O
> > > ... operate on a per-character basis instead of a per-byte basis.
>
> > "per-character" needs some clarification. I'm guessing this will only
> > return entire code points, but the unicode type will expose them as
> > code units, so it could be seen as both per-code-point and
> > per-code-unit.
>
> Does this just mean that you assume
> (1) UTF32
> (2) surrogate pairs will show up as two characters
> (3) diacritics may (or may not) show up separately from their base characters?
>
> This does suggest that error-correction should be specified (or at
> least explicitly not specified). If the underlying input byte-stream
> contains an invalid sequence, will the TextIO raise a
> UnicodeDecodeError? Or will its error/replace/delete behavior be
> settable?
>
> Does the Text class promise to catch things like an invalid
> combination of surrogates?
>
> -jJ
> _______________________________________________
> Python-3000 mailing list
> Python-3000@python.org
> http://mail.python.org/mailman/listinfo/python-3000
> Unsubscribe:
> http://mail.python.org/mailman/options/python-3000/guido%40python.org
>

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
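To make the error-handling and surrogate questions concrete, here is a minimal sketch assuming Python 3.3 or later, where `str` always holds whole code points (PEP 393) and `io.TextIOWrapper` accepts the same `errors` handler names as `bytes.decode()`. Both postdate this 2007 thread, so this illustrates behavior as it eventually shipped rather than the design under discussion. It shows that invalid input raises `UnicodeDecodeError` under the default `strict` handler, that `replace` and `ignore` can be set on the text layer, and how UTF-8 treats a lone surrogate. The helper names are hypothetical.

```python
import io

# 0xFF can never appear in well-formed UTF-8, so this input is invalid.
BAD = b"abc\xffdef"


def decode_both(data: bytes, errors: str = "strict"):
    """Decode via a text I/O layer and via bytes.decode(); None means it raised."""
    try:
        wrapper = io.TextIOWrapper(io.BytesIO(data), encoding="utf-8", errors=errors)
        via_textio = wrapper.read()
    except UnicodeDecodeError:
        via_textio = None
    try:
        via_decode = data.decode("utf-8", errors)
    except UnicodeDecodeError:
        via_decode = None
    return via_textio, via_decode


def utf16_code_units(text: str) -> int:
    """Number of 16-bit code units needed to store text in UTF-16."""
    return len(text.encode("utf-16-le")) // 2


def encode_lone_surrogate(errors: str = "strict"):
    """Try to UTF-8 encode a lone high surrogate; None means it raised."""
    try:
        return "\ud800".encode("utf-8", errors)
    except UnicodeEncodeError:
        return None


for handler in ("strict", "replace", "ignore"):
    print(handler, decode_both(BAD, handler))

emoji = "\U0001F600"  # a single code point outside the Basic Multilingual Plane
print(len(emoji), utf16_code_units(emoji))  # one code point, stored as a surrogate pair
print(encode_lone_surrogate(), encode_lone_surrogate("surrogatepass"))
```

Under `strict` both paths raise, and under `replace` or `ignore` they produce the same string, which is the "no different from decode()" behavior in a concrete form. The `surrogatepass` handler is the explicit opt-out for code that really does need to round-trip lone surrogates.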