On 2/27/07, Adam Olsen <[EMAIL PROTECTED]> wrote:
> On 2/26/07, Mike Verdone <[EMAIL PROTECTED]> wrote:
> > Text I/O
> > ... operate on a per-character basis instead of a per-byte basis.

> "per-character" needs some clarification.  I'm guessing this will only
> return entire code points, but the unicode type will expose them as
> code units, so it could be seen as both per-code-point and
> per-code-unit.

Does this just mean that you assume
(1) UTF-32
(2) surrogate pairs will show up as two characters
(3) diacritics may (or may not) show up separately from their base characters?
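For concreteness, here is how Python 3 as it eventually shipped handles these three cases. Note that this postdates this thread and assumes Python 3.3+ (PEP 393 string storage). It only illustrates the code-point vs. code-unit vs. "user-perceived character" distinction; it is not a statement of what the Text I/O proposal meant by "per-character". The sample characters are arbitrary choices.

```python
import unicodedata

# One code point outside the Basic Multilingual Plane (U+1D11E MUSICAL SYMBOL G CLEF).
clef = "\U0001D11E"
# Python 3.3+ str indexes by code point, so this has length 1.
code_points = len(clef)
# Encoded as UTF-16, the same code point becomes a surrogate pair: two 16-bit code units.
utf16_units = len(clef.encode("utf-16-le")) // 2
# UTF-32 always uses one 32-bit code unit per code point.
utf32_units = len(clef.encode("utf-32-le")) // 4

# A base letter followed by a combining diacritic is two separate code points...
decomposed = "e\u0301"  # 'e' + COMBINING ACUTE ACCENT
# ...which NFC normalization composes into one precomposed code point (U+00E9).
composed = unicodedata.normalize("NFC", decomposed)

print(code_points, utf16_units, utf32_units)  # 1 2 1
print(len(decomposed), len(composed))         # 2 1
```

So in that design, "per-character" means per-code-point: surrogate pairs only appear in an encoded form, while combining marks still count separately unless the text is normalized.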

This does suggest that error handling should be specified (or at
least explicitly left unspecified).  If the underlying input byte stream
contains an invalid sequence, will the TextIO raise a
UnicodeDecodeError?  Or will its error/replace/delete behavior (in
codec terms, strict/replace/ignore) be settable?
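As a point of reference, the io module that later shipped in Python 3 answers this one way: io.TextIOWrapper accepts an errors argument that is passed through to the codec, and the default is 'strict', which raises UnicodeDecodeError. The sketch below shows that later behavior on a made-up byte string. It does not record anything decided in this thread.

```python
import io

# Hypothetical input: "abc", a stray 0xFF byte (never valid in UTF-8), then "def".
raw = b"abc\xffdef"

def read_all(errors):
    """Decode raw through a text layer using the given codec error handler."""
    stream = io.TextIOWrapper(io.BytesIO(raw), encoding="utf-8", errors=errors)
    return stream.read()

# errors="strict" (the default) raises on the invalid byte.
try:
    read_all("strict")
    strict_result = "no error"
except UnicodeDecodeError:
    strict_result = "UnicodeDecodeError"

replaced = read_all("replace")  # invalid byte becomes U+FFFD REPLACEMENT CHARACTER
ignored = read_all("ignore")    # invalid byte is silently dropped

print(strict_result)    # UnicodeDecodeError
print(ascii(replaced))  # 'abc\ufffddef'
print(ignored)          # abcdef
```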

Does the Text class promise to catch things like an invalid
combination of surrogates?
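For comparison, Python 3 as released does not reject lone or mis-ordered surrogates when a str is built. Instead, the UTF-8 codec refuses them at encode/decode time unless the 'surrogatepass' handler is used. This is a sketch of that later behavior only, with arbitrary sample values, not an answer on behalf of the proposal's Text class.

```python
# str accepts unpaired or mis-ordered surrogates without complaint...
lone_high = "\ud800"            # high surrogate with no low surrogate after it
reversed_pair = "\udc00\ud800"  # low then high: not a valid UTF-16 pair

# ...but the UTF-8 codec refuses to encode them by default.
def encodes_cleanly(s):
    try:
        s.encode("utf-8")
        return True
    except UnicodeEncodeError:
        return False

# The 3-byte sequence a naive encoder would emit for U+D800 is rejected on decode.
try:
    b"\xed\xa0\x80".decode("utf-8")
    decode_ok = True
except UnicodeDecodeError:
    decode_ok = False

# The 'surrogatepass' error handler opts back in to round-tripping them.
passed = lone_high.encode("utf-8", "surrogatepass")

print(encodes_cleanly(lone_high), encodes_cleanly(reversed_pair))  # False False
print(decode_ok)                                                   # False
print(passed)                                                      # b'\xed\xa0\x80'
```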

-jJ
_______________________________________________
Python-3000 mailing list
Python-3000@python.org
http://mail.python.org/mailman/listinfo/python-3000