On Sat, Jan 11, 2014 at 4:28 PM, Terry Reedy <tjre...@udel.edu> wrote:

> On 1/11/2014 1:44 PM, Stephen J. Turnbull wrote:
>
>> We already *have* a type in Python 3.3 that provides text
>> manipulations on arrays of 8-bit objects: str (per PEP 393).
>>
>> > BTW: I don't know why so many people keep asking for use cases.
>> > Isn't it obvious that text data without known (but ASCII compatible)
>> > encoding or multiple different encodings in a single data chunk
>> > is part of life?
>>
>> Isn't it equally obvious that if you create or read all such
>> ASCII-compatible chunks as (encoding='ascii', errors='surrogateescape')
>> that you *don't need* string APIs for bytes?
>>
>> Why do these "text chunks" need to be bytes in the first place?
>> That's why we ask for use cases. AFAICS, reading and writing
>> ASCII-compatible text data as 'latin1' is just as fast as bytes I/O.
>> So it's not I/O efficiency, and (since in this model we don't do any
>> en/decoding on bytes/str), it's not redundant en/decoding of bytes to
>> str and back.
>
> The problem with some criticisms of using 'unicode in Python 3' is that
> there really is no such thing. Unicode in 3.0 to 3.2 used the old internal
> model inherited from 2.x. Unicode in 3.3+ uses a different internal model
> that is a game changer with respect to certain issues of space and time
> efficiency (and cross-platform correctness and portability). So at least
> some of the valid criticisms based on the old model are out of date and no
> longer valid.

-1 on adding more surrogateescapes by default. It's a pain to track down where the encoding errors came from.
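
A rough sketch of the kind of thing I mean, using only the standard codecs. The sample bytes and the write_report() helper are made up for illustration; they are not from any real code base. The decode with errors='surrogateescape' succeeds silently, and the failure only shows up later, at whatever point the string is finally encoded strictly:

```python
# Sample input: Latin-1 encoded bytes that are neither pure ASCII nor valid UTF-8.
raw = b"caf\xe9 menu"

# Decoding never fails: the undecodable byte 0xE9 is smuggled through
# as the lone surrogate U+DCE9.
text = raw.decode("ascii", errors="surrogateescape")

# Encoding with the same handler restores the original bytes exactly.
round_tripped = text.encode("ascii", errors="surrogateescape")

# The 'latin1' approach from the quoted mail also round-trips losslessly.
latin1_round_tripped = raw.decode("latin-1").encode("latin-1")


def write_report(s):
    """Hypothetical downstream code that encodes strictly as UTF-8."""
    return s.encode("utf-8")


# The error surfaces here, far from the decode call that let the bad byte in.
try:
    write_report(text.upper())
    late_error = None
except UnicodeEncodeError as exc:
    late_error = exc
```

The traceback points at write_report(), not at the decode call where the unknown encoding actually entered the program, which is why I find these hard to track down.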