Re: [Python-3000] characters data type

Guido van Rossum Wed, 03 May 2006 09:41:04 -0700

On 5/2/06, Fredrik Lundh <[EMAIL PROTECTED]> wrote:
> I'm still thinking that it might be a good idea to (optionally) delay
> decoding of strings until you're actually doing something that needs
> access to the individual characters, though.  (UTF-8 to UTF-8 shuffling
> is an increasingly common use case).


This seems a reasonable alternative abstraction that could be built on
top of bytes and (unicode) strings. Are you thinking of a situation
where you know that it's UTF-8? Or are you also thinking of doing this
for arbitrary encodings? Without knowing the encoding it's hard to
know where the boundaries between characters are, which means you
can't do anything that involves splitting the input into chunks, if
later you may attempt to decode a chunk.
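The boundary problem can be made concrete. The sketch below assumes the data is known to be UTF-8. In UTF-8, every continuation byte matches the bit pattern 10xxxxxx, so a safe split point can be found by backing up over continuation bytes. The helper names `utf8_safe_split` and `utf8_chunks` are hypothetical. No comparable local rule exists for an arbitrary, possibly stateful, encoding such as ISO-2022-JP, and that is the difficulty raised above.

```python
def utf8_safe_split(data: bytes, limit: int) -> int:
    """Return the largest offset <= limit that lies on a UTF-8 character boundary.

    Assumes `data` is UTF-8: continuation bytes are 0x80..0xBF (0b10xxxxxx),
    so backing up over them lands on the first byte of a character.
    """
    if limit >= len(data):
        return len(data)
    i = limit
    while i > 0 and (data[i] & 0xC0) == 0x80:
        i -= 1
    return i


def utf8_chunks(data: bytes, size: int):
    """Yield chunks of at most `size` bytes that each decode cleanly as UTF-8."""
    start = 0
    while start < len(data):
        end = utf8_safe_split(data, start + size)
        if end <= start:  # one character is longer than `size` bytes
            raise ValueError("chunk size smaller than a single character")
        yield data[start:end]
        start = end


data = "naïve café".encode("utf-8")

# A naive byte split cuts the two-byte 'ï' in half, so decoding the chunk fails.
try:
    data[:3].decode("utf-8")
except UnicodeDecodeError as exc:
    print("naive split failed:", exc.reason)

# Boundary-aware chunks each decode on their own.
for chunk in utf8_chunks(data, 3):
    print(chunk, chunk.decode("utf-8"))
```

When the encoding is known, the standard library's `codecs.getincrementaldecoder()` takes a different route. It buffers an incomplete trailing sequence between calls, so arbitrary chunks can be fed to it.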

There is of course nothing to stop you from copying a UTF-8 file in
binary mode -- but you seem to be after something more. Perhaps you
could elaborate with an example, and explain some of your assumptions
(e.g. are you only talking about UTF-8)?
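The idea of delaying decoding could be sketched as a thin layer over bytes and str. `LazyText` is a hypothetical name, not a proposed API. The sketch assumes the encoding is known when the object is created. It keeps the raw bytes and passes them straight through on a same-encoding `encode()`, which covers the UTF-8 to UTF-8 case. It decodes only when a character-level operation such as indexing or `len()` needs it.

```python
import codecs


class LazyText:
    """Hypothetical wrapper: holds raw bytes and decodes only on demand.

    Assumes the encoding is known up front (e.g. UTF-8).
    """

    __slots__ = ("raw", "encoding", "_text")

    def __init__(self, raw: bytes, encoding: str = "utf-8") -> None:
        self.raw = raw
        self.encoding = encoding
        self._text = None  # filled in on first character-level access

    def encode(self, encoding: str = "utf-8") -> bytes:
        # Same encoding on both sides: hand back the original bytes, no decode.
        if codecs.lookup(encoding).name == codecs.lookup(self.encoding).name:
            return self.raw
        return str(self).encode(encoding)

    def __str__(self) -> str:
        if self._text is None:
            self._text = self.raw.decode(self.encoding)
        return self._text

    def __getitem__(self, index):
        # Indexing needs individual characters, so decode now.
        return str(self)[index]

    def __len__(self) -> int:
        # Length in characters, not bytes, so this also forces a decode.
        return len(str(self))

    @property
    def is_decoded(self) -> bool:
        return self._text is not None


t = LazyText("café".encode("utf-8"))
out = t.encode("UTF8")  # same codec under another alias: bytes passed through
print(t.is_decoded)     # still undecoded at this point
print(t[3])             # character access triggers the decode
```

Re-encoding to a different codec, e.g. `t.encode("latin-1")`, falls back to a full decode and re-encode. Only the same-encoding path is free.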

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
_______________________________________________
Python-3000 mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-3000
