On 5/2/06, Fredrik Lundh <[EMAIL PROTECTED]> wrote:
> I'm still thinking that it might be a good idea to (optionally) delay
> decoding of strings until you're actually doing something that needs access
> to the individual characters, though. (UTF-8 to UTF-8 shuffling is an
> increasingly common use case).

This seems a reasonable alternative abstraction that could be built on
top of bytes and (unicode) strings. Are you thinking of a situation
where you know that it's UTF-8? Or are you also thinking of doing this
for arbitrary encodings?

Without knowing the encoding it's hard to know where the boundaries
between characters are, which means you can't do anything that involves
splitting the input into chunks, if later you may attempt to decode a
chunk. There is of course nothing to stop you from copying a UTF-8 file
in binary mode -- but you seem to be after something more.

Perhaps you could elaborate an example, and explain some of your
assumptions (e.g. are you only talking UTF-8)?

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
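
To make the chunk-boundary concern concrete, here is a minimal sketch. It assumes the input is known to be UTF-8 and uses present-day Python 3 bytes/str semantics rather than anything proposed in this thread. The function names and the sample text are hypothetical, chosen only for illustration. The first function decodes each fixed-size chunk on its own, which can fail when a chunk ends in the middle of a multi-byte character. The second uses the standard library's incremental decoder, which buffers a trailing partial character until the next chunk arrives.

```python
import codecs


def decode_chunks_naively(data, chunk_size, encoding="utf-8"):
    """Decode every fixed-size chunk independently (hypothetical helper).

    Raises UnicodeDecodeError if a chunk boundary falls inside a
    multi-byte character.
    """
    return [
        data[i:i + chunk_size].decode(encoding)
        for i in range(0, len(data), chunk_size)
    ]


def decode_chunks_incrementally(data, chunk_size, encoding="utf-8"):
    """Decode chunk by chunk, carrying partial characters across chunks."""
    decoder = codecs.getincrementaldecoder(encoding)()
    pieces = []
    for i in range(0, len(data), chunk_size):
        pieces.append(decoder.decode(data[i:i + chunk_size]))
    # final=True makes the decoder raise if the input ended mid-character.
    pieces.append(decoder.decode(b"", final=True))
    return "".join(pieces)


# Sample text (an assumption for the demo): two characters here take
# two bytes each in UTF-8, so a 3-byte chunk size splits one of them.
text = "na\u00efve caf\u00e9"
data = text.encode("utf-8")

# Shuffling the bytes around without decoding is always safe:
# rejoining the chunks reproduces the original byte string.
chunks = [data[i:i + 3] for i in range(0, len(data), 3)]
copied = b"".join(chunks)
```

With this sample, `decode_chunks_incrementally(data, 3)` recovers the original text, while `decode_chunks_naively(data, 3)` raises `UnicodeDecodeError` on the first chunk. The byte-level copy needs no knowledge of the encoding at all, which is the "binary mode" case mentioned above; it is only the later decoding of an individual chunk that depends on knowing where characters begin and end.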
