>>>>> "Fred" == Fred L Drake, <[EMAIL PROTECTED]> writes:
    Fred> On Tuesday 14 February 2006 22:34, Greg Ewing wrote:
    >> Seems to me this is a case where you want to be able to change
    >> encodings in the middle of reading the stream. You start off
    >> reading the data as ascii, and once you've figured out the
    >> encoding, you switch to that and carry on reading.

    Fred> Not quite. The proper response in this case is often to
    Fred> re-start decoding with the correct encoding, since some of
    Fred> the data extracted so far may have been decoded incorrectly.

    Fred> A very carefully constructed application may be able to go
    Fred> back and re-decode any data saved from the stream with the
    Fred> previous encoding, but that seems like it would be pretty
    Fred> fragile in practice.

I believe GNU Emacs is currently doing this. AIUI, they save
annotations where the codec is known to be non-invertible (e.g., two
charset-changing escape sequences in a row). I do think this is
fragile, and a robust application really should buffer everything
it's not sure of decoding correctly.

    Fred> There may be cases where switching encoding on the fly makes
    Fred> sense, but I'm not aware of any actual examples of where
    Fred> that approach would be required.

This is exactly what ISO 2022 formalizes: switching encodings on the
fly. mboxes of Japanese mail often contain random and unsignaled
encoding changes. A terminal emulator may need to switch when logging
in to a remote system.

--
School of Systems and Information Engineering http://turnbull.sk.tsukuba.ac.jp
University of Tsukuba Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
Ask not how you can "do" free software business;
ask what your business can "do for" free software.
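
For illustration only, the "buffer everything you're not sure of, then restart decoding" approach discussed in this message could be sketched roughly as follows. This is a minimal sketch, not anything proposed in the thread: the function name decode_with_restart, the XML-style declaration pattern, and the sniff_limit and default parameters are all hypothetical choices made for the example. Only the standard codecs and re modules are used.

```python
import codecs
import re

# Hypothetical pattern: an XML-style encoding="..." declaration.
_DECL = re.compile(rb'encoding=["\']([A-Za-z0-9._-]+)["\']')


def decode_with_restart(chunks, sniff_limit=1024, default="utf-8"):
    """Buffer raw bytes until the encoding is known, then decode from the start.

    Nothing is decoded provisionally, so no text ever has to be re-decoded.
    """
    chunks = iter(chunks)
    buffered = bytearray()
    encoding = None

    # Phase 1: keep the bytes undecoded while looking for a declaration.
    for chunk in chunks:
        buffered += chunk
        match = _DECL.search(bytes(buffered))
        if match:
            encoding = match.group(1).decode("ascii")
            break
        if len(buffered) >= sniff_limit:
            break  # give up sniffing and fall back to the default (assumption)

    # Phase 2: restart from the first byte with the encoding now chosen.
    decoder = codecs.getincrementaldecoder(encoding or default)()
    pieces = [decoder.decode(bytes(buffered))]
    for chunk in chunks:
        pieces.append(decoder.decode(chunk))
    pieces.append(decoder.decode(b"", final=True))
    return "".join(pieces)
```

A call such as decode_with_restart(iter_of_byte_chunks) returns the fully decoded text. The design choice here is to pay for a bounded buffer up front rather than track which already-decoded text might need fixing later, which is the fragile part both Fred and I were worried about.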