Guido van Rossum wrote:
> I'm not sure if it works for all encodings, but if possible I'd like
> to extend the seeking semantics on text files: seek positions are byte
> counts, and the application should consider them as "magic cookies".

If the seek position is merely a number, it won't work for all encodings.
For the ISO 2022 ones (iso-2022-jp etc.), you need to know the shift state:
you can switch to a different character set within the stream using
standard escape codes, and then the same bytes are interpreted differently.
For example, iso-2022-jp-2 (which extends iso-2022-jp) supports these
escape codes:

  ESC ( B      ASCII
  ESC $ @      JIS X 0208-1978
  ESC $ B      JIS X 0208-1983
  ESC ( J      JIS X 0201-Roman
  ESC $ A      GB2312-1980
  ESC $ ( C    KSC5601-1987
  ESC $ ( D    JIS X 0212-1990
  ESC . A      ISO8859-1
  ESC . F      ISO8859-7

So at a given byte position in the stream, the same bytes can mean
different characters, depending on which "shift state" you are in. That's
why ISO C introduced fgetpos/fsetpos in addition to ftell/fseek: an fpos_t
is a truly opaque structure that can also incorporate codec state.

If you follow this approach, you can get back most of seek. You will lose
the "whence" parameter, i.e. you cannot seek forward or backward relative
to the current position, and you cannot position at the end of the file.
(Actually, iso-2022-jp still supports appending to a file, since it
requires that all data "shift out" back to ASCII at the end of each line
and at the end of the file. So "correct" ISO 2022 files can still be
concatenated.)

> Is there any reason not to do Universal Newline processing on *all*
> text files?

Correct, there is no reason not to. However, this might still require a
full rewrite of the universal newlines code: the code currently operates
on byte streams, when it "should" operate on character streams. In some
encodings, CRLF simply isn't represented by \x0d\x0a (e.g. UTF-16-LE:
\x0d\x00\x0a\x00).

Regards,
Martin

_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
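A minimal Python 3 sketch of both points, using only the standard iso-2022-jp and utf-16-le codecs. The helper `decode_from` and the (offset, escape sequence) pair it takes are hypothetical, standing in for an opaque fpos_t-style cookie; they are not a real or proposed API.

```python
# Sketch: why a bare byte offset is not a usable text-file position.

# iso-2022-jp: 'A', ESC $ B (switch to JIS X 0208-1983), the two bytes of
# U+3042 (HIRAGANA LETTER A), ESC ( B (back to ASCII), then 'B'.
data = "A\u3042B".encode("iso-2022-jp")
assert data == b'A\x1b$B$"\x1b(BB'

HIRAGANA_A_OFFSET = 4  # byte offset of b'$"', the encoding of U+3042


def decode_from(raw: bytes, offset: int, shift_escape: bytes = b"") -> str:
    """Decode raw[offset:], first replaying a saved escape sequence.

    The (offset, shift_escape) pair is a hypothetical "cookie" recording
    the shift state in effect at `offset`; it is an illustration only.
    """
    return (shift_escape + raw[offset:]).decode("iso-2022-jp")


# Restarting at the byte offset alone leaves the decoder in ASCII state,
# so the same two bytes come out as '$"' instead of U+3042.
print(ascii(decode_from(data, HIRAGANA_A_OFFSET)))             # '$"B'
# Replaying ESC $ B restores the JIS X 0208 state and the text is right.
print(ascii(decode_from(data, HIRAGANA_A_OFFSET, b"\x1b$B")))  # '\u3042B'

# UTF-16-LE: CRLF is four bytes, so a byte-level scan for b"\r\n" misses it...
crlf = "\r\n".encode("utf-16-le")
assert crlf == b"\r\x00\n\x00" and b"\r\n" not in crlf
# ...and a byte-level scan for b"\n" can fire inside an unrelated character:
# U+010A (LATIN CAPITAL LETTER C WITH DOT ABOVE) encodes as b"\n\x01".
assert "\u010a".encode("utf-16-le") == b"\n\x01"
```

Replaying the designating escape sequence is enough for this simple case; a real cookie would presumably also have to capture other designations (such as the G2 ones above) and any bytes the decoder has buffered but not yet turned into characters.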