tomer filiba wrote:
> # read 3 UTF8 *characters*
> f.read(3)
>
> # this will seek by AT LEAST 7 *bytes*, until resynched
> f.substream.seekby(7)
>
> # we can resume reading of UTF8 *characters*
> f.read(3)
>
> heck, i even like this idea :)

Notice that resyncing is a really tricky operation, and should not be
expected to work for all encodings. For example, for the iso-2022
encodings, you have to know which character set you are "in", and you
have to read forward or backward until you find an escape sequence
that switches the character set.

There is an RFC-imposed requirement that each line of input is
"neutral" with respect to character set switching, so you can
typically synchronize at a line break. Still, this could require
skipping an arbitrary amount of text.

Regards,
Martin
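
To illustrate the difference discussed above, here is a minimal sketch in Python. The function names `resync_utf8` and `resync_at_line_break` are hypothetical and not part of any proposed I/O API. It assumes the data is already in memory as `bytes`, and for the iso-2022 case it assumes the line-neutrality rule holds for the input.

```python
def resync_utf8(data: bytes, pos: int) -> int:
    """Return the first offset >= pos that is not inside a UTF-8 character.

    UTF-8 continuation bytes all have the bit pattern 10xxxxxx, so
    skipping them lands on the start of the next character.
    """
    while pos < len(data) and (data[pos] & 0xC0) == 0x80:
        pos += 1
    return pos


def resync_at_line_break(data: bytes, pos: int) -> int:
    """Return the offset just past the next newline at or after pos.

    Hypothetical helper for stateful encodings such as iso-2022: it
    relies on the assumption that every line starts in a neutral
    character set. It may skip an arbitrary amount of text, and it
    returns len(data) if no newline is found.
    """
    newline = data.find(b"\n", pos)
    return len(data) if newline == -1 else newline + 1


# UTF-8: seeking to byte 4 lands inside the 3-byte character, so resync forward.
utf8_data = "a\u00e9\u20acb".encode("utf-8")
start = resync_utf8(utf8_data, 4)
print(utf8_data[start:].decode("utf-8"))

# iso-2022-jp: byte 4 is in the middle of a double-byte character; the
# only safe restart point in this sketch is the beginning of the next line.
jp_data = "\u65e5\u672c\u8a9e\n\u6f22\u5b57".encode("iso2022_jp")
start = resync_at_line_break(jp_data, 4)
print(jp_data[start:].decode("iso2022_jp"))
```

For UTF-8 the skip is bounded by the length of one character, whereas the line-break approach has no such bound, which is why the seek can only promise to move by at least the requested number of bytes.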