On Thu, 26 May 2022 at 22:07, Eryk Sun <eryk...@gmail.com> wrote:
>
> On 5/26/22, Steven D'Aprano <st...@pearwood.info> wrote:
> >
> > If you seek() to position 4, say, the results will be unpredictable but
> > probably not anything good.
> >
> > In other words, the tell() and seek() cookies represent file positions
> > in **bytes**, even though we are reading or writing a text file.
>
> To clarify the general context, text I/O tell() and seek() cookies
> aren't necessarily just a byte offset. They can be packed integers
> that include a start position, decoder flags, a number of bytes to be
> fed into the decoder, whether the decode operation should be final
> (EOF), and the number of decoded characters (ordinals) to skip. For
> example:
>
>     >>> open('spam.txt', 'w', encoding='utf-7').write('\u0100'*4)
>     4
>     >>> f = open('spam.txt', encoding='utf-7')
>     >>> f.read(2)
>     'ĀĀ'
>     >>> f.tell()
>     680564734871843039612185603579607777280
>
>     >>> start_pos, dec_flags, bytes_to_feed, need_eof, chars_to_skip = (
>     ...     _pyio.TextIOWrapper._unpack_cookie(..., f.tell()))
>     >>> start_pos, dec_flags, bytes_to_feed, need_eof, chars_to_skip
>     (0, 55834574848, 2, False, 0)
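
For anyone following along, here is a rough sketch of how that cookie splits into the five fields in the quoted output. It assumes the layout used by CPython's pure-Python _pyio as I understand it (four 64-bit fields, with the EOF flag above them), which is an implementation detail and not a documented format. The names MASK64, unpack_cookie and pack_cookie are made up for illustration.

```python
# Sketch only: assumes the cookie packs four 64-bit fields plus an EOF flag,
# as CPython's _pyio appears to do. Not a documented or stable format.
MASK64 = (1 << 64) - 1  # hypothetical helper constant


def unpack_cookie(cookie):
    """Split a text-mode tell() cookie into its assumed components."""
    position = cookie & MASK64                # byte offset to seek to
    dec_flags = (cookie >> 64) & MASK64       # decoder state flags
    bytes_to_feed = (cookie >> 128) & MASK64  # bytes to feed the decoder
    chars_to_skip = (cookie >> 192) & MASK64  # decoded characters to skip
    need_eof = bool(cookie >> 256)            # feed with final=True?
    return position, dec_flags, bytes_to_feed, need_eof, chars_to_skip


def pack_cookie(position, dec_flags=0, bytes_to_feed=0,
                need_eof=False, chars_to_skip=0):
    """Inverse of unpack_cookie, under the same assumed layout."""
    return (position
            | (dec_flags << 64)
            | (bytes_to_feed << 128)
            | (chars_to_skip << 192)
            | (bool(need_eof) << 256))


cookie = 680564734871843039612185603579607777280  # f.tell() from the example
fields = unpack_cookie(cookie)
print(fields)  # (0, 55834574848, 2, False, 0)
```

For a simple codec the upper fields are all zero, so the cookie is numerically just the byte offset, which is why tell() usually looks like a plain position.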

If I'm reading this correctly, the result from f.tell() has enough
information to reconstruct a position within a hypothetical array of
code points contained within the file (that is to say - if you read
the entire file into a string, f.tell() returns something that can be
turned into an index into that string), but that position might not
actually correspond to a single byte location. Is that it?

I think UTF-7 is an awesome encoding. Really good at destroying
people's expectations of what they thought they could depend on.
(Terrible for actually using, though.)

ChrisA
_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/M46JTRXTHV6FKKBKP4C3IM4FGSGYBUYW/
Code of Conduct: http://python.org/psf/codeofconduct/