On Thu, 26 May 2022 at 22:07, Eryk Sun <eryk...@gmail.com> wrote:
>
> On 5/26/22, Steven D'Aprano <st...@pearwood.info> wrote:
> >
> > If you seek() to position 4, say, the results will be unpredictable but
> > probably not anything good.
> >
> > In other words, the tell() and seek() cookies represent file positions
> > in **bytes**, even though we are reading or writing a text file.
>
> To clarify the general context, text I/O tell() and seek() cookies
> aren't necessarily just a byte offset. They can be packed integers
> that include a start position, decoder flags, a number of bytes to be
> fed into the decoder, whether the decode operation should be final
> (EOF), and the number of decoded characters (ordinals) to skip.  For
> example:
>
>     >>> open('spam.txt', 'w', encoding='utf-7').write('\u0100'*4)
>     4
>     >>> f = open('spam.txt', encoding='utf-7')
>     >>> f.read(2)
>     'ĀĀ'
>     >>> f.tell()
>     680564734871843039612185603579607777280
>
>     >>> import _pyio
>     >>> start_pos, dec_flags, bytes_to_feed, need_eof, chars_to_skip = (
>     ...     _pyio.TextIOWrapper._unpack_cookie(..., f.tell()))
>     >>> start_pos, dec_flags, bytes_to_feed, need_eof, chars_to_skip
>     (0, 55834574848, 2, False, 0)
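
To make the packing concrete, here is a rough sketch of how a cookie splits into those five fields. It assumes the layout used by the pure-Python _pyio implementation (five fields, 64 bits each, lowest bits first); unpack_cookie and pack_cookie are hypothetical helpers written for illustration, not public API. I believe the C-accelerated TextIOWrapper packs the same fields with narrower widths, so the sketch also produces its cookie with _pyio to keep both steps on the same layout.

```python
import _pyio

FIELD = 1 << 64  # assumed width of each packed field, as in _pyio

def unpack_cookie(cookie):
    """Split a _pyio-style tell() cookie into its five fields."""
    rest, start_pos = divmod(cookie, FIELD)
    rest, dec_flags = divmod(rest, FIELD)
    rest, bytes_to_feed = divmod(rest, FIELD)
    need_eof, chars_to_skip = divmod(rest, FIELD)
    return start_pos, dec_flags, bytes_to_feed, bool(need_eof), chars_to_skip

def pack_cookie(start_pos, dec_flags, bytes_to_feed, need_eof, chars_to_skip):
    """Inverse of unpack_cookie, under the same assumed layout."""
    return (start_pos
            | (dec_flags << 64)
            | (bytes_to_feed << 128)
            | (chars_to_skip << 192)
            | (int(bool(need_eof)) << 256))

# Build the same four-character UTF-7 text in memory instead of on disk.
raw = _pyio.BytesIO(('\u0100' * 4).encode('utf-7'))
f = _pyio.TextIOWrapper(raw, encoding='utf-7')
f.read(2)
cookie = f.tell()
fields = unpack_cookie(cookie)
print(cookie, fields)
```

The point is only that the integer is a bundle of fields rather than a single offset; the exact values depend on the decoder and on which implementation produced the cookie.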

If I'm reading this correctly, the result from f.tell() carries enough
information to reconstruct a position within the sequence of code
points that the file decodes to. That is to say, if you read the
entire file into a string, f.tell() returns something that could be
turned into an index into that string, even though that position
might not correspond to any single byte location. Is that it?
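
A quick way to poke at that reading is to treat the cookie as an opaque token and check that seeking back to it resumes at the same character. This is only a sketch using an in-memory buffer with the standard io module; the variable names are mine.

```python
import io

text = '\u0100' * 4
raw = io.BytesIO(text.encode('utf-7'))
f = io.TextIOWrapper(raw, encoding='utf-7')

head = f.read(2)        # first two characters
cookie = f.tell()       # opaque token; not used as a byte offset or index here
tail = f.read()         # the rest of the text

f.seek(cookie)          # go back to the point just after `head`
tail_again = f.read()

# If the cookie behaves like "position 2 in the decoded string",
# re-reading from it should give text[2:] again.
print(tail_again == text[2:])
```

So the cookie acts like a character position for seek() purposes, even though nothing here relies on it being a byte offset.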

I think UTF-7 is an awesome encoding. Really good at destroying
people's expectations of what they thought they could depend on.

(Terrible for actually using, though.)

ChrisA
_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/M46JTRXTHV6FKKBKP4C3IM4FGSGYBUYW/
Code of Conduct: http://python.org/psf/codeofconduct/
