On Tue, May 24, 2022 at 04:31:13AM -0000, mguin...@gmail.com wrote:

> seek() and tell() works with opaque values, called cookies.
> This is close to low level details, but it is not pythonic.

Even after reading the issue you linked to, I am not sure I understand 
either the issue, or your suggested solution.

I *think* that the issue is this:

Suppose we have a text file containing four characters (to be precise: 
code points).

    aΩλz

namely U+0061 U+03A9 U+03BB U+007A. You would like tell() and seek() to 
accept indexes 0, 1, 2, 3, 4 which would move the file pointer to:

    0 moves to the start of the file, just before the a
    1 moves to just before the Ω
    2 moves to just before the λ
    3 moves to just before the z
    4 moves to after the z (EOF).

**But** in reality, the file position cookies for that file will depend 
on the encoding used. For UTF-8, the valid cookies are:

    0 moves to the start of the file, just before the a
    1 moves to just before the Ω
    3 moves to just before the λ
    5 moves to just before the z
    6 moves to after the z (EOF).

Other encodings may give different cookies.
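
The cookies listed above can be checked with a small sketch. It wraps 
an in-memory buffer rather than a real file, and the helper name 
tell_cookies is made up for illustration. It assumes CPython, where the 
cookie for a stateless encoding such as UTF-8 happens to coincide with 
the byte offset; the documentation only promises an opaque number.

```python
import io

TEXT = "aΩλz"  # U+0061 U+03A9 U+03BB U+007A


def tell_cookies(encoding):
    """Return the tell() value before each character and at EOF."""
    raw = io.BytesIO(TEXT.encode(encoding))
    f = io.TextIOWrapper(raw, encoding=encoding)
    cookies = [f.tell()]
    # Read one character at a time, recording the cookie after each read.
    while f.read(1):
        cookies.append(f.tell())
    return cookies


utf8_cookies = tell_cookies("utf-8")
print(utf8_cookies)  # [0, 1, 3, 5, 6]

# A different encoding gives a different set of cookies for the same text.
print(tell_cookies("utf-32-le"))
```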

If you seek() to position 4, say, you land in the middle of the 
two-byte UTF-8 encoding of λ, and the results will be unpredictable but 
probably not anything good.
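
As a rough illustration of one way this can go wrong, the sketch below 
seeks to 4 in the UTF-8 version of the file and then tries to read. It 
uses an in-memory buffer and the default errors="strict" handler, and it 
shows what CPython does in this one case, not a guaranteed behaviour.

```python
import io

raw = io.BytesIO("aΩλz".encode("utf-8"))  # bytes: 61 CE A9 CE BB 7A
f = io.TextIOWrapper(raw, encoding="utf-8")

f.seek(4)  # not a cookie that tell() would ever have returned for this file
try:
    result = f.read()
except UnicodeDecodeError as exc:
    # Byte 0xBB is a continuation byte, so decoding cannot start here.
    result = exc

print(repr(result))
```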

In other words, the tell() and seek() cookies here are effectively file 
positions in **bytes**, even though we are reading or writing a text 
file.

You would like the cookies to be file positions measured in 
**characters** (or to be precise, code points).

Am I close?



-- 
Steve