On Wed, Jun 30, 2010 at 10:20 AM, Antoine Pitrou <solip...@pitrou.net> wrote:
> On Wed, 30 Jun 2010 10:03:49 -0700
> Guido van Rossum <gu...@python.org> wrote:
>>
>> > Also, please note that values used by seek() and tell() on
>> > text I/O are "opaque cookies". While they can happen to match the
>> > raw binary file position, it is a mere coincidence (or an
>> > implementation detail, at your will). Therefore, reusing tell() values
>> > of a binary file to seek() a TextIOWrapper accessing the same file
>> > is wrong.
>>
>> Well, um, I actually designed it carefully so that bytes offsets
>> *would* work as text offsets in those cases where they make sense at
>> all.
>
> Ah, this is embarrassing. I always assumed it was an implementation
> detail since neither the PEP nor the module docs say otherwise.
>
> PEP 3116 clearly says:
>
> “Unlike with raw I/O, the units for .seek() are not specified - some
> implementations (e.g. StringIO) use characters and others (e.g.
> TextIOWrapper) use bytes.”
>
> And also:
>
> “.seek(pos: object, whence: int = 0) -> int
>
> Seek to position pos. If pos is non-zero, it must be a cookie
> returned from .tell() and whence must be zero.”
>
> “it must be a cookie returned from .tell()” here seems to imply that
> non-zero values of other origin should not be used.

Guilty as charged. I really did take care that it would work, but
forgot to mention it. I guess we can depend on this property *inside*
the stdlib (as long as there are tests for each piece of code
depending on it that would break if it ever changed) but should not
advertise it widely.

Note that it doesn't go the other way -- due to encoding state, text
streams can certainly return cookies that make no sense to binary
streams. But text streams take byte offsets too and do the best they
can. (Obviously if a byte offset points in the middle of a multibyte
character all bets are off.)

The C stdlib has a similar thing -- while AFAIK POSIX lseek() really
is required to return and take byte offsets, this is not required for
fseek() and ftell() according to the C std -- but I think it's still a
pretty safe bet, and I betcha lots of apps are making this assumption.

--
--Guido van Rossum (python.org/~guido)
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
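
For readers who want to see the behaviour discussed in this message, here is a minimal sketch. It assumes CPython's TextIOWrapper with a stateless encoding (UTF-8); the sample data and the helper name read_line_at are made up for illustration. As the message says, this property is not part of the documented contract, so treat it as an observation and not as something to rely on outside well-tested code.

```python
import io

# Sample data is an assumption for illustration; "é" and "ö" each
# occupy two bytes in UTF-8, so byte and character offsets differ.
DATA = "héllo\nwörld\n".encode("utf-8")


def read_line_at(byte_offset):
    """Seek a text stream to a raw byte offset and read one line.

    read_line_at is a hypothetical helper, not a stdlib function.
    """
    text = io.TextIOWrapper(io.BytesIO(DATA), encoding="utf-8")
    text.seek(byte_offset)
    return text.readline()


# Obtain a byte offset from a binary stream over the same bytes.
binary = io.BytesIO(DATA)
binary.readline()
offset = binary.tell()  # position just past the first line

# Reuse the binary offset as a text-stream seek position.
second_line = read_line_at(offset)

# An offset that lands inside the two-byte "é" cannot be decoded:
# this is the "all bets are off" case.
try:
    read_line_at(2)
    mid_char_ok = True
except UnicodeDecodeError:
    mid_char_ok = False
```

The sketch only goes from binary offsets to text seeks. The reverse direction is not shown, since text-stream cookies may carry decoder state and need not be meaningful as byte positions.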