[issue25849] files, opened in unicode (text): write() returns symbols count, but seek() expect offset in bytes

2017-11-09 Thread Serhiy Storchaka
Change by Serhiy Storchaka: status: pending -> closed

2017-09-20 Thread Serhiy Storchaka
Changes by Serhiy Storchaka: status: open -> pending

2015-12-16 Thread Martin Panter
Martin Panter added the comment: I think changing the TextIOBase API would be hard to do if you want to keep compatibility with existing code. I agree that encoding the position to a number and back seems like a bad design, but I doubt it is worth changing it at this point.

2015-12-16 Thread Марк Коренберг
Марк Коренберг added the comment: Well, 03e61104f7a2 adds a good description, but why not enforce checks instead of saying that some values are unsupported? Another idea is to return a special object instance from tell(); this object should encapsulate the byte offset. And allow for the seek() either

2015-12-15 Thread STINNER Victor
STINNER Victor added the comment: > If the “slow reconstruction algorithm” was clarified or removed, ... I wrote this algorithm, or I helped to write it, I don't recall. The problem is readahead: TextIOWrapper reads more bytes than requested, for performance. But when tell() is called, the user
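
The readahead mentioned above can be observed with a small sketch. This is an illustration under stated assumptions, not the tracker's own test: it uses an in-memory `io.BytesIO` in place of a real file, and the variable names are hypothetical. It shows that after reading two characters the underlying byte stream has already advanced past them, while the value returned by `tell()` still lets `seek()` return to the point the caller actually reached.

```python
import io

data = "héllo wörld".encode("utf-8")
raw = io.BytesIO(data)                      # stands in for the binary file
text = io.TextIOWrapper(raw, encoding="utf-8")

head = text.read(2)                         # the caller asked for 2 characters
raw_pos = raw.tell()                        # where the byte stream really is
token = text.tell()                         # position as seen by the caller

rest = text.read()                          # consume the remainder
text.seek(token)                            # go back to the saved position
rest_again = text.read()                    # same remainder a second time

print(head, raw_pos, rest == rest_again)
```

Because the wrapper has consumed more bytes than the caller has seen, `tell()` cannot simply report the position of the underlying stream; it has to work out a position that matches what was handed to the caller.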

2015-12-15 Thread Antoine Pitrou
Antoine Pitrou added the comment: I don't understand what the complaint is. If you think seek()/tell() are not useful, just don't use them.

2015-12-14 Thread Martin Panter
Martin Panter added the comment: You might be right about the “reconstruction algorithm”. This text was added in revision 0bba533c0959; maybe Antoine can comment whether we should clarify or remove it. I think the text added for TextIOBase.seek() in revision 03e61104f7a2 (Issue 12922) is

2015-12-14 Thread Martin Panter
Martin Panter added the comment: I’m starting to understand that there might be a “reconstruction algorithm” needed. When reading, TextIOWrapper buffers decoded characters. If you call tell() and there is unread but decoded data, it is not enough to return the incremental decoder state. You

2015-12-14 Thread Марк Коренберг
Марк Коренберг added the comment: First, it seems that there is no real "reconstruction algorithm" at all. Seek is allowed to point to any byte position, even to a place "inside" a character for multibyte encodings, such as UTF-8. Second, about performance: I talk about implementation
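
The point about seeking "inside" a character can be illustrated with a minimal sketch. It assumes CPython's `io.TextIOWrapper` over an in-memory `io.BytesIO` (not a real file), with the default strict error handling; the names are hypothetical. The `seek()` call to a byte offset in the middle of a two-byte UTF-8 character is accepted, and the problem only surfaces on the next read.

```python
import io

raw = io.BytesIO("é".encode("utf-8"))       # two bytes: 0xC3 0xA9
text = io.TextIOWrapper(raw, encoding="utf-8")

text.seek(1)                                # accepted, although it splits "é"
try:
    text.read()                             # decoding starts at byte 0xA9
    outcome = "decoded"
except UnicodeDecodeError:
    outcome = "UnicodeDecodeError"

print(outcome)
```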

2015-12-14 Thread Марк Коренберг
Марк Коренберг added the comment: s/peek/tell/ -- status: closed -> open

2015-12-14 Thread Марк Коренберг
Марк Коренберг added the comment: Also, can you provide a case where such random seeks can be used on text files? It would be a programmer error to seek to places other than those I mention, wouldn't it?

2015-12-14 Thread R. David Murray
R. David Murray added the comment: I think you haven't quite gotten what "opaque token" means in this context. The way you use tell/seek with text files is: you have read to a certain point in the file. You call 'tell' and get back an opaque token. Later you can call seek with that
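
The tell/seek usage described above can be sketched as follows. This is a minimal illustration, assuming an in-memory `io.BytesIO` wrapped in `io.TextIOWrapper` (the same wrapper type that `open()` returns in text mode); the sample text and variable names are hypothetical. The value from `tell()` is only stored and handed back to `seek()`, never inspected or used in arithmetic.

```python
import io

raw = io.BytesIO("первая\nвторая\nтретья\n".encode("utf-8"))
text = io.TextIOWrapper(raw, encoding="utf-8")

first = text.readline()      # read up to some point of interest
token = text.tell()          # opaque token for that point
second = text.readline()     # keep reading

text.seek(token)             # later: return to the remembered point
again = text.readline()      # the same line is read a second time

print(first == "первая\n", again == second)
```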

2015-12-13 Thread Марк Коренберг
Марк Коренберг added the comment: https://docs.python.org/3.5/library/io.html?highlight=stringio#id3 : Also, TextIOWrapper.tell() and TextIOWrapper.seek() are both quite slow due to the reconstruction algorithm used. What is the reconstruction algorithm? Experiments show that seek() and tell()

2015-12-13 Thread R. David Murray
R. David Murray added the comment: I'm still not seeing a bug. If you have a performance enhancement or functional enhancement you'd like us to consider, please attach a patch, with benchmark results. Since you say "are quite slow because of the reconstruction algorithm", what makes you say

2015-12-12 Thread R. David Murray
R. David Murray added the comment: As mentioned in those issues, currently the tell/seek token is a black box. That doesn't mean it isn't useful. Those issues are talking about potential ways to make it more useful, so any discussion should occur there. -- nosy: +r.david.murray

2015-12-12 Thread Марк Коренберг
New submission from Марк Коренберг: It seems that we should deprecate .seek() on files opened in text mode, since it is not possible to seek to a position between symbols. Yes, it is possible to decode UTF-8 (or another charset) starting from the beginning of the file and count symbols, but it is
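
The mismatch named in the issue title can be reproduced with a short sketch. It is an illustration only, assuming an in-memory `io.BytesIO` wrapped in `io.TextIOWrapper` rather than a file on disk; the variable names are hypothetical. `write()` reports the number of characters written, while the amount of data that lands in the underlying stream is measured in bytes.

```python
import io

raw = io.BytesIO()
text = io.TextIOWrapper(raw, encoding="utf-8")

chars_written = text.write("привет")   # return value counts characters
text.flush()                           # push the encoded bytes into raw
bytes_written = len(raw.getvalue())    # each Cyrillic letter takes 2 bytes

print(chars_written, bytes_written)
```

With a multibyte encoding the two numbers differ, so the return value of `write()` cannot be used to compute an argument for `seek()`.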