[issue10370] py3 readlines() reports wrong offset for UnicodeDecodeError

2010-11-20 Thread Éric Araujo
Changes by Éric Araujo: nosy: +eric.araujo

[issue10370] py3 readlines() reports wrong offset for UnicodeDecodeError

2010-11-09 Thread Ezio Melotti
Changes by Ezio Melotti: nosy: +ezio.melotti

[issue10370] py3 readlines() reports wrong offset for UnicodeDecodeError

2010-11-09 Thread Brian Warner
Brian Warner added the comment:
> Using .readline() to locate an invalid byte is not the right algorithm. If you would like to do that, you should open the file in binary mode and decode the content yourself, chunk by chunk. Or if you manipulate small files, you can use .read() as you wrote
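
The quoted suggestion (open the file in binary mode and decode it chunk by chunk) could look roughly like the sketch below. The function name `first_invalid_byte_offset` and the `chunk_size` default are made up for illustration; nothing like this is proposed in the thread itself. It assumes the codec's incremental decoder exposes its buffered bytes through `getstate()`, which holds for the standard `ascii` and `utf-8` codecs.

```python
import codecs


def first_invalid_byte_offset(path, encoding="ascii", chunk_size=4096):
    """Return the absolute file offset of the first undecodable byte, or None.

    Hypothetical helper: reads the file in binary mode and feeds each chunk
    to an incremental decoder, tracking how many bytes came before it.
    """
    decoder = codecs.getincrementaldecoder(encoding)()
    consumed = 0  # bytes read from the file before the current chunk
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            # Bytes the decoder kept back from the previous chunk (for
            # example the first half of a multi-byte UTF-8 sequence).
            pending = len(decoder.getstate()[0])
            try:
                # An empty chunk means EOF, so flush with final=True.
                decoder.decode(chunk, final=not chunk)
            except UnicodeDecodeError as exc:
                # exc.start is relative to (pending bytes + chunk).
                return consumed - pending + exc.start
            if not chunk:
                return None
            consumed += len(chunk)
```

For small files, calling `.decode()` on the result of a binary `.read()` gives the same offset directly in `exc.start`, since the whole file is decoded in one call.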

[issue10370] py3 readlines() reports wrong offset for UnicodeDecodeError

2010-11-08 Thread STINNER Victor
STINNER Victor added the comment: The error occurs in .readline(): .readline() fills a buffer by reading the file chunk by chunk. Each time a chunk is read, it is decoded by the stateful decoder. The problem is that the decoder doesn't know the file offset. Even if it knew, start and end attr
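
A small sketch of the point about the decoder not knowing the file offset: the `start` attribute of `UnicodeDecodeError` is relative to the bytes handed to that particular decode call. The helper `error_start` and the 8-byte chunk size are illustrative choices, not taken from the thread.

```python
import codecs


def error_start(decode, data):
    """Return exc.start if decoding fails, else None (illustrative helper)."""
    try:
        decode(data)
    except UnicodeDecodeError as exc:
        return exc.start
    return None


# Ten ASCII bytes, one two-byte UTF-8 character, ten more ASCII bytes.
data = b"x" * 10 + "\u00e9".encode("utf-8") + b"y" * 10

# Decoding everything at once: start is the true offset in the data.
whole_start = error_start(lambda b: b.decode("ascii"), data)

# Decoding in 8-byte chunks: start is relative to the failing chunk only.
decoder = codecs.getincrementaldecoder("ascii")()
chunk_starts = [
    error_start(decoder.decode, data[i:i + 8])
    for i in range(0, len(data), 8)
]

print(whole_start)   # offset within the whole byte string
print(chunk_starts)  # per-chunk offsets; the caller must add the chunk base
```

Here the failing chunk begins at byte 8, so the caller would have to add 8 to the chunk-relative value to recover the position in the full data.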

[issue10370] py3 readlines() reports wrong offset for UnicodeDecodeError

2010-11-08 Thread R. David Murray
Changes by R. David Murray: nosy: +pitrou

[issue10370] py3 readlines() reports wrong offset for UnicodeDecodeError

2010-11-08 Thread Brian Warner
New submission from Brian Warner: I noticed that the UnicodeDecodeError exception produced by trying to do open(fn).readlines() (i.e. using the default ASCII encoding) on a file that's actually UTF-8 reports the wrong offset for the first undecodable character. From what I can tell, it repor
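
One way to compare the two offsets described in the report is sketched below. The helper name `reported_and_true_offsets` is hypothetical, and `encoding="ascii"` is passed explicitly because the default encoding depends on the locale. The value that `readlines()` reports may differ between Python versions and buffer sizes, so the sketch only collects it and does not assume any particular number.

```python
import os
import tempfile


def reported_and_true_offsets(payload, encoding="ascii"):
    """Return (offset reported via readlines(), offset in the raw bytes).

    Hypothetical helper: writes payload to a temporary file, then compares
    the text-mode error offset with the one from decoding all bytes at once.
    """
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(payload)

        # Reference value: decode the whole payload in a single call.
        try:
            payload.decode(encoding)
            true_offset = None
        except UnicodeDecodeError as exc:
            true_offset = exc.start

        # Value seen by the user: text-mode readlines() on the same bytes.
        try:
            with open(path, encoding=encoding) as f:
                f.readlines()
            reported = None
        except UnicodeDecodeError as exc:
            reported = exc.start

        return reported, true_offset
    finally:
        os.remove(path)


payload = b"line\n" * 3000 + "\u00e9".encode("utf-8") + b"\n"
reported, true_offset = reported_and_true_offsets(payload)
print("readlines() reported:", reported, "actual offset:", true_offset)
```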