[issue34801] codecs.getreader() splits lines containing control characters

2018-10-04 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: This is a duplicate of issue18291. -- nosy: +serhiy.storchaka resolution: -> duplicate stage: -> resolved status: open -> closed superseder: -> codecs.open interprets FS, RS, GS as line ends

[issue34801] codecs.getreader() splits lines containing control characters

2018-10-04 Thread Neil Schemenauer
Neil Schemenauer added the comment: Perhaps the 'csv' module should do some sanity checking on the file passed to the reader. The docs recommend that newline='' be used to open the file. Maybe 'csv' could check that and warn if it's not the case. I poked around but it seems like io files
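A minimal sketch of why the docs recommend newline='' when opening files for the 'csv' module. It is not taken from the thread: the temporary file and the sample rows are assumptions made for illustration. It writes a quoted field containing an embedded '\r\n' and reads it back twice, once with newline='' and once with the default newline handling.

```python
import csv
import os
import tempfile

# Hypothetical sample data: the second row has a field with an embedded CRLF.
rows = [["id", "note"], ["1", "line one\r\nline two"]]

fd, path = tempfile.mkstemp(suffix=".csv")
os.close(fd)
try:
    # Write with newline='' so the csv module controls line endings itself.
    with open(path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)

    # Read with newline='' as recommended: no newline translation happens.
    with open(path, newline="", encoding="utf-8") as f:
        read_recommended = list(csv.reader(f))

    # Read with the default newline=None: '\r\n' is translated to '\n'
    # before the csv reader ever sees the data.
    with open(path, encoding="utf-8") as f:
        read_default = list(csv.reader(f))
finally:
    os.remove(path)

print(read_recommended == rows)  # embedded CRLF preserved
print(read_default == rows)      # embedded CRLF altered by translation
```

With newline='' the embedded line ending should survive the round trip, while the default mode silently rewrites it. This only illustrates the documented recommendation; it does not show the check or warning suggested in the comment above.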

[issue34801] codecs.getreader() splits lines containing control characters

2018-10-04 Thread Neil Schemenauer
Neil Schemenauer added the comment: Thank you for the research. The problem is indeed that \v is getting treated as a line separator. That is an intentional design choice (see https://bugs.python.org/issue12855). It would seem to have some surprising implications for CSV parsing. E.g. if

[issue34801] codecs.getreader() splits lines containing control characters

2018-10-04 Thread Karthikeyan Singaravelan
Karthikeyan Singaravelan added the comment: During iteration, codecs.getreader('utf-8')(open('test.txt', 'rb')) calls str.splitlines on the decoded data, which takes '\x0b' as a valid newline as specified in [0], the set of line boundaries being a superset of universal newlines. It thus splits on '\x0b' as a valid newline for
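The difference described above can be sketched as follows. This is an illustration, not the codecs_bug.py attached to the issue: the in-memory sample data is an assumption, and io.BytesIO stands in for open('test.txt', 'rb').

```python
import codecs
import io

# Hypothetical sample: one line containing a vertical tab ('\x0b'), then a second line.
text = "a\x0bb\nc\n"
data = text.encode("utf-8")

# str.splitlines treats '\x0b' as a line boundary.
split_lines = text.splitlines(keepends=True)

# Iterating a codecs StreamReader relies on str.splitlines internally,
# so the vertical tab is expected to end a line here too.
codec_lines = list(codecs.getreader("utf-8")(io.BytesIO(data)))

# io.TextIOWrapper only recognizes '\n', '\r' and '\r\n' as line endings.
wrapper_lines = list(io.TextIOWrapper(io.BytesIO(data), encoding="utf-8"))

print(split_lines)
print(codec_lines)
print(wrapper_lines)
```

Both readers return the same characters overall; they differ only in where the line breaks fall, which is what matters to a consumer such as the 'csv' reader that processes input line by line.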

[issue34801] codecs.getreader() splits lines containing control characters

2018-09-25 Thread Neil Schemenauer
New submission from Neil Schemenauer: This seems to be a bug in codecs.getreader(). io.TextIOWrapper(fp, encoding) works correctly. -- files: codecs_bug.py messages: 326382 nosy: nascheme priority: low severity: normal status: open title: codecs.getreader() splits lines containing