[issue18291] codecs.open interprets FS, RS, GS as line ends

Marc-Andre Lemburg Fri, 05 Oct 2018 01:07:22 -0700


Marc-Andre Lemburg <m...@egenix.com> added the comment:


The Unicode .splitlines() splits strings on what Unicode defines as linebreak 
characters (all code points with character properties Zl or bidirectional 
property B).

This differs from what typical CSV file parsers, or other parsers built for 
ASCII text files, treat as a newline. They usually break only on CR, CRLF and 
LF, so it is the use of .splitlines() in this context that is wrong, not the 
method itself.

It may make sense to extend .splitlines() to accept a list of linebreak 
characters to break on, but that would make it a lot slower, and the same 
result can already be had by using re.split() on Unicode strings.
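
A minimal sketch of the re.split() approach, assuming only CR, LF and CRLF 
should count as line ends. The helper name split_ascii_lines is hypothetical 
and not part of the standard library:

```python
import re

# CRLF is listed first so that it is consumed as a single line ending
# rather than as a CR followed by a separate LF.
_ASCII_NEWLINE = re.compile(r"\r\n|\r|\n")


def split_ascii_lines(text):
    """Split text on CR, LF and CRLF only (hypothetical helper)."""
    lines = _ASCII_NEWLINE.split(text)
    # re.split() leaves a trailing empty string when the text ends with a
    # newline; drop it to mirror what str.splitlines() returns.
    if lines and lines[-1] == "":
        lines.pop()
    return lines


sample = "a\x1cb\nc\r\nd\re\n"
print(split_ascii_lines(sample))
# ['a\x1cb', 'c', 'd', 'e']
```

Unlike .splitlines(), this keeps the FS character inside the first line. 
Whether the trailing empty string should be dropped depends on the caller.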

Closing this as won't fix.

----------
resolution:  -> wont fix
stage:  -> resolved
status: open -> closed

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue18291>
_______________________________________