[issue22232] str.splitlines splitting on non-\r\n characters

Marc-Andre Lemburg Fri, 05 Oct 2018 01:18:02 -0700


Marc-Andre Lemburg <m...@egenix.com> added the comment:


I am -1 on changing the default behavior. The Unicode standard defines what a
linebreak code point is (all code points with general category Zl or
bidirectional class B) and we adhere to that. This may confuse parsers
coming from the ASCII world, but that's really a problem with those parsers 
assuming that .splitlines() only splits on ASCII line breaks, i.e. they are not 
written in a Unicode compatible way.
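To make the Unicode rule concrete, here is a small sketch that compares the code points matching "general category Zl or bidirectional class B" (as reported by the unicodedata module) with the code points str.splitlines() actually splits on. The helper names are illustrative, not part of any API. On recent CPython versions the second printed list should contain only 0xb and 0xc (\v and \f), which the str.splitlines() documentation lists as extra line boundaries.

```python
import sys
import unicodedata


def unicode_linebreaks():
    """Code points with general category Zl or bidirectional class B."""
    return {
        cp for cp in range(sys.maxunicode + 1)
        if unicodedata.category(chr(cp)) == "Zl"
        or unicodedata.bidirectional(chr(cp)) == "B"
    }


def splitlines_breaks():
    """Code points that str.splitlines() treats as a line boundary."""
    # If chr(cp) is a line break, "a<cp>b" splits into exactly two lines.
    return {
        cp for cp in range(sys.maxunicode + 1)
        if len(f"a{chr(cp)}b".splitlines()) == 2
    }


rule = unicode_linebreaks()
actual = splitlines_breaks()
print("Zl or B:          ", sorted(hex(cp) for cp in rule))
print("splitlines extras:", sorted(hex(cp) for cp in actual - rule))
```

The loop over the whole code space takes a second or two; it is meant as a one-off check, not something to run on every call.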

As mentioned in https://bugs.python.org/issue18291 we could add a parameter to
.splitlines(), but this would make the method hardly any faster than re.split().
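A rough way to check the performance side of this argument is to time the two existing options on the same input. This sketch only measures str.splitlines() against a precompiled re.split() on synthetic ASCII text; it says nothing directly about a hypothetical parameterised .splitlines(), and the numbers will vary by machine and Python version.

```python
import re
import timeit

# Synthetic input: 10,000 short lines separated by "\n" only.
text = "some line of text\n" * 10_000

# Explicit ASCII-only line break definition; "\r\n" is tried before "\r".
ascii_break = re.compile(r"\r\n|\r|\n")

n = 100
t_splitlines = timeit.timeit(text.splitlines, number=n)
t_regex = timeit.timeit(lambda: ascii_break.split(text), number=n)
print(f"str.splitlines: {t_splitlines:.4f}s  re.split: {t_regex:.4f}s  ({n} runs)")
```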

Using re.split() is not a work-around in this case; it's an explicit way of
defining the characters you want to split lines on, if the standard defining
your file format only accepts ASCII line break characters.
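As an illustration of that explicit form, the sketch below splits only on the three ASCII line endings and contrasts the result with str.splitlines(). The pattern name ASCII_LINE_BREAK is an illustrative choice, and the sample text is made up to contain U+001C and U+2028.

```python
import re

# "\r\n" must come first so a Windows line ending is not counted as two breaks.
ASCII_LINE_BREAK = re.compile(r"\r\n|\r|\n")

text = "header\x1cfield\u2028more\r\nsecond line\n"

# Unicode-aware: also splits on U+001C and U+2028.
print(text.splitlines())
# -> ['header', 'field', 'more', 'second line']

# ASCII only: U+001C and U+2028 stay inside the first line.
print(ASCII_LINE_BREAK.split(text))
# -> ['header\x1cfield\u2028more', 'second line', '']
```

Unlike str.splitlines(), re.split() leaves a trailing empty string when the text ends with a line break, so a parser may need to drop it explicitly.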

Since there are many such file formats, perhaps adding a parameter
asciionly=True/False would make sense: with asciionly=True, .splitlines() would
split only on ASCII line break characters. This new parameter would then have to
default to False to maintain compatibility with Unicode and all previous 
releases.
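A minimal pure-Python sketch of how the proposed parameter could behave, assuming it mirrors str.splitlines() except for which characters count as breaks. The module-level function splitlines() and its asciionly argument are hypothetical (the name comes from the proposal above); nothing like this exists in the standard library.

```python
import re

# One line plus its ASCII terminator, or a final unterminated line.
# Neither alternative can match the empty string, so finditer() never
# yields empty matches and "" produces [] just like "".splitlines().
_ASCII_LINE = re.compile(r"[^\r\n]*(?:\r\n|\r|\n)|[^\r\n]+")


def splitlines(text: str, keepends: bool = False, asciionly: bool = False) -> list[str]:
    """Hypothetical str.splitlines() with the proposed asciionly flag.

    asciionly defaults to False, so the default behaviour is exactly the
    current Unicode-aware str.splitlines().
    """
    if not asciionly:
        return text.splitlines(keepends)
    lines = []
    for match in _ASCII_LINE.finditer(text):
        line = match.group()
        if not keepends:
            # Line content cannot contain "\r" or "\n", so this strips only
            # the single terminator matched above.
            line = line.rstrip("\r\n")
        lines.append(line)
    return lines


sample = "a\u2028b\r\nc\n"
print(splitlines(sample))                  # Unicode-aware default
print(splitlines(sample, asciionly=True))  # ASCII line breaks only
```

Keeping False as the default means existing callers see no change, which matches the compatibility requirement stated above.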

----------
nosy: +lemburg

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue22232>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
