[issue22232] str.splitlines splitting on non-\r\n characters

Terry J. Reedy Sat, 23 Aug 2014 11:25:21 -0700

Terry J. Reedy added the comment:

Unless there is already another issue for improving the doc, this should at 
least be left open as a doc issue.


But I had the same thought as Serhiy, that we should at least optionally make 
the code behave as the current doc says. Two possibilities:

newlines=False  If true, only split on \r, \n, \r\n; otherwise split on all 
latin-1 linebreak characters -- <list>.  {This is rather awkward.}

linebreak=True  If true, split on all latin-1 linebreak characters <list>; 
otherwise only split on \r, \n, \r\n.  {Better, to me}
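
To make the second option concrete, here is a minimal sketch of how such a 
flag might behave, written as a standalone helper. The name split_lines and 
its linebreak parameter are hypothetical; str.splitlines() has no such 
argument. Note also that in Python 3 str.splitlines() splits on U+2028 and 
U+2029 in addition to the latin-1 characters, so the linebreak=True branch is 
only an approximation of the wording above.

```python
import re

# Hypothetical helper illustrating the proposed flag; not part of the str API.
_ASCII_NEWLINES = re.compile(r'\r\n|\r|\n')

def split_lines(text, linebreak=True):
    """Split text into lines.

    linebreak=True:  defer to str.splitlines() (all boundaries it recognizes).
    linebreak=False: split only on \\r, \\n and \\r\\n.
    """
    if linebreak:
        return text.splitlines()
    parts = _ASCII_NEWLINES.split(text)
    # Match str.splitlines(): no trailing empty string after a final newline,
    # and an empty list for empty input.
    if parts and parts[-1] == '':
        parts.pop()
    return parts
```

A caller that wants only the classic newline handling would call 
split_lines(data, linebreak=False) and leave \x0b, \x0c, \x1c, \x1d, \x1e and 
\x85 inside the lines.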

Changing both code and doc, at least in 3.5, amounts to saying that both are 
wrong. If we agree on this, there is still the awkward issue of what to do for 
3.4.  Just change the doc?  Then the email package must do something different 
in 3.4 to work around the code behavior. I think this may warrant a pydev 
discussion.

Another issue is whether latin-1 linebreaks are privileged.  Why not implement 
the full Unicode linebreak algorithm?

An additional complication is that in 2.x, .splitlines acts as advertised.

>>> 'a\x0ab\x0bc\x0cd\x0dda\x0d\x0a1c\x1c1d\x1d1e\x1e85\x85end'.splitlines()
['a', 'b\x0bc\x0cd', 'da', '1c\x1c1d\x1d1e\x1e85\x85end']
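
For comparison, a short sketch of the same string under Python 3, where 
str.splitlines() also breaks on \x0b, \x0c, \x1c, \x1d, \x1e and \x85, while 
bytes.splitlines() keeps the 2.x behavior shown above. The variable names are 
only for illustration.

```python
s = 'a\x0ab\x0bc\x0cd\x0dda\x0d\x0a1c\x1c1d\x1d1e\x1e85\x85end'

# str: splits on every line boundary str.splitlines() recognizes.
text_lines = s.splitlines()

# bytes: splits only on \r, \n and \r\n, like 2.x str.
byte_lines = s.encode('latin-1').splitlines()

print(text_lines)
print(byte_lines)
```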

----------
status: closed -> open

_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue22232>
_______________________________________