[issue22232] str.splitlines splitting on non-\r\n characters

2018-10-07 Thread Neil Schemenauer
Neil Schemenauer added the comment: I too would prefer a new method name rather than overloading splitlines() with more keyword args (passed as hardcoded constants, usually). Again, I think we want: list(open(..).read().()) == list(open(..)) readlines() returns a list but I think this
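
A quick illustration of why that invariant does not hold today. This is a sketch, not code from the thread. `io.TextIOWrapper` over `io.BytesIO` stands in for `open(...)` with its default `newline=None`, which is universal newlines mode. `splitlines(keepends=True)` stands in for the proposed method. File iteration only ends lines at \n, \r and \r\n, while str.splitlines() also breaks at characters such as \x1c and \x85. The sample data and the `open_text()` helper are invented for illustration.

```python
import io

# Invented sample payload containing \x1c and \x85 inside "lines".
DATA = "alpha\x1cbeta\ngamma\x85delta\n".encode("utf-8")


def open_text(raw: bytes) -> io.TextIOWrapper:
    """Hypothetical stand-in for open(path): newline=None (default) = universal newlines."""
    return io.TextIOWrapper(io.BytesIO(raw), encoding="utf-8")


# File iteration ends lines only at \n, \r or \r\n.
by_iteration = list(open_text(DATA))
# str.splitlines() additionally breaks at \x1c, \x85, \u2028, ...
by_splitlines = open_text(DATA).read().splitlines(keepends=True)

print(by_iteration)   # ['alpha\x1cbeta\n', 'gamma\x85delta\n']
print(by_splitlines)  # ['alpha\x1c', 'beta\n', 'gamma\x85', 'delta\n']
print(by_iteration == by_splitlines)  # False
```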

2018-10-07 Thread Steven D'Aprano
Steven D'Aprano added the comment: I don't like the idea of adding a second bool parameter to splitlines. Guido has a rough rule of thumb (which I agree with) of "no constant bool parameters". If people will typically call a function with some sort of "mode" parameter using a hard-coded

2018-10-05 Thread Neil Schemenauer
Neil Schemenauer added the comment: > Why not simply add a new parameter, to make people who want ASCII linebreaks continue to use .splitlines()? That could work, but I think in nearly every case you don't want to use splitlines() without supplying the parameter. So, it seems like a bit

2018-10-05 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: Why not simply add a new parameter, to make people who want ASCII linebreaks continue to use .splitlines()? I think it would be less than ideal to have one method break on all Unicode line breaks and another only on ASCII ones.

2018-10-05 Thread Neil Schemenauer
Neil Schemenauer added the comment: I've created a topic on this inside the "Ideas" area of discuss.python.org. Sorry if that wasn't appropriate; I'm not sure if I should have kept the discussion here. Inada Naoki suggests creating a new method str.iterlines([keepends]). Given that people
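
A rough sketch of what a lazy iterlines() could look like, written as a plain function because str has no such method. Both the name and the behaviour are assumptions for illustration. It ends lines only at \r\n, \r and \n, the same boundaries file iteration uses. That is one possible reading of the proposal, not a settled design.

```python
import re
from typing import Iterator

# Assumption: only \r\n, \r and \n end a line (as in file iteration).
# Each match is one line: either text plus its terminator, or a final unterminated tail.
_LINE = re.compile(r"[^\r\n]*(?:\r\n|\r|\n)|[^\r\n]+")


def iterlines(s: str, keepends: bool = False) -> Iterator[str]:
    """Hypothetical stand-in for a str.iterlines() method; yields lines lazily."""
    for match in _LINE.finditer(s):
        line = match.group()
        # The body never contains \r or \n, so rstrip removes exactly the terminator.
        yield line if keepends else line.rstrip("\r\n")
```

For example, `list(iterlines("a\x1cb\r\nc"))` gives `['a\x1cb', 'c']`, whereas `"a\x1cb\r\nc".splitlines()` gives `['a', 'b', 'c']`.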

2018-10-05 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: I am -1 on changing the default behavior. The Unicode standard defines what a linebreak code point is (all code points with character properties Zl or bidirectional property B) and we adhere to that. This may confuse parsers coming from the ASCII world,
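
The rule described here is general category Zl or bidirectional class B. It can be checked against what str.splitlines() actually does, using only `unicodedata.category` and `unicodedata.bidirectional`. This is a sketch for exploration. On current CPython versions it should show that splitlines() also breaks at \x0b (VT, bidi class S) and \x0c (FF, bidi class WS), which the property rule alone would not include. The loop walks all code points, so it takes a second or two.

```python
import sys
import unicodedata


def splits_at(ch: str) -> bool:
    """True if str.splitlines() treats the single code point ch as a line boundary."""
    return len((ch + "x").splitlines()) == 2


def property_linebreak(ch: str) -> bool:
    """The rule quoted above: general category Zl or bidirectional class B."""
    return unicodedata.category(ch) == "Zl" or unicodedata.bidirectional(ch) == "B"


python_breaks = set()
unicode_breaks = set()
for cp in range(sys.maxunicode + 1):
    ch = chr(cp)
    if splits_at(ch):
        python_breaks.add(cp)
    if property_linebreak(ch):
        unicode_breaks.add(cp)

# Code points splitlines() breaks on that the property rule does not cover, and vice versa.
print(sorted(hex(cp) for cp in python_breaks - unicode_breaks))
print(sorted(hex(cp) for cp in unicode_breaks - python_breaks))
```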

2018-10-05 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: If we change the default behavior, we need to wait several releases after adding this option. Users should be able to pick the current behavior explicitly. Currently the workaround is using regular expressions. For s.splitlines(keepends=False):
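
A regular-expression workaround for the keepends=False case could look roughly like this. It is a sketch, not Serhiy's exact code, and `ascii_splitlines` is a hypothetical name. One detail matters: re.split() leaves a trailing empty string after a final newline, which splitlines() does not, so it is dropped explicitly.

```python
import re

# Only the ASCII newline sequences; \r\n comes first so it counts as one break.
_NEWLINE = re.compile(r"\r\n|\r|\n")


def ascii_splitlines(s: str) -> list[str]:
    """Like s.splitlines() without keepends, but ignoring VT, FF, FS/GS/RS, NEL, LS and PS."""
    lines = _NEWLINE.split(s)  # always returns at least one element
    # splitlines() does not report an empty "line" after a final newline.
    if lines[-1] == "":
        lines.pop()
    return lines
```

For instance, `ascii_splitlines("a\x1cb\r\nc\n")` returns `['a\x1cb', 'c']`, while the same string's `.splitlines()` returns `['a', 'b', 'c']`.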

2018-10-05 Thread Neil Schemenauer
Neil Schemenauer added the comment: If we introduce a keyword parameter, I think the default of str.splitlines() should be changed to match bytes.splitlines (and match Python 2 str.splitlines()), i.e. split on \r and \n by default. I looked through the stdlib and I can't find any calls
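
The bytes/str difference is easy to demonstrate. The last step sketches one more workaround: round-trip through bytes.splitlines(). It assumes the text can be encoded as UTF-8, i.e. it has no lone surrogates. That is safe because in UTF-8 the bytes 0x0A and 0x0D never occur inside a multi-byte sequence. The sample string is invented.

```python
text = "one\x0btwo\x1cthree\u2028four\r\nfive"

# str.splitlines(): the full Unicode-aware set of line boundaries.
print(text.splitlines())
# -> ['one', 'two', 'three', 'four', 'five']

# bytes.splitlines(): only \n, \r and \r\n.
print(text.encode("utf-8").splitlines())
# -> [b'one\x0btwo\x1cthree\xe2\x80\xa8four', b'five']

# Workaround: split as bytes, then decode each line back to str.
ascii_lines = [line.decode("utf-8") for line in text.encode("utf-8").splitlines()]
print(ascii_lines)
# -> ['one\x0btwo\x1cthree\u2028four', 'five']
```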

2016-06-01 Thread Alexander Schrijver
Alexander Schrijver added the comment: I appear to have missed the reference to that issue when I read this issue the first time. Re-opening that issue makes sense to me.

2016-05-31 Thread Martin Panter
Martin Panter added the comment: For Python 3, the bytes.splitlines() and bytearray.splitlines() documentation has been moved out to a separate section (Issue 21777). I don’t think it is good to add much detail about bytes.splitlines() to the str.splitlines() documentation. For Python 2,

2016-05-31 Thread Alexander Schrijver
Alexander Schrijver added the comment: Oops, wrong diff. Sorry, this is the correct one for 2.7. -- Added file: http://bugs.python.org/file43075/cpython2.7_splitlines.diff

2016-05-31 Thread Alexander Schrijver
Alexander Schrijver added the comment: This diff synchronizes the CPython 2.7 documentation with that from 3.5 and also describes the difference between bytes objects and unicode objects (from the other diff). -- Added file: http://bugs.python.org/file43072/cpython3.5_splitlines.diff

2016-05-31 Thread Alexander Schrijver
Alexander Schrijver added the comment: This diff updates the cpython (tip) documentation to document the different behaviour when using splitlines on bytes objects or string objects. -- keywords: +patch nosy: +Alexander Schrijver Added file:

2015-07-10 Thread Martin Panter
Martin Panter added the comment: The main documentation has been updated and Issue 12855 has been closed. What is left to do here, considering this is marked as a documentation bug? Just modify the doc strings, as Terry suggested in https://bugs.python.org/issue22232#msg225766?

2015-07-10 Thread Gregory P. Smith
Gregory P. Smith added the comment: If this isn't already mentioned in the 2-to-3 porting notes, it is worth highlighting there. Code which uses a str in Python 2 and still uses a str in Python 3 is now splitting on many more characters. That seems to be the source of bugs like issue22233.

2015-03-17 Thread Martin Panter
Changes by Martin Panter vadmium...@gmail.com: -- nosy: +vadmium

2014-10-28 Thread Ezio Melotti
Ezio Melotti added the comment: Looks like str.splitlines is using STRINGLIB_ISLINEBREAK which in turn uses Py_UNICODE_ISLINEBREAK, so the behavior should be correct. If splitting on \n, \r, and \r\n only is common enough, we might add a bool arg to splitlines to restrict the splitting on

2014-10-28 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: With Terry's explanation, linebreak looks better to me. Yet another alternative is ascii=False (or unicode=True?). It may also be worth adding this parameter to strip/rstrip/lstrip/split. On the other hand, regular expressions can be used in such special cases.
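
The point about strip()/split() applies because, with no argument, they rely on str.isspace(), which is also Unicode-aware. A small illustration follows, with an invented sample string. It uses the existing `re.ASCII` flag as one way to get ASCII-only splitting. With that flag, `\s` matches only `[ \t\n\r\f\v]`.

```python
import re

t = "a b\xa0c\x1cd\x85e"

# No-argument split()/strip() use str.isspace(), which counts
# NO-BREAK SPACE (\xa0), \x1c-\x1f and NEL (\x85) as whitespace.
print(t.split())
# -> ['a', 'b', 'c', 'd', 'e']

# ASCII-only alternative: under re.ASCII, \s matches only [ \t\n\r\f\v].
print(re.split(r"\s+", t, flags=re.ASCII))
# -> ['a', 'b\xa0c\x1cd\x85e']
```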

2014-10-28 Thread Ezio Melotti
Ezio Melotti added the comment: There are some ascii line breaks other than \n, \r, \r\n. unicode=True might be better, but might be confused with unicode strings. Maybe unicode_linebreaks or unicode_newlines?
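
The ASCII characters that splitlines() treats as line breaks can be listed by restricting Serhiy's enumeration to range(128). On Python 3 the expected result is \n, \v (0x0b), \f (0x0c), \r and the FS/GS/RS separators 0x1c to 0x1e. 0x1f (US) is not included.

```python
# ASCII code points at which str.splitlines() starts a new line:
# a line break before "x" turns one line into two.
ascii_breaks = [hex(cp) for cp in range(128) if len((chr(cp) + "x").splitlines()) == 2]
print(ascii_breaks)
# -> ['0xa', '0xb', '0xc', '0xd', '0x1c', '0x1d', '0x1e']
```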

2014-10-28 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: See also issue18236.

2014-10-28 Thread Jakub Wilk
Changes by Jakub Wilk jw...@jwilk.net: -- nosy: +jwilk

2014-08-25 Thread R. David Murray
R. David Murray added the comment: The existing related open doc issue is issue 12855.

2014-08-25 Thread R. David Murray
R. David Murray added the comment: Ideally str.splitlines would split on whatever the unicode database says are mandatory line break characters. I take it this is currently not true? That is, that the list is hardcoded?

2014-08-23 Thread Samuel Charron
Samuel Charron added the comment: This is a known issue, and will be resolved by improving documentation, so I'm closing this bug. Thanks! -- status: open -> closed

2014-08-23 Thread R. David Murray
R. David Murray added the comment: OK, we'll use issue 22232 to resolve the issue of email using splitlines.

2014-08-23 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: Maybe add a parameter to str.splitlines() that switches the behavior to split on '\n', '\r' and '\r\n' only? -- nosy: +serhiy.storchaka

2014-08-23 Thread Terry J. Reedy
Terry J. Reedy added the comment: Unless there is already another issue for improving the doc, this should at least be left open as a doc issue. But I had the same thought as Serhiy, that we should at least optionally make the current doc correct. Two possibilities: newlines=False If true,

2014-08-23 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: I don't understand why you mention latin-1. splitlines() supports linebreaks outside the latin-1 range. [hex(i) for i in range(sys.maxunicode + 1) if len(('%cx' % i).splitlines()) == 2] ['0xa', '0xb', '0xc', '0xd', '0x1c', '0x1d', '0x1e', '0x85', '0x2028',

2014-08-23 Thread Terry J. Reedy
Terry J. Reedy added the comment: I was not aware of the remainder of the undocumented behavior. Thanks for the code that makes it clear. linebreak (or linebreaks)=True means that splitting occurs on some (approximation?*) of unicode mandatory linebreaks, as opposed to just the ascii

2014-08-23 Thread Roundup Robot
Roundup Robot added the comment: New changeset 3ad59ed0f4f0 by Terry Jan Reedy in branch '3.4': Issue #22232 (partial fix): update Universal newlines Glossary entry. http://hg.python.org/cpython/rev/3ad59ed0f4f0 -- nosy: +python-dev

2014-08-23 Thread Terry J. Reedy
Terry J. Reedy added the comment: Glossary fixed. I changed the components to Documentation as you will handle email elsewhere. For library references: The key sentence currently used in all entries is "This method uses the universal newlines approach to splitting lines.", where *universal

2014-08-22 Thread Terry J. Reedy
Terry J. Reedy added the comment: In Objects/unicodeobject.c, linebreak is at line 266. With 3.4.1: 'a\x0ab\x0bc\x0cd\x0d1c\x1c1d\x1d1e\x1e'.splitlines() returns ['a', 'b', 'c', 'd', '1c', '1d', '1e']. Here \x0a == \n and \x0d == \r. The \r\n pair is a special case, as promised, but other pairs are not.
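
The remark about pairs can be made concrete. Only \r\n is merged into a single boundary. Any other adjacent pair of break characters yields an empty line, or with keepends=True a separate one-character line. The sample strings are invented.

```python
# \r\n is one line boundary...
crlf = "a\r\nb".splitlines()
# ...but the reversed pair \n\r is two boundaries, leaving an empty line between them.
lfcr = "a\n\rb".splitlines()
# Other pairs are not merged either, e.g. \r followed by \x0b (VT).
cr_vt = "a\r\x0bb".splitlines(keepends=True)

print(crlf)   # ['a', 'b']
print(lfcr)   # ['a', '', 'b']
print(cr_vt)  # ['a\r', '\x0b', 'b']
```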

2014-08-22 Thread Samuel Charron
Samuel Charron added the comment: It's also at line #14941 for unicode strings, if I understand correctly. With 3.4.0: 'a\x85b\x1ec'.splitlines() returns ['a', 'b', 'c']

2014-08-22 Thread R. David Murray
R. David Murray added the comment: See issue 7643 for some technical background. There are some other interesting issues to read if you search the tracker for 'splitlines unicode', one of which is an open doc issue. Clearly the docs about this are inadequate. Basically, though, I think you