Neil Schemenauer added the comment:
I too would prefer a new method name rather than overloading splitlines() with
more keyword args (passed as hardcoded constants, usually). Again, I think we
want:
list(open(..).read().()) == list(open(..))
readlines() returns a list but I think this
Steven D'Aprano added the comment:
I don't like the idea of adding a second bool parameter to splitlines. Guido
has a rough rule of thumb (which I agree with) of "no constant bool
parameters". If people will typically call a function with some sort of "mode"
parameter using a hard-coded
Neil Schemenauer added the comment:
> Why not simply add a new parameter, to make people who want ASCII linebreaks
> continue to use .splitlines() ?
That could work but I think in nearly every case you don't want to use
splitlines() without supplying the parameter. So, it seems like a bit
Marc-Andre Lemburg added the comment:
Why not simply add a new parameter, to make people who want
ASCII linebreaks continue to use .splitlines() ?
I think it would be less than ideal to have one method break on
all Unicode line breaks and another only on ASCII ones.
Neil Schemenauer added the comment:
I've created a topic on this inside the "Ideas" area of discuss.python.org.
Sorry if that wasn't appropriate, not sure if I should have kept the discussion
here.
Inada Naoki suggests creating a new method str.iterlines([keepends]). Given
that people
Marc-Andre Lemburg added the comment:
I am -1 on changing the default behavior. The Unicode standard defines what a
linebreak code point is (all code points with character properties Zl or
bidirectional property B) and we adhere to that. This may confuse parsers
coming from the ASCII world,
Serhiy Storchaka added the comment:
If we change the default behavior, we need to wait several releases after adding
this option. Users should be able to pick the current behavior explicitly.
Currently the workaround is using regular expressions.
For s.splitlines(keepends=False):
Neil Schemenauer added the comment:
If we introduce a keyword parameter, I think the default of str.splitlines()
should be changed to match bytes.splitlines (and match Python 2
str.splitlines()). I.e. split on \r and \n by default. I looked through the
stdlib and I can't find any calls
Alexander Schrijver added the comment:
I appear to have missed the reference to that issue when I read this issue
the first time. Re-opening that issue makes sense to me.
Martin Panter added the comment:
For Python 3, the bytes.splitlines() and bytearray.splitlines() documentation
has been moved out to a separate section (Issue 21777). I don’t think it is
good to add much detail of bytes.splitlines() in the str.splitlines()
documentation.
For Python 2,
Alexander Schrijver added the comment:
Oops, wrong diff. Sorry, this is the correct one for 2.7.
--
Added file: http://bugs.python.org/file43075/cpython2.7_splitlines.diff
Alexander Schrijver added the comment:
This diff synchronizes the cpython 2.7 with that from 3.5 and also describes
the difference between bytes objects and unicode objects (from the other diff)
--
Added file: http://bugs.python.org/file43072/cpython3.5_splitlines.diff
Alexander Schrijver added the comment:
This diff updates the cpython (tip) documentation to document the different
behaviour when using splitlines on bytes objects or string objects.
--
keywords: +patch
nosy: +Alexander Schrijver
Added file:
Martin Panter added the comment:
The main documentation has been updated and Issue 12855 has been closed. What
is left to do here, considering this is marked as a documentation bug? Just
modify the doc strings, as Terry suggested in
https://bugs.python.org/issue22232#msg225766?
Gregory P. Smith added the comment:
If this isn't already mentioned in 2 to 3 porting notes it is worth
highlighting there. Code which uses a str in Python 2 and still uses a str in
Python 3 is now splitting on many more characters.
That seems to be the source of bugs like issue22233.
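A minimal sketch of the porting hazard described above, run under Python 3. The sample text is made up for illustration; the only calls used are str.splitlines() and bytes.splitlines().

```python
# Hypothetical record: a form feed (\x0c) and a Unicode line separator
# (\u2028) sit inside what an ASCII-minded parser sees as one line.
text = "first\x0cstill first\u2028still first\nsecond\n"

# str.splitlines() in Python 3 breaks on \x0c and \u2028 as well as \n.
str_lines = text.splitlines()

# bytes.splitlines() only breaks on \n, \r and \r\n.
bytes_lines = text.encode("utf-8").splitlines()

print(len(str_lines))    # 4
print(len(bytes_lines))  # 2
```

Code that moved from a Python 2 str to a Python 3 str gets the first behaviour, even though the Python 2 str behaved like the second.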
Changes by Martin Panter vadmium...@gmail.com:
--
nosy: +vadmium
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue22232
Ezio Melotti added the comment:
Looks like str.splitlines is using STRINGLIB_ISLINEBREAK which in turn uses
Py_UNICODE_ISLINEBREAK, so the behavior should be correct. If splitting on \n,
\r, and \r\n only is common enough we might add a bool arg to splitlines to
restrict the splitting on
Serhiy Storchaka added the comment:
With Terry's explanation linebreak looks better to me. One more alternative is
ascii=False (or unicode=True?). It may also be worth adding this parameter to
strip/rstrip/lstrip/split too. On the other hand, regular expressions can be used in
such special cases.
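A rough sketch of the regular-expression route mentioned above. The helper name split_ascii_lines is hypothetical (it is not a str method or anything proposed in this issue); it only assumes the standard re module.

```python
import re

def split_ascii_lines(s, keepends=False):
    # Hypothetical helper: split only on \n, \r and \r\n.
    if keepends:
        # Each match is a run of non-newline characters plus an optional
        # terminator, or a bare terminator for an empty line.
        return re.findall(r"[^\r\n]+(?:\r\n|\r|\n)?|\r\n|\r|\n", s)
    lines = re.split(r"\r\n|\r|\n", s)
    # Mirror splitlines(): a trailing terminator does not yield an empty item.
    if lines and lines[-1] == "":
        lines.pop()
    return lines

sample = "a\r\nb\x0cc\n\nd"
print(split_ascii_lines(sample))                 # ['a', 'b\x0cc', '', 'd']
print(split_ascii_lines(sample, keepends=True))  # ['a\r\n', 'b\x0cc\n', '\n', 'd']
```

The form feed (\x0c) stays inside its line here, whereas sample.splitlines() would break on it.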
Ezio Melotti added the comment:
There are some ascii line breaks other than \n, \r, \r\n.
unicode=True might be better, but might be confused with unicode strings.
Maybe unicode_linebreaks or unicode_newlines?
Serhiy Storchaka added the comment:
See also issue18236.
Changes by Jakub Wilk jw...@jwilk.net:
--
nosy: +jwilk
R. David Murray added the comment:
The existing related open doc issue is issue 12855.
R. David Murray added the comment:
Ideally str.splitlines would split on whatever the unicode database says are
mandatory line break characters. I take it this is currently not true? That
is, that the list is hardcoded?
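One way to probe this question empirically, as a sketch rather than a description of how CPython builds its table: compare the code points str.splitlines() actually breaks on with those the unicodedata module reports as category Zl or bidirectional class B. The variable names are illustrative only.

```python
import sys
import unicodedata

# Code points that str.splitlines() treats as line boundaries.
splitting = {
    cp for cp in range(sys.maxunicode + 1)
    if len(("a" + chr(cp) + "b").splitlines()) == 2
}

# Code points the Unicode database marks as category Zl
# or bidirectional class B.
by_property = {
    cp for cp in range(sys.maxunicode + 1)
    if unicodedata.category(chr(cp)) == "Zl"
    or unicodedata.bidirectional(chr(cp)) == "B"
}

# Property-based line breaks that splitlines() does NOT split on.
print([hex(cp) for cp in sorted(by_property - splitting)])
# Break characters that splitlines() honours beyond those two properties.
print([hex(cp) for cp in sorted(splitting - by_property)])
```

The second list shows whether the interpreter recognises additional break characters that those two properties alone do not explain.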
Samuel Charron added the comment:
This is a known issue and will be resolved by improving the documentation, so
I'm closing this bug.
Thanks!
--
status: open -> closed
R. David Murray added the comment:
OK, we'll use issue 22232 to resolve the issue of email using splitlines.
Serhiy Storchaka added the comment:
Maybe add a parameter to str.splitlines() which will switch behavior to split
on '\n', '\r' and '\r\n' only?
--
nosy: +serhiy.storchaka
Terry J. Reedy added the comment:
Unless there is already another issue for improving the doc, this should at
least be left open as a doc issue.
But I had the same thought as Serhiy, that we should at least optionally make
the current doc correct. Two possibilities:
newlines=False If true,
Serhiy Storchaka added the comment:
I don't understand why you mention latin-1. splitlines() supports linebreaks
outside latin-1 range.
import sys
[hex(i) for i in range(sys.maxunicode + 1) if len(('%cx' % i).splitlines()) == 2]
['0xa', '0xb', '0xc', '0xd', '0x1c', '0x1d', '0x1e', '0x85', '0x2028',
Terry J. Reedy added the comment:
I was not aware of the remainder of the undocumented behavior. Thanks for the
code that makes it clear.
linebreak (or linebreaks)=True means that splitting occurs on some
(approximation?*) of unicode mandatory linebreaks, as opposed to just the ascii
Roundup Robot added the comment:
New changeset 3ad59ed0f4f0 by Terry Jan Reedy in branch '3.4':
Issue #22232 (partial fix): update Universal newlines Glossary entry.
http://hg.python.org/cpython/rev/3ad59ed0f4f0
--
nosy: +python-dev
Terry J. Reedy added the comment:
Glossary fixed. I changed the components to Documentation as you will handle
email elsewhere.
For library references: The key sentence currently used in all entries is "This
method uses the universal newlines approach to splitting lines.", where
*universal
Terry J. Reedy added the comment:
Objects/unicodeobject.c linebreak is at 266. With 3.4.1:
'a\x0ab\x0bc\x0cd\x0d1c\x1c1d\x1d1e\x1e'.splitlines()
['a', 'b', 'c', 'd', '1c', '1d', '1e']
\x0a == \n, \x0d == \r
The \r\n pair is a special case, as promised, but other pairs are not.
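A short illustration of the pairing rule just described, using only str.splitlines(); the sample strings are made up.

```python
# "\r\n" is treated as a single line boundary ...
crlf = "a\r\nb".splitlines()
print(crlf)      # ['a', 'b']

# ... but the reversed pair "\n\r" counts as two boundaries,
# which produces an empty line in between.
lfcr = "a\n\rb".splitlines()
print(lfcr)      # ['a', '', 'b']

# Other adjacent break characters are likewise counted separately.
vt_ff = "a\x0b\x0cb".splitlines()
print(vt_ff)     # ['a', '', 'b']
```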
Samuel Charron added the comment:
It's also at line #14941 for unicode strings if I understand correctly
With 3.4.0:
'a\x85b\x1ec'.splitlines()
['a', 'b', 'c']
R. David Murray added the comment:
See issue 7643 for some technical background. There are some other interesting
issues to read if you search the tracker for 'splitlines unicode', one of which
is an open doc issue. Clearly the docs about this are inadequate.
Basically, though, I think you