[issue22232] str.splitlines splitting on non-\r\n characters

2014-08-23 Thread Samuel Charron

Samuel Charron added the comment:

This is a known issue and will be resolved by improving the documentation. I'm 
closing this bug.

Thanks!

--
status: open -> closed

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue22232
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue22232] str.splitlines splitting on non-\r\n characters

2014-08-22 Thread Samuel Charron

Samuel Charron added the comment:

It's also at line #14941 for unicode strings if I understand correctly

With 3.4.0: 

>>> "a\x85b\x1ec".splitlines()
['a', 'b', 'c']

--




[issue22232] str.splitlines splitting on non-\r\n characters

2014-08-20 Thread Samuel Charron

New submission from Samuel Charron:

According to the documentation, str.splitlines uses universal newlines to 
split lines.
The glossary defines universal newlines as \r, \n, and \r\n only 
(https://docs.python.org/3.5/glossary.html#term-universal-newlines)

However, it also splits on other characters. Reading the code, it seems 
the list of characters comes from the linebreak array in _PyUnicode_Init, in 
Objects/unicodeobject.c.
Every character from that array that I tested splits the string.
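
The behaviour described above can be checked empirically. The following is a minimal sketch, not taken from the CPython sources: the helper name splitting_code_points and the 0x3000 scan limit are illustrative choices, and the result depends on the interpreter it is run on.

```python
def splitting_code_points(limit=0x3000):
    """Return the code points below `limit` that str.splitlines() breaks on."""
    found = []
    for cp in range(limit):
        # A single separator between "a" and "b" yields exactly two pieces.
        if ("a" + chr(cp) + "b").splitlines() == ["a", "b"]:
            found.append(cp)
    return found


breaks = splitting_code_points()
print([hex(cp) for cp in breaks])
```

Any value printed besides 0xa and 0xd is a character that str.splitlines treats as a line boundary even though it is not a universal newline.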

Other libraries use str.splitlines assuming it only breaks on the \r 
and \n characters. This is the case for email.feedparser, for instance, used by 
http.client to parse headers. HTTP headers should be separated by CRLF, as 
specified by http://www.w3.org/Protocols/rfc2616/rfc2616-sec4.html#sec4. 

Either the documentation should state that splitlines splits on other 
characters or it should stick to the documentation and split only on \r and \n 
characters.
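
For callers that need the second behaviour today, the splitting can be done explicitly. This is only a sketch: split_crlf_only is a hypothetical name, not a standard library function, and it follows str.splitlines() in not returning a trailing empty piece when the text ends with a newline.

```python
import re

# Only the three universal newline sequences; \r\n is tried first so that
# it counts as a single separator.
_NEWLINE = re.compile(r"\r\n|\r|\n")


def split_crlf_only(text):
    """Split text on \\r\\n, \\r and \\n only (hypothetical helper)."""
    lines = _NEWLINE.split(text)
    # Drop the empty piece produced by a terminating newline (or empty input).
    if lines and lines[-1] == "":
        lines.pop()
    return lines


print(split_crlf_only("a\x85b\r\nc\n"))
```

Here \x85 stays inside the first line instead of starting a new one.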

If it does split on other characters, the list could be improved, since the 
Unicode reference lists the mandatory characters for line breaking: 
http://www.unicode.org/reports/tr14/tr14-32.html#BK

--
components: Library (Lib), Unicode
messages: 225561
nosy: ezio.melotti, haypo, scharron
priority: normal
severity: normal
status: open
title: str.splitlines splitting on non-\r\n characters
type: behavior
versions: Python 3.4, Python 3.5




[issue22233] http.client splits headers on non-\r\n characters

2014-08-20 Thread Samuel Charron

New submission from Samuel Charron:

In some cases, http.client (which uses email.feedparser) splits headers at the 
wrong separators. The bug comes from the use of str.splitlines (in 
email.feedparser), which splits on characters other than \r and \n when it 
should not. (See bug http://bugs.python.org/issue22232)

To reproduce the bug:

import http.client
c = http.client.HTTPSConnection("graph.facebook.com")
# Send a header whose value is the single character \x85
c.request("GET", "/%C4%85", None, {"test": "\x85"})
r = c.getresponse()
print(r.headers)
print(r.headers.keys())
print(r.headers.get("WWW-Authenticate"))

As you can see, the WWW-Authenticate header is wrongly parsed (it is missing 
its final "), and therefore the rest of the headers are ignored.
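
The effect on header lines can be illustrated offline, without a network request. The sketch below does not call http.client or email.feedparser; the header block is made up (X-Echo is a hypothetical header name), and it only contrasts str.splitlines() with a strict split on CRLF.

```python
# Hypothetical raw header block: one quoted value contains \x85 (NEL).
raw = 'X-Echo: "a\x85"\r\nContent-Length: 0\r\n'

# str.splitlines() also breaks at \x85, cutting the first header in two.
loose = raw.splitlines()

# Splitting on CRLF only keeps each header on a single line.
strict = [line for line in raw.split("\r\n") if line]

print(loose)
print(strict)
```

In the loose result the quoted value loses its closing quote, and the leftover piece is a line that no longer looks like a header.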

--
components: Library (Lib)
messages: 225562
nosy: scharron
priority: normal
severity: normal
status: open
title: http.client splits headers on non-\r\n characters
type: behavior
versions: Python 3.4, Python 3.5

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue22233
___



[issue22232] str.splitlines splitting on non-\r\n characters

2014-08-20 Thread Samuel Charron

Samuel Charron added the comment:

For an example of a serious bug caused by this, see 
http://bugs.python.org/issue22233

--




[issue22233] http.client splits headers on non-\r\n characters

2014-08-20 Thread Samuel Charron

Samuel Charron added the comment:

A consequence of this bug is that r.read() blocks until a timeout occurs, since 
the Content-Length header is not interpreted (I think this is related to the 
comment in HTTPResponse.__init__).

--
