[issue6664] readlines should understand Line Separator and Paragraph Separator characters
Antoine Pitrou pit...@free.fr added the comment:

By design, readlines() only recognizes those characters which are official line separators on various OSes (\n, \r, \r\n). This is important for proper parsing of log files, internet protocols, etc. If you want to split on all line separators recognized by the Unicode spec, use str.splitlines().

--
resolution: -> rejected
status: open -> closed

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue6664
___
___
Python-bugs-list mailing list
Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
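To make the distinction in the comment above concrete, here is a minimal sketch comparing the two behaviours on in-memory data. The sample string and variable names are illustrative assumptions, not taken from the tracker; io.TextIOWrapper is used because it is the kind of object the built-in open() returns in text mode.

```python
import io

# Illustrative sample: the second "line" ends with U+2029 PARAGRAPH SEPARATOR.
text = "first\nsecond\u2029third\n"

# Text layer as returned by open() in text mode: only \n, \r and \r\n end a line.
wrapper = io.TextIOWrapper(io.BytesIO(text.encode("utf-8")), encoding="utf-8")
io_lines = wrapper.readlines()

# str.splitlines() also breaks on U+2028, U+2029 and other Unicode line boundaries.
split_lines = text.splitlines(keepends=True)

print(io_lines)     # two items: U+2029 stays inside the second one
print(split_lines)  # three items
```

Reading the whole file and calling splitlines() on the result is therefore the way to opt in to the Unicode behaviour, as the comment suggests.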
Jeffrey Finkelstein jeffrey.finkelst...@gmail.com added the comment:

This seems to be because the codecs.StreamReader.readlines() function does this:

    def readlines(self, sizehint=None, keepends=True):
        data = self.read()
        return data.splitlines(keepends)

But the io readlines() functions make multiple calls to readline() instead.

Here is a test case which passes with the codecs readlines() but fails with the io readlines().

--
keywords: +patch
nosy: +jfinkels
Added file: http://bugs.python.org/file19086/issue6664.testcase.patch
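The two strategies described in this comment can be sketched side by side. The helper names below are hypothetical, and the readline() loop is only an approximation of what the io module does (the real implementation is not literally this loop); io.StringIO stands in for a text file.

```python
import io


def readlines_via_splitlines(stream):
    """Read everything, then split: the approach of the quoted codecs code."""
    return stream.read().splitlines(True)


def readlines_via_readline(stream):
    """Call readline() until it returns an empty string (rough sketch of io)."""
    lines = []
    while True:
        line = stream.readline()
        if not line:
            break
        lines.append(line)
    return lines


# Illustrative sample: U+2028 LINE SEPARATOR sits between "b" and "c".
text = "a\nb\u2028c\n"

via_splitlines = readlines_via_splitlines(io.StringIO(text))
via_readline = readlines_via_readline(io.StringIO(text))

print(via_splitlines)  # three items
print(via_readline)    # two items
```

The difference comes entirely from who decides where a line ends: str.splitlines() in the first helper, the stream's own readline() in the second.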
Changes by Mark Lawrence breamore...@yahoo.co.uk:

--
nosy: +benjamin.peterson, pitrou
stage: -> needs patch
type: -> behavior
versions: +Python 2.7, Python 3.2
Changes by Antoine Pitrou pit...@free.fr:

--
nosy: +haypo, lemburg
versions: -Python 2.7, Python 3.1
New submission from Neil Hodgson nyamaton...@users.sourceforge.net:

Unicode includes the line-ending characters Line Separator (U+2028) and Paragraph Separator (U+2029). The readlines method of the file object returned by the built-in open does not treat these characters as line ends, although the object returned by codecs.open(..., encoding='utf-8') does.

The attached program creates a UTF-8 file containing three lines, with the second line ended by a Paragraph Separator. The program then reads this file back in as a text file. Only two lines are seen when reading the file back in. The desired behaviour is for the file to be read in as three lines.

--
components: IO
files: lineends.py
messages: 91397
nosy: nyamatongwe
severity: normal
status: open
title: readlines should understand Line Separator and Paragraph Separator characters
versions: Python 3.1
Added file: http://bugs.python.org/file14671/lineends.py
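The attached lineends.py is not reproduced in this thread, so the following is only a sketch of the scenario described in the report, written for Python 3. The file name, sample text and variable names are assumptions. Recent Python versions may flag codecs.open() as deprecated, so the warning is silenced here.

```python
import codecs
import os
import tempfile
import warnings

# Three lines; the second one ends with U+2029 PARAGRAPH SEPARATOR.
content = "line one\nline two\u2029line three\n"

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "lineends.txt")  # hypothetical file name

    # Write raw UTF-8 bytes so no newline translation happens on any platform.
    with open(path, "wb") as f:
        f.write(content.encode("utf-8"))

    # Built-in open(): U+2029 is not treated as a line end.
    with open(path, encoding="utf-8") as f:
        builtin_lines = f.readlines()

    # codecs.open(): readlines() is based on str.splitlines().
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", DeprecationWarning)
        with codecs.open(path, encoding="utf-8") as f:
            codecs_lines = f.readlines()

print(len(builtin_lines))  # 2
print(len(codecs_lines))   # 3
```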