[issue6664] readlines should understand Line Separator and Paragraph Separator characters

2011-01-03 Thread Antoine Pitrou

Antoine Pitrou pit...@free.fr added the comment:

By design, readlines() only recognizes the characters that are official line
separators on various OSes (\n, \r, \r\n). This is important for proper parsing
of log files, internet protocols, etc.
If you want to split on all line separators recognized by the Unicode standard, use
str.splitlines().
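Here is a minimal sketch of that distinction, assuming Python 3. It uses io.StringIO with newline=None as a stand-in for a text file opened in universal-newlines mode. The sample text is made up for illustration.

```python
import io

# Illustrative text: a \r\n line end, then a line broken by U+2029 PARAGRAPH SEPARATOR.
text = "first\r\nsecond\u2029third\n"

# Universal-newlines mode (newline=None): \r and \r\n are translated to \n,
# and readlines() ends lines only there, so U+2029 stays inside a line.
io_lines = io.StringIO(text, newline=None).readlines()

# str.splitlines() also breaks on U+2028, U+2029 and other Unicode line boundaries.
split_lines = text.splitlines(keepends=True)

print(len(io_lines))     # 2
print(len(split_lines))  # 3
```

Note that str.splitlines() also splits on \v, \f, \x1c, \x1d, \x1e and \x85, so it is broader than just U+2028 and U+2029.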

--
resolution:  -> rejected
status: open -> closed

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue6664
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue6664] readlines should understand Line Separator and Paragraph Separator characters

2010-10-01 Thread Jeffrey Finkelstein

Jeffrey Finkelstein jeffrey.finkelst...@gmail.com added the comment:

This seems to be because the codecs.StreamReader.readlines() method does this:

    def readlines(self, sizehint=None, keepends=True):
        data = self.read()
        return data.splitlines(keepends)

The io readlines() methods, by contrast, call readline() repeatedly, and
readline() does not treat U+2028 or U+2029 as line ends.
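The two code paths can be compared on the same UTF-8 bytes with a short sketch like the following. It assumes Python 3 and uses an in-memory io.BytesIO in place of a real file; it is an illustration, not the attached test case patch.

```python
import codecs
import io

# UTF-8 bytes for three logical lines, the second ended by U+2029 PARAGRAPH SEPARATOR.
data = "first\nsecond\u2029third\n".encode("utf-8")

# codecs path: StreamReader.readlines() reads everything, then calls str.splitlines().
reader = codecs.getreader("utf-8")(io.BytesIO(data))
codecs_lines = reader.readlines()

# io path: TextIOWrapper.readlines() collects readline() results,
# and readline() does not stop at U+2029.
wrapper = io.TextIOWrapper(io.BytesIO(data), encoding="utf-8")
io_lines = wrapper.readlines()

print(len(codecs_lines))  # 3
print(len(io_lines))      # 2
```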

Here is a test case that passes with the codecs readlines() but fails with the
io readlines().

--
keywords: +patch
nosy: +jfinkels
Added file: http://bugs.python.org/file19086/issue6664.testcase.patch




[issue6664] readlines should understand Line Separator and Paragraph Separator characters

2010-08-01 Thread Mark Lawrence

Changes by Mark Lawrence breamore...@yahoo.co.uk:


--
nosy: +benjamin.peterson, pitrou
stage:  -> needs patch
type:  -> behavior
versions: +Python 2.7, Python 3.2




[issue6664] readlines should understand Line Separator and Paragraph Separator characters

2010-08-01 Thread Antoine Pitrou

Changes by Antoine Pitrou pit...@free.fr:


--
nosy: +haypo, lemburg
versions:  -Python 2.7, Python 3.1




[issue6664] readlines should understand Line Separator and Paragraph Separator characters

2009-08-07 Thread Neil Hodgson

New submission from Neil Hodgson nyamaton...@users.sourceforge.net:

Unicode defines two line-ending characters, Line Separator (U+2028) and
Paragraph Separator (U+2029). The readlines method of the file object returned
by the built-in open does not treat these characters as line ends, although
the object returned by codecs.open(..., encoding='utf-8') does.

The attached program creates a UTF-8 file containing three lines, with the
second line terminated by a Paragraph Separator. The program then reads the
file back in text mode, but only two lines are seen.

The desired behaviour is for the file to be read in as three lines.
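Below is a hypothetical reconstruction of the reported scenario, assuming Python 3. It is not the attached lineends.py, whose contents are not shown here, and the file name lineends.txt is made up. It also shows the str.splitlines() workaround that recovers all three lines.

```python
import os
import tempfile

# Hypothetical reconstruction: three lines, the second terminated by
# U+2029 PARAGRAPH SEPARATOR instead of "\n".
content = "one\ntwo\u2029three\n"

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "lineends.txt")  # hypothetical file name
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write(content)

    # Built-in open() in text mode: readlines() does not split at U+2029.
    with open(path, encoding="utf-8") as f:
        lines = f.readlines()

    # Workaround: read the whole file and let str.splitlines() split it.
    with open(path, encoding="utf-8") as f:
        all_lines = f.read().splitlines(keepends=True)

print(len(lines))      # 2
print(len(all_lines))  # 3
```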

--
components: IO
files: lineends.py
messages: 91397
nosy: nyamatongwe
severity: normal
status: open
title: readlines should understand Line Separator and Paragraph Separator characters
versions: Python 3.1
Added file: http://bugs.python.org/file14671/lineends.py
