[issue18291] codecs.open interprets FS, RS, GS as line ends

Neil Schemenauer Thu, 04 Oct 2018 17:20:46 -0700


Neil Schemenauer <nas-pyt...@arctrix.com> added the comment:


Attached is a rough patch that tries to fix this problem.  I changed the 
behavior so that the Unicode character U+2028 (LINE SEPARATOR) is no longer 
treated as a line separator.  It would be trivial to change the regex to 
support that too, if we want to preserve backwards compatibility.  Personally, 
I think readlines() on a codecs reader should do the same line splitting as an 
'io' file.
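
For illustration, here is a minimal sketch of the difference being discussed. 
It is not the attached patch: the helper name split_lines_io_style and the 
regex _LINE_END are hypothetical, chosen only to show how str.splitlines() 
treats FS, RS, GS and U+2028 as line ends while an 'io' text stream breaks 
only on "\n" here.

```python
import io
import re

# Hypothetical regex (an assumption, not taken from the patch): only the
# line endings that 'io' text files recognise in universal-newlines mode.
_LINE_END = re.compile(r"\r\n|\r|\n")


def split_lines_io_style(text, keepends=False):
    """Split text on \\r\\n, \\r and \\n only (illustrative helper)."""
    lines = []
    pos = 0
    for match in _LINE_END.finditer(text):
        end = match.end() if keepends else match.start()
        lines.append(text[pos:end])
        pos = match.end()
    if pos < len(text):
        # Trailing text without a terminating line end.
        lines.append(text[pos:])
    return lines


# FS (0x1c), GS (0x1d), RS (0x1e) and U+2028 embedded in one logical line.
sample = "a\x1cb\x1dc\x1ed\u2028e\nf\n"

# str.splitlines() breaks at every one of those characters.
print(sample.splitlines())

# An in-memory 'io' text stream breaks only at "\n".
print(io.StringIO(sample).readlines())

# The sketch above matches the 'io' result.
print(split_lines_io_style(sample, keepends=True))
```

Adding U+2028 back for backwards compatibility would, under this sketch, only 
mean adding one more alternative to the regex.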

If we want to use the patch, the following still needs to be done: write tests 
that check the splitting on FS, RS, and GS characters, and write a news entry.  
I didn't do any profiling to see what the performance effect of my change is, 
so that should be checked too.

----------
Added file: https://bugs.python.org/file47851/codecs_splitlines.txt

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue18291>
_______________________________________