Nick Coghlan wrote:
> Antoine Pitrou wrote:
>> M.-A. Lemburg <mal <at> egenix.com> writes:
>>> Please file a bug report for this. f.readlines() (or rather
>>> the io layer) should be using Py_UNICODE_ISLINEBREAK(ch)
>>> for detecting line break characters.
>>
>> Actually, no. It has been designed from the start to only recognize the
>> "standard" line break representations found in common formats/protocols
>> (CR, LF and CR+LF).
>> People wanting to split on arbitrary unicode line breaks should use
>> str.splitlines().
>
> The fairly long-standing RFE relating to an arbitrarily selectable
> newline separator seems relevant here:
> http://bugs.python.org/issue1152248
>
> As with the discussion there, the problem with using str.splitlines is
> that it prevents pipelining approaches that avoid reading a whole file
> into memory.
>
> While removing the validity check from readlines() completely is
> questionable (the readrecords() approach mentioned in the tracker issue
> would still be better there), loosening the validity check to be based
> on Py_UNICODE_ISLINEBREAK seems a bit more feasible. (I'd still call it
> a feature request rather than a bug though).
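For readers following the thread, here is a minimal sketch of the difference being discussed. The sample string and variable names are made up for illustration; only `io.StringIO` and `str.splitlines()` come from the standard library.

```python
import io

# Hypothetical sample: "standard" newlines mixed with two Unicode line
# breaks, LINE SEPARATOR (U+2028) and NEXT LINE (U+0085).
text = "one\ntwo\r\nthree\u2028four\x85five"

# The io layer only treats CR, LF and CR+LF as line endings.
# newline="" keeps the endings untranslated so they stay visible.
io_lines = io.StringIO(text, newline="").readlines()

# str.splitlines() also splits on the wider set of Unicode line breaks.
str_lines = text.splitlines(keepends=True)

print(io_lines)
print(str_lines)
```

With this input, the io layer returns three lines (the last one still contains U+2028 and U+0085), while str.splitlines() returns five.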
I've had a look at the io implementation: it appears to be based on the
universal newline support idea, which addresses only a fixed set of
"new line" character combinations, and it is not as straightforward to
extend to all Unicode line break characters as I thought.

What I don't understand is why the io layer tries to reinvent the wheel
here instead of just using the codec's .readline() method - which *does*
use .splitlines() and has full support for all Unicode line break
characters (including the CRLF combination).

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source (#1, Aug 06 2009)
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
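As an illustration of the codec-level .readline() mentioned in the message above, here is a small sketch using the standard `codecs.getreader()` stream reader. The byte stream and variable names are assumptions chosen for the example, and this is not a proposal for how the io layer should be implemented.

```python
import codecs
import io

# Hypothetical input: UTF-8 bytes containing LF, CR+LF, U+2028 and U+0085.
raw = io.BytesIO("one\ntwo\r\nthree\u2028four\x85five".encode("utf-8"))

# Wrap the byte stream in the codec's StreamReader.
reader = codecs.getreader("utf-8")(raw)

# Read line by line until readline() returns the empty string at EOF.
# The reader pulls data from the stream in chunks rather than requiring
# the caller to load the whole file first.
codec_lines = list(iter(reader.readline, ""))

print(codec_lines)
```

Here the stream reader should split on the Unicode line breaks as well as on LF and CR+LF, so the lines come out one at a time in the same way str.splitlines() would produce them.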