M.-A. Lemburg wrote:
> Walter Dörwald wrote:
>
>>I wonder if we should switch back to a simple readline() implementation
>>for those codecs that don't require the current implementation
>>(basically every charmap codec).
>
> That would be my preference as well. The 2.4 .readline() approach
> is really only needed for codecs that have to deal with encodings
> that:
>
> a) use multi-byte formats, or
> b) support more line-end formats than just CR, CRLF, LF, or
> c) are stateful.
>
> This can easily be had by using a mix-in class for
> codecs which do need the buffered .readline() approach.

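
To make the two approaches concrete, here is a minimal sketch in
current Python 3 syntax (the thread itself is about Python 2.4). The
names SimpleReadlineMixin and SimpleLatin1Reader are made up for
illustration and are not part of the codecs module. It turns the
proposal around and puts the simple readline() into the mix-in, which
is one possible arrangement and not a settled design:

```python
import encodings.latin_1 as latin_1
import io


class SimpleReadlineMixin:
    """Hypothetical mix-in: let the underlying byte stream find line ends.

    Only suitable for stateless single-byte encodings in which the
    byte 0x0A is the line feed (Latin-1 in this example).
    """

    def readline(self, size=None, keepends=True):
        # Split on the byte level first, then decode the complete line.
        raw = self.stream.readline(-1 if size is None else size)
        line, _consumed = self.decode(raw, self.errors)
        if not keepends:
            for end in ("\r\n", "\n", "\r"):
                if line.endswith(end):
                    line = line[: -len(end)]
                    break
        return line


class SimpleLatin1Reader(SimpleReadlineMixin, latin_1.StreamReader):
    """Latin-1 reader using the simple readline(); the name is made up."""


data = "caf\xe9\x85next\nsecond\n".encode("latin-1")

# Simple approach: only the stream's own notion of a line end counts.
simple_first = SimpleLatin1Reader(io.BytesIO(data)).readline()

# Stock StreamReader: buffered readline() built on str.splitlines().
buffered_first = latin_1.StreamReader(io.BytesIO(data)).readline()

print(repr(simple_first))
print(repr(buffered_first))
```

With this input the simple reader returns everything up to the "\n",
while the stock reader already stops after the NEXT LINE character
(0x85), because it splits the decoded text with splitlines().
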
Should this be a mix-in or should we simply have two base classes? Which
of those bases/mix-ins should be the default?

>>AFAIK source files are opened in
>>universal newline mode, so at least we'd get proper treatment of "\n",
>>"\r" and "\r\n" line ends, but we'd lose u"\x1c", u"\x1d", u"\x1e",
>>u"\x85", u"\u2028" and u"\u2029" (which are line terminators according
>>to unicode.splitlines()).
>
> While the Unicode standard defines these characters as line
> end code points, I think their definition does not necessarily
> apply to data that is converted from a certain encoding to
> Unicode, so that's not a big loss.
>
> E.g. in ASCII or Latin-1, FILE, GROUP and RECORD
> SEPARATOR and NEXT LINE characters (0x1c, 0x1d, 0x1e, 0x85)
> are not interpreted as line end characters.
>
> Furthermore, we had no reports of anyone complaining in
> Python 1.6, 2.0 - 2.3 that line endings were not detected
> properly. All these Python versions relied on the stream's
> .readline() method to get the next line. The only bug reports
> we had were for UTF-16 which falls into the above
> category a) and did not support .readline() until Python 2.4.

True.

Bye,
   Walter Dörwald
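
For reference, the difference between the two notions of a line end can
be checked with a short sketch. It uses current Python 3 (str and the io
module instead of the 2.x unicode type), and the sample string is made
up for the purpose:

```python
import io

text = "a\nb\r\nc\x1cd\x85e\u2028f"

# str.splitlines() honours every Unicode line boundary.
by_splitlines = text.splitlines()

# A text stream in universal newline mode only knows "\n", "\r" and "\r\n".
stream = io.TextIOWrapper(io.BytesIO(text.encode("utf-8")),
                          encoding="utf-8", newline=None)
by_stream = stream.readlines()

print(by_splitlines)
print(by_stream)
```

splitlines() yields six pieces here, while the stream yields three
lines, the last of which still contains the separator characters.
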