Hi,

In Python 2, codecs.open() is the best way to read and/or write files
using Unicode. In Python 3, however, the built-in open() is preferred
because it is backed by the fast io module. I would like to deprecate
codecs.open() because it can be replaced by open() and
io.TextIOWrapper. I am writing this email to ask for your opinion.

--

codecs.open() and the StreamReader, StreamWriter and StreamReaderWriter
classes of the codecs module don't support universal newlines and still
have some issues with stateful codecs (like UTF-16/32 BOMs). In
addition, each codec has to implement its own StreamReader and
StreamWriter classes.

StreamReader and StreamWriter do not expose their codec state (no
getstate() or setstate() method), so it's not possible to write a
generic fix for all child classes in the codecs module. Each stateful
codec has to handle special cases like seek() problems itself. For
example, the UTF-16 codec duplicates some
IncrementalEncoder/IncrementalDecoder code in its
StreamWriter/StreamReader classes.
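
To illustrate the difference in state handling, here is a minimal
sketch (the sample string and variable names are only illustrative).
The incremental decoder API exposes getstate() and setstate(), so a
decoder can be rewound to a known point, whereas the StreamReader class
defines no such methods:

```python
import codecs

# Incremental decoders expose their state; the StreamReader class does not.
incremental_has_state = hasattr(codecs.IncrementalDecoder, "getstate")
stream_has_state = hasattr(codecs.StreamReader, "getstate")

# Illustrative sample: a UTF-16 byte string that starts with a BOM.
data = "h\u00e9llo".encode("utf-16")

decoder = codecs.getincrementaldecoder("utf-16")()
first = decoder.decode(data[:4])        # BOM (2 bytes) + first character
state = decoder.getstate()              # remember the position after the BOM
rest = decoder.decode(data[4:], final=True)

# Rewind to the saved state and decode the tail again, as a seek() would need.
decoder.setstate(state)
again = decoder.decode(data[4:], final=True)
```

This is the kind of generic save/restore mechanism that a wrapper can
build on once, instead of each codec handling it separately.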

The io module is well tested, supports non-seekable streams, correctly
handles corner cases (like UTF-16/32 BOMs) and supports any kind of
newlines, including a "universal newline" mode. TextIOWrapper reuses
incremental encoders and decoders, so BOM issues were fixed only once,
in TextIOWrapper.
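
As a small sketch of that behaviour (the in-memory buffers and sample
strings are assumptions chosen for illustration), TextIOWrapper writes
the UTF-16 BOM once across several write() calls, strips it on read,
and translates newlines in its default universal newline mode:

```python
import codecs
import io

# Write UTF-16 text in two steps; the BOM should be emitted only once.
buffer = io.BytesIO()
writer = io.TextIOWrapper(buffer, encoding="utf-16", newline="")
writer.write("ab")
writer.write("cd")
writer.flush()
raw = buffer.getvalue()
bom_count = raw.count(codecs.BOM_UTF16)

# Reading the bytes back consumes the BOM transparently.
decoded = io.TextIOWrapper(io.BytesIO(raw), encoding="utf-16").read()

# Default newline handling: "\r\n" and "\r" are both translated to "\n".
mixed = io.BytesIO(b"a\r\nb\rc\n")
universal = io.TextIOWrapper(mixed, encoding="ascii").read()
```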

It's trivial to replace a call to codecs.open() with a call to open(),
because the two APIs are very close. The main difference is that
codecs.open() doesn't support universal newlines, so you have to use
open(..., newline='') to keep the same behaviour (keep newlines
unchanged). This task can be done by 2to3. But I suppose that most
people will be happy with the universal newline mode.
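
A minimal sketch of the replacement, assuming a throwaway sample file
(the path and its contents are hypothetical): codecs.open() and
open(..., newline='') return the same text, while plain open() applies
universal newlines.

```python
import codecs
import os
import tempfile
import warnings

# Hypothetical sample file mixing Windows and Unix line endings.
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "wb") as f:
    f.write("caf\u00e9\r\nbar\n".encode("utf-8"))

# Old style: codecs.open() never translates newlines.
with warnings.catch_warnings():
    # Assumption: some Python versions may warn about codecs.open();
    # any such warning is silenced for this comparison.
    warnings.simplefilter("ignore")
    with codecs.open(path, "r", encoding="utf-8") as f:
        via_codecs = f.read()

# Same behaviour with open(): newline='' keeps newlines unchanged.
with open(path, "r", encoding="utf-8", newline="") as f:
    via_open_raw = f.read()

# Default open(): universal newlines, so "\r\n" becomes "\n".
with open(path, "r", encoding="utf-8") as f:
    via_open_universal = f.read()
```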

I don't see any use case that is not covered by TextIOWrapper. But I
know some cases that are not supported by StreamReader/StreamWriter.

--

I opened an issue for this idea. Brett and Marc-Andre Lemburg don't
want to deprecate codecs.open() & friends because they want to be able
to write code that works on both Python 2 and Python 3 without any
change. I don't think that's realistic: nontrivial programs require at
least the six module, and most likely the 2to3 program. The six module
could provide its own "codecs.open" function if codecs.open is removed
from Python 3.4.

StreamReader, StreamWriter, StreamReaderWriter and EncodedFile are not
used in the Python 3 standard library. I tried removing them: except
for the tests in test_codecs that exercise them directly, the full test
suite passes.

Read the issue for more information: http://bugs.python.org/issue8796

Victor
