Victor Stinner wrote:
> Hi,
>
> In Python 2, codecs.open() is the best way to read and/or write files
> using Unicode. But in Python 3, open() is preferred with its fast io
> module. I would like to deprecate codecs.open() because it can be
> replaced by open() and io.TextIOWrapper. I would like your opinion and
> that's why I'm writing this email.
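For readers following along, here is a minimal sketch of the replacement
being proposed. It assumes a UTF-8 file; the sample text and the temporary
file are made up for illustration. Recent Python 3 releases may emit a
DeprecationWarning for codecs.open(), so the sketch silences it.

```python
import codecs
import io
import os
import tempfile
import warnings

# Hypothetical sample data, written as raw UTF-8 bytes so that every
# reader below sees exactly the same input.
text = "caf\u00e9\nna\u00efve\n"
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as raw:
    raw.write(text.encode("utf-8"))

try:
    # Python 2 style: codecs.open() returns a codecs.StreamReaderWriter.
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", DeprecationWarning)
        with codecs.open(path, "r", encoding="utf-8") as f:
            via_codecs = f.read()

    # Python 3 style: the builtin open() returns an io.TextIOWrapper.
    # newline="" leaves line endings untranslated.
    with open(path, "r", encoding="utf-8", newline="") as f:
        via_open = f.read()

    # The same wrapper can be placed by hand around any binary stream.
    binary = io.BytesIO(text.encode("utf-8"))
    with io.TextIOWrapper(binary, encoding="utf-8", newline="") as f:
        via_wrapper = f.read()
finally:
    os.remove(path)
```

For plain reading like this, all three calls return the same string. The
disagreement in this thread is about what sits underneath them, not about
this simple case.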
I think you should have moved this part of your email
further up, since it explains the reason why this idea was
rejected for now:

> I opened an issue for this idea. Brett and Marc-Andree Lemburg don't
> want to deprecate codecs.open() & friends because they want to be able
> to write code working on Python 2 and on Python 3 without any change. I
> don't think it's realistic: nontrivial programs require at least the six
> module, and most likely the 2to3 program. The six module can have its
> "codecs.open" function if codecs.open is removed from Python 3.4.

And now for something completely different:

> codecs.open() and StreamReader, StreamWriter and StreamReaderWriter
> classes of the codecs module don't support universal newlines, still
> have some issues with stateful codecs (like UTF-16/32 BOMs), and each
> codec has to implement a StreamReader and a StreamWriter class.
>
> StreamReader and StreamWriter are stateless codecs (no reset() or
> setstate() method), and so it's not possible to write a generic fix for
> all child classes in the codecs module. Each stateful codec has to
> handle special cases like seek() problems. For example, UTF-16 codec
> duplicates some IncrementalEncoder/IncrementalDecoder code into its
> StreamWriter/StreamReader class.

Please read PEP 100 regarding StreamReader and StreamWriter.
Those codecs parts were explicitly designed to be stateful,
unlike the stateless encoder/decoder methods.

Please read my reply on the ticket:

"""
StreamReader and StreamWriter classes provide the base codec
implementations for stateful interaction with streams. They
define the interface and provide a working implementation for
those codecs that choose not to implement their own variants.

Each codec can, however, implement variants which are optimized
for the specific encoding or intercept certain stream methods
to add functionality or improve the encoding/decoding
performance.

Both are essential parts of the codec interface.
TextIOWrapper and StreamReaderWriter are merely wrappers
around streams that make use of the codecs. They don't
provide any codec logic themselves. That's the conceptual
difference.
"""

> The io module is well tested, supports non-seekable streams, handles
> correctly corner-cases (like UTF-16/32 BOMs) and supports any kind of
> newlines including an "universal newline" mode. TextIOWrapper reuses
> incremental encoders and decoders, so BOM issues were fixed only once,
> in TextIOWrapper.
>
> It's trivial to replace a call to codecs.open() by a call to open(),
> because the two API are very close. The main different is that
> codecs.open() doesn't support universal newline, so you have to use
> open(..., newline='') to keep the same behaviour (keep newlines
> unchanged). This task can be done by 2to3. But I suppose that most
> people will be happy with the universal newline mode.
>
> I don't see which usecase is not covered by TextIOWrapper. But I know
> some cases which are not supported by StreamReader/StreamWriter.

This is a misunderstanding of the concepts behind the two.

StreamReaders and StreamWriters are implemented by the codecs;
they are part of the API that each codec has to provide in order
to register in the Python codecs system. Their purpose is to
provide a stateful interface and to work efficiently and directly
on streams rather than buffers.

Here's my reply from the ticket regarding using incremental
encoders/decoders for the StreamReader/Writer parts of the
codec set of APIs:

"""
The point about having them use incremental codecs for encoding and
decoding is a good one and would need to be investigated. If possible,
we could use incremental encoders/decoders for the standard
StreamReader/Writer base classes or add new
IncrementalStreamReader/Writer classes which then use the
IncrementalEncoder/Decoder by default.

Please open a new ticket for this.
""" > StreamReader, StreamWriter, StreamReaderEncoder and EncodedFile are not > used in the Python 3 standard library. I tried removed them: except > tests of test_codecs which test them directly, the full test suite pass. > > Read the issue for more information: http://bugs.python.org/issue8796 -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, May 24 2011) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2011-06-20: EuroPython 2011, Florence, Italy 27 days to go ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ _______________________________________________ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com