[issue29783] Modify codecs.open() to use the io module instead of codecs.StreamReaderWriter()

2017-06-16 Thread STINNER Victor

Changes by STINNER Victor :


--
resolution:  -> rejected
stage:  -> resolved
status: open -> closed

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue29783] Modify codecs.open() to use the io module instead of codecs.StreamReaderWriter()

2017-06-16 Thread STINNER Victor

STINNER Victor added the comment:


New changeset 272d888c7b58aff5e1614e3b12e8198b92054835 by Victor Stinner in 
branch 'master':
bpo-29783: Replace codecs.open() with io.open() (#599)
https://github.com/python/cpython/commit/272d888c7b58aff5e1614e3b12e8198b92054835


--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue29783] Modify codecs.open() to use the io module instead of codecs.StreamReaderWriter()

2017-03-27 Thread STINNER Victor

STINNER Victor added the comment:

I abandonned my pull request.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue29783] Modify codecs.open() to use the io module instead of codecs.StreamReaderWriter()

2017-03-10 Thread Martin Panter

Martin Panter added the comment:

I agree that it would be better to hold off deprecating codecs.open until 
Python 2 is no longer supported. This deprecation also discussed in Issue 8796.

There is more to compatability than the missing attributes. The most obvious 
one to me is that the TextIOBase.read’s parameter (read size) seems to 
correspond best with the second parameter (chars) to StreamReader.read.

The rot-13 codec is not relevant to codecs.open, because rot-13 is a 
text-to-text codec. Codecs.open is documented as opening files in “binary” 
(i.e. bytes) mode. I don’t think this is a bug with the rot-13 StreamWriter.

The documentation gives the impression that bytes-to-bytes codecs should work, 
including stateful usage such as with codecs.open. So I would consider that 
usage legitimate. But see Issue 20132 about problems with various (mainly 
bytes–bytes) stateful codecs. Either the problems should be treated as bugs to 
be fixed, or the documentation should be clarified to say they represent 
missing features.

--
nosy: +martin.panter

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue29783] Modify codecs.open() to use the io module instead of codecs.StreamReaderWriter()

2017-03-10 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

I think using codecs.open() with a non-text codecs is a legit use case, but is 
not the best way. I suggest:

1) Encourage of using io.open() rather than codecs.open() for text encodings. 
codecs.open() was recommended way since it worked in all Python versions, 
including <2.6, but now we can ignore this advantage.

2) Discourage of using non-text codecs.

3) Deprecate codecs.open() (may be after EOL for Python 2.7).

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue29783] Modify codecs.open() to use the io module instead of codecs.StreamReaderWriter()

2017-03-10 Thread STINNER Victor

STINNER Victor added the comment:

About the issue #12508, would it be possible to modify StreamReader to use an 
incremental decoder? Or my idea doesn't make sense?

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com




[issue29783] Modify codecs.open() to use the io module instead of codecs.StreamReaderWriter()

2017-03-10 Thread STINNER Victor

STINNER Victor added the comment:

Serhiy Storchaka added the comment:
> I agree that it is better to use directly the module implementing the codecs. 
> But old third-party code still can use non-text codecs.
>
> There should be good reasons for breaking backward compatibility. Wouldn't be 
> better to deprecate codecs.open()?

I'm not sure that I understood. Do you consider that using
codecs.open() with a non-text codecs is a "legit" use case or not?

I understood that you suggest a smoother transition using a
deprecation to give more time to developers to update their code.

But what do you want to deprecate? The whole codecs.open() function?
Or using non-text codecs with codecs.open()? Or using text codecs with
codecs.open()?

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue29783] Modify codecs.open() to use the io module instead of codecs.StreamReaderWriter()

2017-03-10 Thread STINNER Victor

STINNER Victor added the comment:

> The reason for the problem is the UTF-8 decoder (and other
> decoders) expecting an extension to the codec decoder API,
> which are not implemented in its StreamReader class (it simply
> uses the base class). It's not a problem of the base class, but
> that of the codec.
>
> And no: it doesn't have anything to do with codec.open()
> or the StreamReaderWriter class.

open("document.txt", encoding="utf-8") uses IncrementalDecoder of
encodings.utf_8. This object doesn't seem to have the discussed issue.

IMHO the issue is that StreamReader doesn't use an incremental
decoder. I don't see how it could support multibyte encodings and
error handlers without an incremental decoder.

I like TextIOWrapper design between it only handles codecs and text
buffering. Bytes buffering is done at lower-level in a different
object.

I'm not confortable to modify StreamReader because it combines
TextIOWrapper with BufferedReader and so is more complex.

>> I propose to modify codecs.open() to reuse the io module: call io.open() 
>> with newline=''. The io module is now battle-tested and handles well many 
>> corner cases of incremental codecs with multibyte encodings.
>
> -1. People who want to use the io module should use it directly.

When porting code to Python 3, many people chose to use codecs.open()
to get text files using a single code base for Python 2 and Python 3.
Once the code is ported, I don't expect that anyone will replace
codecs.open() with io.open(). You know, nobody cares of the technical
debt...

>> The next step would be to deprecate the codecs.StreamReaderWriter class and 
>> the codecs.open(). But my latest attempt to deprecate them was the PEP 400 
>> and it wasn't a full success, so I now prefer to move step by step :-)
>
> I'm still -1 on the deprecations in PEP 400. You are essentially
> suggesting to replace the complete codecs subsystem with the
> io module, but forgetting that all codecs use StreamWriter and
> StreamReader as base classes.

You can elaborate on "all codecs use StreamWriter and StreamReader as
base classes". Only codecs.open() uses StreamReader and StreamWriter,
no?

All codecs implement a StreamReader and StreamWriter class, but my
question is how use these classes?

> The codecs sub system has a clean design. If used correctly
> and maintained with more care, it works really well.

It seems like we lack such maintainer, since I wrote the PEP, many
issues are still open:

http://bugs.python.org/issue7262
http://bugs.python.org/issue8630
http://bugs.python.org/issue10344
http://bugs.python.org/issue12508
http://bugs.python.org/issue12512

See also issue #5445 (wontfix, whereas TextIOWrapper.writeslines()
uses "for line in lines") and issue #12513 (this one is not fair, io
also has the same bug: issue #12215 :-)).

> I'm tired of having to fight these fights every few years.
> Can't we just stop having them, please ?

The status quo is to do nothing, but as a consequence, bugs are still
not fixed yet, and users are still affected by these bugs :-( I'm
trying to find a solution.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue29783] Modify codecs.open() to use the io module instead of codecs.StreamReaderWriter()

2017-03-10 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

I agree that it is better to use directly the module implementing the codecs. 
But old third-party code still can use non-text codecs.

There should be good reasons for breaking backward compatibility. Wouldn't be 
better to deprecate codecs.open()?

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue29783] Modify codecs.open() to use the io module instead of codecs.StreamReaderWriter()

2017-03-10 Thread Marc-Andre Lemburg

Marc-Andre Lemburg added the comment:

On 10.03.2017 15:17, STINNER Victor wrote:
> 
> The codecs.StreamReaderWriter() class still has old unfixed issues like the 
> issue #12508 (open since 2011). This issue is even seen as a security 
> vulnerability by the owasp-pysec project:
> https://github.com/ebranca/owasp-pysec/wiki/Unicode-string-silently-truncated

The issue should be fixed. Patches welcome :-)

The reason for the problem is the UTF-8 decoder (and other
decoders) expecting an extension to the codec decoder API,
which are not implemented in its StreamReader class (it simply
uses the base class). It's not a problem of the base class, but
that of the codec.

And no: it doesn't have anything to do with codec.open()
or the StreamReaderWriter class.

> I propose to modify codecs.open() to reuse the io module: call io.open() with 
> newline=''. The io module is now battle-tested and handles well many corner 
> cases of incremental codecs with multibyte encodings.

-1. People who want to use the io module should use it directly.

> With this change, codecs.open() cannot be used with non-text encodings... but 
> I'm not sure that this feature ever worked in Python 3:
> 
> $ ./python -bb
> Python 3.7.0a0
 import codecs
 f = codecs.open('test', 'w', encoding='rot13')
 f.write('hello')
> TypeError: a bytes-like object is required, not 'str'
 f.write(b'hello')
> TypeError: a bytes-like object is required, not 'dict'

That's a bug in the rot13 codec, not a feature. codec.open()
works just find with 'hex' and 'base64'.

> The next step would be to deprecate the codecs.StreamReaderWriter class and 
> the codecs.open(). But my latest attempt to deprecate them was the PEP 400 
> and it wasn't a full success, so I now prefer to move step by step :-)

I'm still -1 on the deprecations in PEP 400. You are essentially
suggesting to replace the complete codecs subsystem with the
io module, but forgetting that all codecs use StreamWriter and
StreamReader as base classes.

StreamReaderWriter is just an amalgamation of the two
classes StreamReader and StreamWriter, nothing more. It's
a completely harmless class in the codecs.py.

The codecs sub system has a clean design. If used correctly
and maintained with more care, it works really well. Trying
to rip things out won't make it better. Fixing implementations,
where the appropriate care was not applied, is a much better
strategy.

I'm tired of having to fight these fights every few years.
Can't we just stop having them, please ?

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue29783] Modify codecs.open() to use the io module instead of codecs.StreamReaderWriter()

2017-03-10 Thread STINNER Victor

STINNER Victor added the comment:

> codecs.open() works with bytes-to-bytes codecs.

Oh ok. What are non-text encodings? I found:

* base64
* bz2
* hex
* quopri
* rot13
* uu
* zlib

It seems like all these codecs can be used with codecs.open() to write bytes 
strings, except of rot13.

Last time I used these codecs was at least 5 years ago. When I ported code to 
Python 3, I used directly the module implementing the codecs (like base64 
module for base64 codec). It's easy to write code working on Python 2 and 
Python 3 when using directly the module.


> In additional the interface of StreamReaderWriter is not fully compatible 
> with the interface of io classes.

StreamReaderWriter has 3 addition attributes: reader, writer and stream. Do you 
expect that these attributes are used in the wild?

I expect that the most common uses of codecs.open() are to read or write text 
files.

The question is if it is easy to update the code for the rare cases using 
codecs.open() for other purposes. And if it is possible to write code working 
on Python 2.7 and 3.7.


> This would be compatible-breaking change.

It's deliberate, this issue breaks the backward compatibility.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue29783] Modify codecs.open() to use the io module instead of codecs.StreamReaderWriter()

2017-03-10 Thread STINNER Victor

Changes by STINNER Victor :


--
pull_requests: +492

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue29783] Modify codecs.open() to use the io module instead of codecs.StreamReaderWriter()

2017-03-10 Thread STINNER Victor

Changes by STINNER Victor :


--
pull_requests: +493

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue29783] Modify codecs.open() to use the io module instead of codecs.StreamReaderWriter()

2017-03-10 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

codecs.open() works with bytes-to-bytes codecs.

>>> f = codecs.open('test1', 'w', encoding='hex_codec')
>>> f.write(b'hello')
>>> f.close()
>>> open('test1', 'rb').read()
b'68656c6c6f'

In additional the interface of StreamReaderWriter is not fully compatible with 
the interface of io classes. This would be compatible-breaking change.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue29783] Modify codecs.open() to use the io module instead of codecs.StreamReaderWriter()

2017-03-10 Thread STINNER Victor

New submission from STINNER Victor:

The codecs.StreamReaderWriter() class still has old unfixed issues like the 
issue #12508 (open since 2011). This issue is even seen as a security 
vulnerability by the owasp-pysec project:
https://github.com/ebranca/owasp-pysec/wiki/Unicode-string-silently-truncated

I propose to modify codecs.open() to reuse the io module: call io.open() with 
newline=''. The io module is now battle-tested and handles well many corner 
cases of incremental codecs with multibyte encodings.

With this change, codecs.open() cannot be used with non-text encodings... but 
I'm not sure that this feature ever worked in Python 3:

$ ./python -bb
Python 3.7.0a0
>>> import codecs
>>> f = codecs.open('test', 'w', encoding='rot13')
>>> f.write('hello')
TypeError: a bytes-like object is required, not 'str'
>>> f.write(b'hello')
TypeError: a bytes-like object is required, not 'dict'

The next step would be to deprecate the codecs.StreamReaderWriter class and the 
codecs.open(). But my latest attempt to deprecate them was the PEP 400 and it 
wasn't a full success, so I now prefer to move step by step :-)

Attached PR:

* Modify codecs.open() to use io.open()
* Remove "; use codecs.open() to handle arbitrary codecs" from io.open() and 
_pyio.open() error messages
* Replace codecs.open() with open() at various places

--
components: Unicode
messages: 289362
nosy: ezio.melotti, haypo, lemburg, serhiy.storchaka
priority: normal
severity: normal
status: open
title: Modify codecs.open() to use the io module instead of 
codecs.StreamReaderWriter()
versions: Python 3.7

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com