Why not use the str type for str and str subtypes, and the bytes type for bytes and bytes-like objects (bytearray, memoryview)? I don't think that we need an ABC here.
Victor

On 16 Nov 2013 10:44, "Nick Coghlan" <ncogh...@gmail.com> wrote:
> On 16 Nov 2013 10:47, "Victor Stinner" <victor.stin...@gmail.com> wrote:
> >
> > 2013/11/16 Nick Coghlan <ncogh...@gmail.com>:
> > > To address Serhiy's security concerns with the compression codecs
> > > (which are technically independent of the question of restoring
> > > the aliases), I also plan to document how to systematically
> > > blacklist particular codecs in an application by setting
> > > attributes on the encodings module and/or appropriate entries in
> > > sys.modules.
> >
> > It would be simpler and safer to blacklist bytes=>bytes and str=>str
> > codecs from bytes.decode() and str.encode() directly. Marc Andre
> > Lemburg proposed to add new attributes in CodecInfo to specify input
> > and output types.
>
> Yes, but that type compatibility introspection is a change for 3.5 at
> the earliest (although I commented on
> http://bugs.python.org/issue19619 with two alternate suggestions that
> I think would be reasonable to implement for 3.4).
>
> Everything codec related that I am doing at the moment is about
> improving the state of 3.4 and source compatible 2/3 code. Proposals
> for further 3.5+ only improvements are relevant only in the sense that
> I don't want to lock us out from future improvements (which is why my
> main aim is to clarify the status quo, with the only functional
> changes related to restoring feature parity with Python 2 for
> non-Unicode codecs).
>
> > > The only functional *change* I'd still like to make for 3.4 is to
> > > restore the shorthand aliases for the non-Unicode codecs (to ease
> > > the migration for folks coming from Python 2), but this thread has
> > > convinced me I likely need to write the PEP *before* doing that,
> > > and I still have to integrate ensurepip into pyvenv before the
> > > beta 1 deadline.
> > >
> > > So unless you and Victor are prepared to +1 the restoration of the
> > > codec aliases (closing issue 7475) in anticipation of that codecs
> > > infrastructure documentation PEP, the change to restore the
> > > aliases probably won't be in 3.4. (I *might* get the PEP written
> > > in time regardless, but I'm not betting on it at this point).
> >
> > Using the StackOverflow search engine, I found some posts where
> > people ask for a "hex" codec on Python 3. There are two answers: use
> > the binascii module or use codecs.encode(). So even if
> > codecs.encode() was never documented, it looks like it is used. So I
> > now agree that documenting it would not make the situation worse.
>
> Aye, that was my conclusion (hence my proposal on issue 7475 back in
> April).
>
> Can I take that observation as a +1 for restoring the aliases as well?
> (That and more efficiently rejecting the non-Unicode codecs from
> str.encode, bytes.decode and bytearray.decode are the only aspects of
> this subject to the beta 1 deadline - we can be a bit more leisurely
> when it comes to working out the details of the docs updates)
>
> > Adding transform()/untransform() methods to bytes and str is a
> > non-trivial change and not everybody likes them. Anyway, it's too
> > late for Python 3.4.
> >
> > In my opinion, the best option is to add new input_type/output_type
> > attributes to CodecInfo right now, and modify the codecs so
> > "abc".encode("hex") raises a LookupError (instead of a tricky error
> > message with some evil low-level hacks on the traceback and the
> > exception, which is my initial concern in this mail thread). It also
> > fixes the security vulnerability.
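For illustration, the two answers mentioned above (binascii and codecs.encode()) can be sketched as follows. This is a minimal example, assuming Python 3.4 or later, where str.encode() is expected to reject a bytes-to-bytes codec with a LookupError; the variable names are made up for the example and do not come from the thread.

```python
import binascii
import codecs

data = b"abc"

# Option 1: the codecs module functions accept bytes-to-bytes codecs.
hex_via_codecs = codecs.encode(data, "hex_codec")

# Option 2: binascii offers the same conversion without the codec machinery.
hex_via_binascii = binascii.hexlify(data)

# Decoding through the codec gives the original bytes back.
roundtrip = codecs.decode(hex_via_codecs, "hex_codec")

# str.encode() is reserved for text encodings (str to bytes). On Python 3.4+
# it is expected to refuse a bytes-to-bytes codec with a LookupError.
str_encode_error = None
try:
    "abc".encode("hex_codec")
except LookupError as exc:
    str_encode_error = exc
```

Both options produce the same hexadecimal bytes, so the choice is mostly about whether the code should go through the codec registry at all.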
>
> The C level code for catching the input type errors only looks evil
> because:
>
> - the C level equivalent of "except Exception as Y: raise X from Y"
>   is just plain ugly in the first place
> - the chaining includes a *lot* of checks of the original exception to
>   ensure that no data is lost by raising a new instance of the same
>   exception type and chaining
> - it chains ValueError, AttributeError and any other currently
>   stateless (aside from a str description) error the codec might
>   throw, not just input type validation errors (it deliberately
>   doesn't chain stateful errors as doing so might be backwards
>   incompatible with existing error handling).
>
> However, the ugliness of that code is the reason I'm intrigued by the
> possibility of traceback annotations as a potentially cleaner solution
> than trying to seamlessly wrap exceptions with a new one that adds
> more context information. While I think the gain in codec
> debuggability is worth it in this case, my concern over the complexity
> and the current limitations are the reason I didn't make it a public C
> API.
>
> > To keep backward compatibility (even with custom codecs registered
> > manually), if input_type/output_type is not defined, we should
> > consider that the codec is a classical text encoding (encode
> > str=>bytes, decode bytes=>str).
>
> Without an already existing ByteSequence ABC, it isn't feasible to
> propose and implement this completely in the 3.4 time frame (since you
> would need such an ABC to express the input type accepted by our
> Unicode and binary codecs - the only one that wouldn't need it is
> rot_13, since that's str->str).
>
> However, the output types could be expressed solely as concrete types,
> and that's all we need for the blacklist (since we could replace the
> current instance check on the result with a subclass check on the
> specified output type (if any) prior to decoding).
>
> Cheers,
> Nick.
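For illustration, the output-type check described in the last quoted paragraph could look roughly like the sketch below. It is only a sketch under stated assumptions: CodecInfo has no public output_type attribute, so a plain dict stands in for it, and the names DECODE_OUTPUT_TYPES, decode_output_type and text_decode are hypothetical. A codec with no declared type is treated as a classical text encoding (decode bytes=>str), as suggested for backward compatibility.

```python
import codecs

# Hypothetical stand-in for an output_type attribute on CodecInfo:
# the type that decode() returns for a few bytes-to-bytes codecs.
# Compression codecs would be listed the same way.
_DECLARED = {
    "hex_codec": bytes,
    "base64_codec": bytes,
}

# Key the table by the canonical name reported by the codec registry,
# so that aliases resolve to the same entry.
DECODE_OUTPUT_TYPES = {
    codecs.lookup(name).name: out_type for name, out_type in _DECLARED.items()
}


def decode_output_type(encoding):
    """Return the declared decode() output type, defaulting to str."""
    canonical = codecs.lookup(encoding).name
    return DECODE_OUTPUT_TYPES.get(canonical, str)


def text_decode(data, encoding):
    """Decode bytes to str, rejecting codecs that do not produce str.

    The subclass check runs before any decoding happens, so a rejected
    codec never gets to process the input.
    """
    out_type = decode_output_type(encoding)
    if not issubclass(out_type, str):
        raise LookupError(
            "%r is not a text encoding; use codecs.decode() instead" % encoding
        )
    return codecs.decode(data, encoding)
```

Because the check happens before decoding, a blacklisted codec is refused without doing any work on the data, which is the property that matters for the compression codecs.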