Walter Dörwald wrote:
> M.-A. Lemburg wrote:
>> Walter Dörwald wrote:
>>>>>> I'd suggest we keep codecs.lookup() the way it is and
>>>>>> instead add new functions to the codecs module, e.g.
>>>>>> codecs.getencoderobject() and codecs.getdecoderobject().
>>>>>>
>>>>>> Changing the codec registration is not much of a problem:
>>>>>> we could simply allow 6-tuples to be passed into the
>>>>>> registry.
>>>>> OK, so codecs.lookup() returns 4-tuples, but the registry stores 6-tuples
>>>>> and the search functions must return 6-tuples.
>>>>> And we add codecs.getencoderobject() and codecs.getdecoderobject() as
>>>>> well as new classes codecs.StatefulEncoder and
>>>>> codecs.StatefulDecoder. What about old search functions that return
>>>>> 4-tuples?
>>>> The registry should then simply set the missing entries to None and the
>>>> getencoderobject()/getdecoderobject() would then have to raise an error.
>>> Sounds simple enough and we don't lose backwards compatibility.
>>>
>>>> Perhaps we should also deprecate codecs.lookup() in Py 2.5 ?!
>>> +1, but I'd like to have a replacement for this, i.e. a function that
>>> returns all info the registry has about an encoding:
>>>
>>> 1. Name
>>> 2. Encoder function
>>> 3. Decoder function
>>> 4. Stateful encoder factory
>>> 5. Stateful decoder factory
>>> 6. Stream writer factory
>>> 7. Stream reader factory
>>>
>>> and if this is an object with attributes, we won't have any problems if we
>>> extend it in the future.
>> Shouldn't be a problem: just expose the registry dictionary
>> via the _codecs module.
>>
>> The rest can then be done in a Python function defined in
>> codecs.py using a CodecInfo class.
>
> This would require the Python code to call codecs.lookup() and then look into
> the codecs dictionary (normalizing the encoding name again). Maybe we should
> make a version of _PyCodec_Lookup() that allows 4- and 6-tuples available to
> Python and use that?
> The official PyCodec_Lookup() would then have to downgrade the 6-tuples to
> 4-tuples.

Hmm, you're right: the dictionary may not have the requested codec info
yet (it's only used as cache) and only a call to _PyCodec_Lookup() would
fill it.

>>> BTW, if we change the API, can we fix the return value of the stateless
>>> functions? As the stateless function always encodes/decodes the complete
>>> string, returning the length of the string doesn't make sense.
>>> codecs.getencoder() and codecs.getdecoder() would have to continue to
>>> return the old variant of the functions, but
>>> codecs.getinfo("latin-1").encoder would be the new encoding function.
>> No: you can still write stateless encoders or decoders that do
>> not process the whole input string. Just because we don't have
>> any of those in Python, doesn't mean that they can't be written
>> and used. A stateless codec might want to leave the work
>> of buffering bytes at the end of the input data which cannot
>> be processed to the caller.
>
> But what would the call do with that info? It can't retry encoding/decoding
> the rejected input, because the state of the codec has been thrown away
> already.

This depends a lot on the nature of the codec. It may well be possible
to work on chunks of input data in a stateless way, e.g. say you
have a string of 4-byte hex values, then the decode function would
be able to work on 4 bytes each and let the caller buffer any remaining
bytes for the next call. There'd be no need for keeping state in the
decoder function.

>> It is also possible to write
>> stateful codecs on top of such stateless encoding and decoding
>> functions.
>
> That's what the codec helper functions from Python/_codecs.c are for.

I'm not sure what you mean here.

> Anyway, I've started implementing a patch that just adds
> codecs.StatefulEncoder/codecs.StatefulDecoder. UTF8, UTF8-Sig, UTF-16,
> UTF-16-LE and UTF-16-BE are already working.

Nice :-)

--
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Feb 18 2006)
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
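
To make the 4-byte hex example from the message concrete, here is a minimal
sketch, assuming Python 3 bytes input. The names decode_hex4() and
decode_chunks() are hypothetical and not part of the codecs module. The
decode function keeps no state: it reports how many bytes it consumed, and
the caller buffers whatever is left over for the next call.

```python
def decode_hex4(data):
    """Statelessly decode as many complete 4-byte hex groups as possible.

    Returns (values, consumed), where consumed is the number of input
    bytes that were actually used. Trailing bytes are left to the caller.
    """
    usable = len(data) - len(data) % 4
    values = [
        int(data[i:i + 4].decode("ascii"), 16)
        for i in range(0, usable, 4)
    ]
    return values, usable


def decode_chunks(chunks):
    """Caller-side buffering on top of the stateless decode function."""
    pending = b""
    result = []
    for chunk in chunks:
        pending += chunk
        values, consumed = decode_hex4(pending)
        result.extend(values)
        # Keep only the bytes the decoder did not consume.
        pending = pending[consumed:]
    return result, pending
```

Here the returned length is what lets the caller know which bytes to carry
over, which is the reason for keeping it in the stateless API.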