M.-A. Lemburg wrote:
On 2008-06-13 11:32, Walter Dörwald wrote:
M.-A. Lemburg wrote:
On 2008-06-12 16:59, Walter Dörwald wrote:
M.-A. Lemburg wrote:
.transform() and .untransform() use the codecs to apply same-type
conversions. They do apply type checks to make sure that the
codec does indeed return the same type.

E.g. text.transform('xml-escape') or data.transform('base64').

So what would a base64 codec do with the errors argument?

It could use it to e.g. try to recover as much data as possible
from broken input data.

Currently (in Py2.x), it raises an exception if you pass in anything
but "strict".

I think for transformations we don't need the full codec machinery:
 > ...

No need to invent another wheel :-) The codecs already exist for
Py2.x and can be used by the .encode()/.decode() methods in Py2.x
(where no type checks occur).

By using a new API we could get rid of old warts. For example: Why does the stateless encoder/decoder return how many input characters/bytes it has consumed? It must consume *all* bytes anyway!

No, it doesn't and that's the point in having those return values :-)

Even though the encoder/decoders are stateless, that doesn't mean
they have to consume all input data. The caller is responsible to
make sure that all input data was in fact consumed.

You could for example have a decoder that stops decoding after
having seen a block end indicator, e.g. a base64 line end or
XML closing element.

So how should the UTF-8 decoder know that it has to stop at a closing XML element?

The UTF-8 decoder doesn't support this, but you could write a codec
that applies this kind of detection, e.g. to not try to decode
partial UTF-8 byte sequences at the end of input, which would then
result in error.

Just because all codecs that ship with Python always try to decode
the complete input doesn't mean that the feature isn't being used.

I know of no other code that does. Do you have an example for this use.

I already gave you a few examples.

Maybe I was unclear, I meant real world examples, not hypothetical ones.

The interface was designed to allow for the above situations.

Then could we at least have a new codec method that does:

def statelesencode(self, input):
   (output, consumed) = self.encode(input)
   assert len(input) == consumed
   return output

You mean as method to the Codec class ?

No, I meant as a method for the CodecInfo clas.

Sure, we could do that, but please use a different name,
e.g. .encodeall() and .decodeall() - .encode() and .decode()
are already stateles (and so would the new methods be), so
"stateless" isn't all that meaningful in this context.

I like the names encodeall/decodeall!

We could also add such a check to the PyCodec_Encode() and _Decode()
functions. They currently do not apply the above check.

In Python, those two functions are exposed as codecs.encode()
and codecs.decode().

This change will probably have to wait for the 2.7 cycle.

Servus,
   Walter
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Reply via email to