On Sun, 28 Jun 2009 21:25:13 +0000, Benjamin Peterson wrote:

>> > The email module is, yes, broken. You can recover the bytestrings of
>> > command-line arguments and environment variables.
>> 
>> 1. Does Python offer any assistance in doing so, or do you have to
>> manually convert the surrogates which are generated for unrecognised bytes?
> 
> fs_encoding = sys.getfilesystemencoding()
> bytes_argv = [arg.encode(fs_encoding, "surrogateescape") for arg in sys.argv]

This results in an internal error:

>>> "\udce4\udceb\udcef\udcf6\udcfc".encode("iso-8859-1", "surrogateescape")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
SystemError: Objects/bytesobject.c:3182: bad argument to internal function

[FWIW, the error corresponds to _PyBytes_Resize, which has a
cautionary comment almost as large as the code.]

The documentation gives the impression that "surrogateescape" is only
meaningful for decoding.
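
For comparison, the round trip that PEP 383 describes can be sketched as
below. This assumes an interpreter in which the encoding side of the
handler is implemented, unlike the build that produced the SystemError
above; the variable names are only illustrative.

```python
# Illustrative input: bytes that are not valid ASCII (Latin-1 for "äëïöü").
raw = b"\xe4\xeb\xef\xf6\xfc"

# Decoding: each undecodable byte 0xNN becomes the lone surrogate U+DCNN.
text = raw.decode("ascii", "surrogateescape")

# Encoding with the same handler: each U+DCNN is turned back into byte 0xNN.
recovered = text.encode("ascii", "surrogateescape")

print(ascii(text))
print(recovered == raw)
```

If the handler works in both directions, `recovered` should compare equal
to `raw`, which is what the list comprehension over sys.argv relies on.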

>> 2. How do you do this for non-invertible encodings (e.g. ISO-2022)?
> 
> What's a non-invertible encoding? I can't find a reference to the term.

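One where different inputs can produce the same output: distinct byte
sequences decode to the same string, so re-encoding that string need not
reproduce the original bytes.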
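
A small illustration, assuming Python's bundled "iso2022_jp" codec: the
sequence ESC ( B selects ASCII, which is already the initial state, so a
redundant escape changes the bytes but not the decoded text. The names
`plain` and `escaped` are made up for this sketch.

```python
# Two different byte strings in ISO-2022-JP.
plain = b"abc"
escaped = b"\x1b(Babc"  # redundant "switch to ASCII" escape, then the same text

# Both inputs decode to the same str, so decoding is not invertible.
decoded_plain = plain.decode("iso2022_jp")
decoded_escaped = escaped.decode("iso2022_jp")

# Re-encoding can only produce one of the two original byte strings.
reencoded = decoded_escaped.encode("iso2022_jp")

print(decoded_plain == decoded_escaped)
print(reencoded == escaped)
```

Once the bytes have been decoded, nothing in the resulting string records
which of the two inputs it came from.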

-- 
http://mail.python.org/mailman/listinfo/python-list
