On Thu, Aug 25, 2011 at 7:07 PM, Prasad, Ramit wrote:
> Nice catch! Yeah, I am stuck on the encoding mechanism as well. I know how to
> encode/decode...but not what encoding to use. Is there a reference that I can
> look up to find what encoding that would correspond to? I know what the
> character looks like if that helps. I know that Python does display the
> correct character sometimes, but not sure when or why.
In this case, the encoding is almost certainly "latin-1". I know that
from playing around at the interactive interpreter, like this:
>>> s = b'M\xc9XICO'
>>> print(s.decode('latin-1'))
MÉXICO
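The same kind of interpreter experiment can be scripted. The sketch below decodes one byte string with a few candidate encodings and reports each result, or the error. The list of candidates and the helper name `try_decodings` are illustrative assumptions, not a standard recipe.

```python
# Decode the same raw bytes with several candidate encodings and compare.
candidates = ["latin-1", "cp1252", "utf-8"]
raw = b"M\xc9XICO"

def try_decodings(data, encodings):
    """Map each encoding name to its decoded text, or to a failure note."""
    results = {}
    for enc in encodings:
        try:
            results[enc] = data.decode(enc)
        except UnicodeDecodeError as exc:
            # exc.reason is a short description of why decoding failed
            results[enc] = "<failed: %s>" % exc.reason
    return results

for enc, text in try_decodings(raw, candidates).items():
    print(enc, "->", text)
```

Here latin-1 and cp1252 both yield the same text for byte 0xC9. So a successful decode only narrows the field. It does not prove which encoding the data was actually written in.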
If you want to see charts of various encodings, Wikipedia has a bunch.
For instance, the Latin-1 encoding is here:
http://en.wikipedia.org/wiki/ISO/IEC_8859-1 and UTF-8 is here:
http://en.wikipedia.org/wiki/Utf-8
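If you know what the character looks like, you can also work in the other direction. Encode the character yourself and compare the resulting byte values with the charts and with your data. This is a minimal sketch using only the standard library. The helper name `encoding_table` is made up for illustration.

```python
import unicodedata

def encoding_table(char, encodings=("latin-1", "utf-8")):
    """Return {encoding: hex bytes} for one character in each encoding."""
    table = {}
    for enc in encodings:
        encoded = char.encode(enc)
        # Iterating over bytes yields ints in Python 3
        table[enc] = " ".join("%02X" % b for b in encoded)
    return table

char = "\u00c9"  # the accented capital E from the example above
print(unicodedata.name(char))
for enc, hex_bytes in encoding_table(char).items():
    print("  %-8s %s" % (enc, hex_bytes))
```

If the raw data contains the single byte C9 where this character should be, that points to Latin-1 (or a close relative). The two-byte sequence C3 89 would point to UTF-8.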
As the other respondents have said, it's really hard to figure this
out from code alone. The chardet module mentioned by Steven D'Aprano is
probably the best bet if you really *have* to guess the encoding of an
arbitrary sequence of bytes, but it is much, much better to actually know
the encoding of your inputs.
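Sometimes you cannot find out the real encoding and cannot install chardet. In that case, a common stdlib-only heuristic is to try strict UTF-8 first and fall back to Latin-1. This is not chardet's algorithm, which is statistical. The function name `guess_decode` is hypothetical.

```python
def guess_decode(data):
    """Decode bytes, preferring UTF-8 and falling back to Latin-1.

    Heuristic only: Latin-1 assigns a character to every byte value,
    so the fallback never raises, but it can silently produce the
    wrong text if the bytes were really in some other encoding.
    """
    try:
        return data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        return data.decode("latin-1"), "latin-1"

print(guess_decode(b"M\xc3\x89XICO"))  # valid UTF-8 bytes
print(guess_decode(b"M\xc9XICO"))      # not valid UTF-8, so Latin-1 is used
```

Trying UTF-8 first makes sense because non-ASCII bytes rarely form valid UTF-8 by accident. Latin-1, by contrast, always "succeeds", so it should only be used as the last resort.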
Good luck!
--
Jerry