Re: [Tutor] String encoding

Jerry Hill Fri, 26 Aug 2011 08:34:16 -0700

On Thu, Aug 25, 2011 at 7:07 PM, Prasad, Ramit
<ramit.pra...@jpmorgan.com> wrote:
> Nice catch! Yeah, I am stuck on the encoding mechanism as well. I know how to 
> encode/decode...but not what encoding to use. Is there a reference that I can 
> look up to find what encoding that would correspond to? I know what the 
> character looks like if that helps. I know that Python does display the 
> correct character sometimes, but not sure when or why.


In this case, the encoding is almost certainly "latin-1".  I know that
from playing around at the interactive interpreter, like this:

>>> s = 'M\xc9XICO'
>>> print s.decode('latin-1')
MÉXICO
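
For anyone on Python 3, where byte strings and text are separate types, the same experiment can be run over several candidate encodings at once. This is only a rough sketch: the helper name try_decodings and the list of candidates are my own choices for illustration, not anything from the original data.

```python
# Python 3 sketch: decode the same bytes under several candidate codecs.
raw = b'M\xc9XICO'

def try_decodings(data, encodings):
    """Return {encoding: decoded text, or None if the bytes are invalid}."""
    results = {}
    for name in encodings:
        try:
            results[name] = data.decode(name)
        except UnicodeDecodeError:
            # The bytes are not a legal sequence in this encoding.
            results[name] = None
    return results

# Candidate list is an assumption; pick encodings plausible for your source.
results = try_decodings(raw, ['latin-1', 'cp1252', 'utf-8', 'cp437'])
for name, text in results.items():
    print(name, '->', text)
```

An encoding that raises an error can be ruled out, but one that decodes without error is not necessarily right; you still have to look at the result and decide whether it is the character you expected.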

If you want to see charts of various encodings, Wikipedia has a bunch.
For instance, the Latin-1 encoding is here:
http://en.wikipedia.org/wiki/ISO/IEC_8859-1 and UTF-8 is here:
http://en.wikipedia.org/wiki/Utf-8

As the other respondents have said, it's really hard to figure this
out just in code.  The chardet module mentioned by Steven D'Aprano is
probably the best bet if you really *have* to guess the encoding of an
arbitrary sequence of bytes, but it is much, much better to actually know
the encoding of your inputs.
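
When you do know the encoding, the simplest approach is to state it once, at the point where the bytes enter your program. Here is a minimal Python 3 sketch of that idea; the temporary file stands in for a hypothetical input that its producer has told you is Latin-1.

```python
import os
import tempfile

# Hypothetical input: a file known (from whoever produced it) to be Latin-1.
with tempfile.NamedTemporaryFile(delete=False, suffix='.txt') as f:
    f.write(b'M\xc9XICO\n')
    path = f.name

try:
    # Declare the encoding when opening; after this point it is plain text.
    with open(path, encoding='latin-1') as handle:
        text = handle.read().strip()
finally:
    os.remove(path)

print(text)

# Encode explicitly again on the way out, for example as UTF-8.
utf8_bytes = text.encode('utf-8')
```

Decoding on input and encoding on output keeps the guessing out of the rest of the code, which only ever sees text.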

Good luck!

-- 
Jerry
_______________________________________________
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor
