On Thu, Aug 25, 2011 at 7:07 PM, Prasad, Ramit <ramit.pra...@jpmorgan.com> wrote:
> Nice catch! Yeah, I am stuck on the encoding mechanism as well. I know how to
> encode/decode...but not what encoding to use. Is there a reference that I can
> look up to find what encoding that would correspond to? I know what the
> character looks like if that helps. I know that Python does display the
> correct character sometimes, but not sure when or why.

In this case, the encoding is almost certainly "latin-1". I know that from
playing around at the interactive interpreter, like this:

>>> s = 'M\xc9XICO'
>>> print s.decode('latin-1')
MÉXICO

If you want to see charts of various encodings, Wikipedia has a bunch. For
instance, the Latin-1 encoding is here:
http://en.wikipedia.org/wiki/ISO/IEC_8859-1
and UTF-8 is here:
http://en.wikipedia.org/wiki/Utf-8

As the other respondents have said, it's really hard to figure this out just
in code. The chardet module mentioned by Steven D'Aprano is probably the best
bet if you really *have* to guess the encoding of an arbitrary sequence of
bytes, but it is much, much better to actually know the encoding of your
inputs.

Good luck!

-- 
Jerry
_______________________________________________
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor
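
P.S. For what it's worth, here is a minimal sketch of the same experiment
written as a script instead of an interpreter session. It assumes Python 3
(where byte strings are written b'...'), and the helper name try_decodings
and the list of candidate encodings are my own illustration, not part of any
library. It just tries each candidate and records which ones decode without
an error.

```python
# Sketch only: assumes Python 3; try_decodings is a hypothetical helper.
raw = b'M\xc9XICO'  # the same bytes as in the interpreter example


def try_decodings(data, candidates=('utf-8', 'latin-1', 'cp1252')):
    """Map each candidate encoding to the decoded text, or None on failure."""
    results = {}
    for name in candidates:
        try:
            results[name] = data.decode(name)
        except UnicodeDecodeError:
            # These bytes are not valid in this encoding.
            results[name] = None
    return results


for name, text in try_decodings(raw).items():
    print(name, '->', text)
```

For these bytes, utf-8 fails while latin-1 and cp1252 both succeed and give
the same text, which is one reason code alone cannot tell you which encoding
the data really used.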