newbie with a encoding question, please help

2010-04-01 Thread Mister Yu
Hi experts, I'm new to Python. I'm writing crawlers to extract data from some Chinese websites, and I ran into an encoding problem. I have a unicode object, which looks like this: u'\xd6\xd0\xce\xc4', which is encoded in gb2312, but I have no idea how to convert it back to utf-8 to re-create

Re: newbie with a encoding question, please help

2010-04-01 Thread Chris Rebert
2010/4/1 Mister Yu eryan...@gmail.com:
> hi experts, i m new to python, i m writing crawlers to extract data from some chinese websites, and i run into a encoding problem. i have a unicode object, which looks like this u'\xd6\xd0\xce\xc4' which is encoded in gb2312,
No! Instances of type

Re: newbie with a encoding question, please help

2010-04-01 Thread Mister Yu
On Apr 1, 7:22 pm, Chris Rebert c...@rebertia.com wrote:
> 2010/4/1 Mister Yu eryan...@gmail.com:
>> hi experts, i m new to python, i m writing crawlers to extract data from some chinese websites, and i run into a encoding problem. i have a unicode object, which looks like this

Re: newbie with a encoding question, please help

2010-04-01 Thread Chris Rebert
On Thu, Apr 1, 2010 at 4:38 AM, Mister Yu eryan...@gmail.com wrote:
> On Apr 1, 7:22 pm, Chris Rebert c...@rebertia.com wrote:
>> 2010/4/1 Mister Yu eryan...@gmail.com:
>>> hi experts, i m new to python, i m writing crawlers to extract data from some chinese websites, and i run into a encoding

Re: newbie with a encoding question, please help

2010-04-01 Thread Stefan Behnel
Mister Yu, 01.04.2010 13:38:
> i m still not very sure how to convert a unicode object ** u'\xd6\xd0\xce\xc4' ** back to 中文 the string it supposed to be?
You are confused. '\xd6\xd0\xce\xc4' is an encoded byte string, not a unicode string. The fact that you have it stored in a unicode string
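A minimal sketch of the point above, written for Python 3 (the thread itself uses Python 2). The variable names are made up for illustration and do not come from the thread. Every code point in the problem string fits in a single byte, which suggests the string is really raw bytes that ended up in a text object without being decoded with the right codec.

```python
# Python 3 sketch; all names here are illustrative, not from the thread.
mangled = '\xd6\xd0\xce\xc4'   # the string from the question: 4 "characters"

# Each code point is below 256, i.e. it fits in one byte. That is a strong
# hint that these are byte values, not real Chinese characters.
code_points = [ord(c) for c in mangled]
looks_like_bytes = all(cp < 256 for cp in code_points)

# Properly decoded text has only 2 characters, with code points above 255.
proper = '中文'
proper_points = [ord(c) for c in proper]
```

Inspecting `code_points` this way is only a heuristic: text that really is Latin-1 (for example French or German) also has code points below 256.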

Re: newbie with a encoding question, please help

2010-04-01 Thread Mister Yu
On Apr 1, 8:13 pm, Chris Rebert c...@rebertia.com wrote:
> On Thu, Apr 1, 2010 at 4:38 AM, Mister Yu eryan...@gmail.com wrote:
>> On Apr 1, 7:22 pm, Chris Rebert c...@rebertia.com wrote:
>>> 2010/4/1 Mister Yu eryan...@gmail.com:
>>>> hi experts, i m new to python, i m writing crawlers to extract

Re: newbie with a encoding question, please help

2010-04-01 Thread Stefan Behnel
Mister Yu, 01.04.2010 14:26:
> On Apr 1, 8:13 pm, Chris Rebert wrote:
>> gb2312_bytes = ''.join([chr(ord(c)) for c in u'\xd6\xd0\xce\xc4'])
>> unicode_string = gb2312_bytes.decode('gb2312')
>> utf8_bytes = unicode_string.encode('utf-8') #as you wanted
Simplifying this hack a bit: gb2312_bytes =
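For readers on Python 3, the same round trip can be sketched as follows. The simplification in the message above is cut off in this archive, so this is not a reconstruction of it. It is one possible approach that relies on the Latin-1 codec mapping code points 0-255 to the identical byte values. The variable names follow Chris Rebert's snippet, and `mangled` is a name introduced here for illustration.

```python
# Python 3 sketch of the repair; `mangled` is an illustrative name.
mangled = '\xd6\xd0\xce\xc4'   # bytes that were wrongly stored as text

# Step 1: recover the original bytes. Latin-1 maps each code point
# 0-255 to the byte with the same value, so nothing is altered.
gb2312_bytes = mangled.encode('latin-1')

# Step 2: decode the bytes with the codec they were actually written in.
unicode_string = gb2312_bytes.decode('gb2312')

# Step 3: encode the real text as UTF-8, as the original poster wanted.
utf8_bytes = unicode_string.encode('utf-8')

print(unicode_string)
```

The cleaner fix is usually upstream: decode the downloaded page bytes with 'gb2312' when they are first read, so the mangled string never appears.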

Re: newbie with a encoding question, please help

2010-04-01 Thread Mister Yu
On Apr 1, 9:31 pm, Stefan Behnel stefan...@behnel.de wrote:
> Mister Yu, 01.04.2010 14:26:
>> On Apr 1, 8:13 pm, Chris Rebert wrote:
>>> gb2312_bytes = ''.join([chr(ord(c)) for c in u'\xd6\xd0\xce\xc4'])
>>> unicode_string = gb2312_bytes.decode('gb2312')
>>> utf8_bytes = unicode_string.encode('utf-8')