Re: [Tutor] accented characters to unaccented

Peter Otten Tue, 08 Jun 2010 01:01:42 -0700

KB SU wrote:

> Hi,
> 
> I have opened a URL and read it like this:
> 
> import urllib
> txt = urllib.urlopen("http://www.terme-catez.si").read()
> txt


> If you look at the output above, in the chunk of HTML there is text like
> 'Terme \xc4\x8cate\xc5\xbe' (the original is 'Terme Čatež'). Now I want to
> convert codes like '\xc4\x8c' or '\xc5\xbe' to unaccented characters so that
> 'Terme \xc4\x8cate\xc5\xbe' becomes 'Terme Catez'. Is there any way to
> convert the whole HTML?

First convert to unicode with

txt = txt.decode("utf-8")

and then follow

http://effbot.org/zone/unicode-convert.htm
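
For illustration, here is a minimal sketch of the two steps using only the
standard library. It is not necessarily the approach the linked page takes:
it swaps in unicodedata decomposition and drops the combining marks. The
helper name strip_accents is hypothetical, and the byte string stands in for
the result of the read() call so the example does not need network access.

```python
import unicodedata


def strip_accents(text):
    """Hypothetical helper: drop combining marks from a unicode string."""
    # NFKD splits an accented letter into its base letter plus combining marks,
    # e.g. U+010C (C with caron) becomes "C" followed by U+030C.
    decomposed = unicodedata.normalize("NFKD", text)
    # Keep only the characters that are not combining marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Stand-in for the bytes returned by urlopen(...).read()
raw = b"Terme \xc4\x8cate\xc5\xbe"

txt = raw.decode("utf-8")      # step 1: bytes -> unicode
plain = strip_accents(txt)     # step 2: remove the accents
print(plain)
```

Letters that have no decomposition, such as 'ø' or 'đ', pass through this
sketch unchanged, so an explicit replacement table may still be needed for
those.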


Peter

_______________________________________________
Tutor maillist  -  [email protected]
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor
