Hi, I have opened a URL and read its contents like this:
    import urllib
    txt = urllib.urlopen("http://www.terme-catez.si").read()
    txt

This gives output like the following:

    ----other parts are skipped ---
    r\n 2010\r\n
    <a href="http://www.terme-catez.si" target="_blank">Terme \xc4\x8cate\xc5\xbe</a>\r\n
    Slovenija\r\n
    <br />\r\n
    Spletne re\xc5\xa1itve\r\n
    © 1996-\r\n
    2010\r\n
    <a href=" http://www.tmedia.biz" target="_blank">(T)media</a></p>\r\n
    </div>\r\n
    </div>\r\n
    <div class="ozadje_catez"></div>\r\n
    <div class="jsPopupDivFader" id="fader" onClick="javascript:showHide(itemShown);">\r\n
    <table width="100%" height="100%">\r\n
    <tr valign="middle">\r\n
    <td align="center"></td>\r\n
    </tr>\r\n
    </table>\r\n
    </div>\r\n
    \r\n
    <script src="http://www.google-analytics.com/urchin.js" type="text/javascript"></script>\r\n
    <script type="text/javascript">\r\n
    _uacct = "UA-1815955-1";\r\n
    urchinTracker();\r\n
    </script>\r\n
    \r\n
    </body>\r\n
    </html>\r\n'

As you can see above, within that chunk of HTML there is text like 'Terme \xc4\x8cate\xc5\xbe' (the original is 'Terme Čatež').

Now I want to convert byte sequences like '\xc4\x8c' or '\xc5\xbe' to unaccented characters, so that 'Terme \xc4\x8cate\xc5\xbe' becomes 'Terme Catez'. Is there any way to do this conversion for the whole HTML?

Thanks in advance.
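To make the goal concrete, here is a rough sketch of the conversion I am after. It assumes the page is UTF-8 encoded (the pairs '\xc4\x8c' and '\xc5\xbe' are the UTF-8 bytes for Č and ž), and the function name strip_accents is just made up for illustration. The idea is to decode the bytes to text, split each accented character into a base letter plus combining marks with the standard unicodedata module, and then drop the marks.

```python
import unicodedata

def strip_accents(raw_bytes, encoding="utf-8"):
    """Decode bytes and remove accents (hypothetical helper, assumes UTF-8)."""
    # Turn the raw bytes into text; this fails if the encoding guess is wrong.
    text = raw_bytes.decode(encoding)
    # NFKD splits e.g. "Č" into "C" followed by a combining caron.
    decomposed = unicodedata.normalize("NFKD", text)
    # Keep everything except the combining marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

sample = b"Terme \xc4\x8cate\xc5\xbe"
print(strip_accents(sample))  # Terme Catez
```

Presumably the same call could be applied to the whole string returned by read(). Letters that have no decomposition (for example đ) would pass through unchanged, so they would need separate handling.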