Re: Html character entity conversion

[EMAIL PROTECTED] Sun, 30 Jul 2006 11:55:55 -0700

danielx wrote:
> [EMAIL PROTECTED] wrote:
> > Here is my script:
> >
> > from mechanize import *
> > from BeautifulSoup import *
> > import StringIO
> > b = Browser()
> > f = b.open("http://www.translate.ru/text.asp?lang=ru")
> > b.select_form(nr=0)
> > b["source"] = "hello python"
> > html = b.submit().get_data()
> > soup = BeautifulSoup(html)
> > print  soup.find("span", id = "r_text").string
> >
> > OUTPUT:
> > &#1087;&#1088;&#1080;&#1074;&#1077;&#1090;
> > &#1087;&#1080;&#1090;&#1086;&#1085;
> > ----------
> > In russian it looks like:
> > "привет питон"
> >
> > How can I translate this using standard Python libraries??
> >
> > --
> > Pak Andrei, http://paxoblog.blogspot.com, icq://97449800
>
> I'm having trouble understanding how your script works (what would a
> "BeautifulSoup" function do?), but assuming your intent is to find
> character reference objects in an html document, you might try using
> the HTMLParser class in the HTMLParser module. This class delegates
> several methods. One of them is handle_charref. It will be called with
> one argument, the name of the reference, which includes only the number
> part. HTMLParser is a lot more powerful than that though. There may be
> something more light-weight out there that will accomplish what you
> want. Then again, you might be able to find a use for all that power :P.
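
For reference, the handle_charref approach described above might look
roughly like the sketch below. It assumes Python 3, where the class
lives in html.parser and needs convert_charrefs=False for
handle_charref to be called at all. The names CharRefDecoder and
decode_charrefs are made up for illustration.

```python
from html.parser import HTMLParser


class CharRefDecoder(HTMLParser):
    """Collects text, turning numeric character references into characters.

    Hypothetical helper class, not part of any library.
    """

    def __init__(self):
        # convert_charrefs=False makes the parser report each &#NNNN;
        # through handle_charref instead of decoding it silently.
        super().__init__(convert_charrefs=False)
        self.parts = []

    def handle_data(self, data):
        # Plain text between references (here, the space between words).
        self.parts.append(data)

    def handle_charref(self, name):
        # name holds only the number part, e.g. "1087" or "x43f".
        if name[0] in "xX":
            code = int(name[1:], 16)
        else:
            code = int(name)
        self.parts.append(chr(code))


def decode_charrefs(text):
    """Return text with numeric character references decoded."""
    parser = CharRefDecoder()
    parser.feed(text)
    parser.close()
    return "".join(parser.parts)


encoded = ("&#1087;&#1088;&#1080;&#1074;&#1077;&#1090; "
           "&#1087;&#1080;&#1090;&#1086;&#1085;")
print(decode_charrefs(encoded))
```

This only handles numeric references. Named entities such as &amp;
would need a handle_entityref method as well.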


Thank you for the response.
It doesn't matter what 'BeautifulSoup' is.
The general question is:

How can I convert an encoded string

sEncodedHtmlText = ('&#1087;&#1088;&#1080;&#1074;&#1077;&#1090; '
                    '&#1087;&#1080;&#1090;&#1086;&#1085;')

into a human-readable one:

sDecodedHtmlText == 'привет питон'
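
One possible way to get this conversion, sketched under the assumption
of Python 3.4 or later (where the standard library provides
html.unescape), is shown below. The variable names are only
illustrative.

```python
import html

# The encoded text from the question, written as one string.
encoded = ("&#1087;&#1088;&#1080;&#1074;&#1077;&#1090; "
           "&#1087;&#1080;&#1090;&#1086;&#1085;")

# html.unescape replaces numeric (&#1087;) and named (&amp;) references
# with the corresponding Unicode characters.
decoded = html.unescape(encoded)
print(decoded)
```

The result is a Unicode string, so printing it correctly still depends
on the terminal's encoding.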

-- 
http://mail.python.org/mailman/listinfo/python-list
