Re: Html character entity conversion

danielx Sun, 30 Jul 2006 08:45:53 -0700

[EMAIL PROTECTED] wrote:
> Here is my script:
>
> from mechanize import *
> from BeautifulSoup import *
> import StringIO
> b = Browser()
> f = b.open("http://www.translate.ru/text.asp?lang=ru")
> b.select_form(nr=0)
> b["source"] = "hello python"
> html = b.submit().get_data()
> soup = BeautifulSoup(html)
> print soup.find("span", id="r_text").string
>
> OUTPUT:
> &#1087;&#1088;&#1080;&#1074;&#1077;&#1090;
> &#1087;&#1080;&#1090;&#1086;&#1085;
> ----------
> In russian it looks like:
> "привет питон"
>
> How can I translate this using standard Python libraries??
>
> --
> Pak Andrei, http://paxoblog.blogspot.com, icq://97449800


I'm having trouble understanding how your script works (what would a
"BeautifulSoup" function do?), but assuming your intent is to find
character references in an HTML document, you might try the HTMLParser
class in the HTMLParser module. You subclass it and override its handler
methods; one of them is handle_charref. It is called with one argument,
the name of the reference, which includes only the number part (for
example "1087" for &#1087;). HTMLParser is a lot more powerful than
that, though. There may be something more lightweight out there that
will accomplish what you want. Then again, you might be able to find a
use for all that power :P.
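
Here is a rough sketch of the handle_charref idea, not a tested solution
for your exact page. It assumes Python 3, where the module is called
html.parser instead of HTMLParser and where you have to pass
convert_charrefs=False so that handle_charref still gets called. The
names CharRefDecoder and decode_charrefs are made up for illustration.

```python
from html.parser import HTMLParser  # Python 2 spelling: from HTMLParser import HTMLParser


class CharRefDecoder(HTMLParser):
    """Collect text, replacing numeric character references with characters."""

    def __init__(self):
        # Without convert_charrefs=False, Python 3 decodes the references
        # itself and never calls handle_charref.
        super().__init__(convert_charrefs=False)
        self.parts = []

    def handle_data(self, data):
        # Ordinary text between references is kept unchanged.
        self.parts.append(data)

    def handle_charref(self, name):
        # name is only the number part: "1087" (decimal) or "x43f" (hex).
        if name[0] in "xX":
            code = int(name[1:], 16)
        else:
            code = int(name)
        self.parts.append(chr(code))

    def text(self):
        return "".join(self.parts)


def decode_charrefs(s):
    """Return s with numeric character references replaced by characters."""
    parser = CharRefDecoder()
    parser.feed(s)
    parser.close()
    return parser.text()


sample = ("&#1087;&#1088;&#1080;&#1074;&#1077;&#1090; "
          "&#1087;&#1080;&#1090;&#1086;&#1085;")
decoded = decode_charrefs(sample)  # "привет питон"
```

This only handles numeric references; named ones like &amp; would need
handle_entityref as well. If all you need is the decoded string, the
standard library's html.unescape (Python 3.4+) does the whole job in one
call and is probably the lighter-weight option.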

-- 
http://mail.python.org/mailman/listinfo/python-list
