jasiu85 wrote:
> I have a problem with character encoding in LXML. Here's how it goes:
>
> I read an HTML document from a third-party site. It is supposed to be
> in UTF-8, but unfortunately from time to time it's not.
You can instantiate your own HTML parser and pass enco
Hi Mike,
> I read an HTML document from a third-party site. It is supposed to be
> in UTF-8, but unfortunately from time to time it's not.
There will be a host of more lightweight solutions, but you can opt
to sanitize incoming HTML with HTML Tidy (Python binding available).
It will replace inval
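
As a rough illustration of one of the more lightweight options (not the HTML Tidy approach itself), the sketch below uses only the Python standard library to replace bytes that are not valid UTF-8 before the document reaches the parser. The function name `to_clean_utf8` is made up for this example, and it assumes the document arrives as raw bytes.

```python
def to_clean_utf8(raw: bytes) -> bytes:
    """Return `raw` re-encoded as valid UTF-8.

    Hypothetical helper, not part of lxml or HTML Tidy.
    Bytes that cannot be decoded as UTF-8 are replaced with
    U+FFFD (the Unicode replacement character) instead of
    raising UnicodeDecodeError.
    """
    text = raw.decode("utf-8", errors="replace")
    return text.encode("utf-8")


if __name__ == "__main__":
    # A Latin-1 encoded "é" (0xE9) is not valid UTF-8 on its own.
    broken = b"<p>caf\xe9 ok</p>"
    print(to_clean_utf8(broken))
```

This loses the original character wherever a byte is replaced, so it only suits cases where a few damaged characters are acceptable. The cleaned bytes can then be handed to the parser in place of the original string.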
Hey,
I have a problem with character encoding in LXML. Here's how it goes:
I read an HTML document from a third-party site. It is supposed to be
in UTF-8, but unfortunately from time to time it's not. I parse the
document like this:
html_doc = HTML(string_with_document)
Then I ret