Parsing HTML/XML documents

[EMAIL PROTECTED] Thu, 26 Apr 2007 03:46:13 -0700

I need to parse real-world HTML/XML documents, and I found two nice
Python solutions: BeautifulSoup and Tidy.


However, I also found pyXPCOM, the Python bindings for XPCOM, which
should give access to Gecko. So I was thinking that Gecko surely handles
bad HTML in a more consistent and error-proof way than BS and Tidy.
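
To make "bad HTML" concrete, here is a minimal sketch of lenient parsing.
It uses the standard library's html.parser purely as a stand-in; it is
not BeautifulSoup, Tidy, or pyXPCOM, and the TagCollector class, the
collect() helper, and the sample markup are all made up for illustration.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Hypothetical helper: records start tags and text from messy markup."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        # Called for every opening tag, whether or not it is ever closed.
        self.tags.append(tag)

    def handle_data(self, data):
        # Keep only non-blank text runs.
        if data.strip():
            self.text.append(data.strip())


def collect(markup):
    """Feed markup to the parser and return (start tags, text runs)."""
    parser = TagCollector()
    parser.feed(markup)
    parser.close()
    return parser.tags, parser.text


# Deliberately broken sample: unclosed <p> and <b>, no closing </html>.
broken = "<html><body><p>first<p>second <b>bold</body>"
tags, text = collect(broken)
print(tags)
print(text)
```

This only reports the tags and text it encounters; it does not repair the
tree or build a DOM, which is the part where the tools mentioned above
may behave differently from each other.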

I'm interested in using the Mozilla DOM from inside a Python script, but
I'm a bit confused about how I can use pyXPCOM to accomplish this.

Any suggestions?
