I need to parse real-world HTML/XML documents, and I found two nice Python solutions: BeautifulSoup and Tidy.
However, I also found pyXPCOM, which is a Python wrapper for Gecko. So I was thinking that Gecko surely handles bad HTML more consistently and robustly than BeautifulSoup and Tidy do. I'm interested in using the Mozilla DOM from inside a Python script, but I'm a bit confused about how I can use pyXPCOM to accomplish this. Any suggestions?
--
http://mail.python.org/mailman/listinfo/python-list