On Jan 23, 3:54 am, "M.-A. Lemburg" <[EMAIL PROTECTED]> wrote:
> >> I was asking this community if there was a simple way to use only the
> >> tools included with Python to parse a bit of html.
>
> There are lots of ways doing HTML parsing in Python. A common
> one is e.g. using mxTidy to convert the HTML into valid XHTML
> and then use ElementTree to parse the data.
>
> http://www.egenix.com/files/python/mxTidy.html
> http://docs.python.org/lib/module-xml.etree.ElementTree.html
>
> For simple tasks you can also use the HTMLParser that's part
> of the Python std lib.
>
> http://docs.python.org/lib/module-HTMLParser.html
>
> Which tools to use is really dependent on what you are
> trying to solve.
>
> --
> Marc-Andre Lemburg
> eGenix.com

Thanks. So far that makes 3 votes for BeautifulSoup, and one vote each for libxml2dom, pyparsing, and mxTidy. I'm sure those would all be great solutions if I were looking to solve my coding question with external modules.

Several folks have now mentioned that if my files are valid XHTML, I could use htmllib, HTMLParser, or ElementTree (all of which are part of the standard library in v2.5).

Skipping past HTML validation and HTML-to-XHTML 'cleaning', and instead starting with the assumption that I have files that are valid XHTML, can anyone give me a good example of how I would use _htmllib, HTMLParser, or ElementTree_ to parse out the text of one specific childNode, similar to the examples that I provided above using regex?

--
http://mail.python.org/mailman/listinfo/python-list
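
One possible shape for an answer, offered as a minimal sketch rather than a definitive recipe. It assumes the input is well-formed XHTML with no undefined entities (ElementTree will reject things like `&nbsp;` unless they are declared), and it assumes the node of interest can be picked out by an `id` attribute. The sample markup, the id `target`, and the helper names `text_by_id_etree`, `TextById`, and `text_by_id_htmlparser` are all made up for illustration; the regex examples referred to above are not part of this message. The code is written for current Python, where the 2.5-era `HTMLParser` module lives at `html.parser`; `htmllib` was removed in Python 3, so it is not shown.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Valid XHTML puts every element in this namespace, so ElementTree
# reports tags as "{namespace}tagname".
XHTML_NS = "{http://www.w3.org/1999/xhtml}"

# Hypothetical sample document, used only for illustration.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Example</title></head>
  <body>
    <p id="intro">Skip this paragraph.</p>
    <p id="target">The <b>text</b> I want</p>
  </body>
</html>"""


def text_by_id_etree(xhtml, tag, element_id):
    """Return all text inside the first <tag id=element_id>, or None."""
    root = ET.fromstring(xhtml)
    for elem in root.iter(XHTML_NS + tag):
        if elem.get("id") == element_id:
            # itertext() walks the element and its descendants in order.
            return "".join(elem.itertext()).strip()
    return None


class TextById(HTMLParser):
    """Collect text while inside the element with the given id."""

    def __init__(self, element_id):
        super().__init__()
        self.element_id = element_id
        self.depth = 0      # > 0 while inside the target element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            self.depth += 1             # nested tag inside the target
        elif dict(attrs).get("id") == self.element_id:
            self.depth = 1              # entering the target

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1             # reaches 0 when the target closes

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def text_by_id_htmlparser(xhtml, element_id):
    """Event-based variant; relies on every tag being properly closed."""
    parser = TextById(element_id)
    parser.feed(xhtml)
    parser.close()
    return "".join(parser.chunks).strip() if parser.chunks else None


if __name__ == "__main__":
    print(text_by_id_etree(SAMPLE, "p", "target"))
    print(text_by_id_htmlparser(SAMPLE, "target"))
```

The ElementTree version is shorter because the tree is already built and `itertext()` gathers nested text. The HTMLParser version has to track nesting depth by hand, and that bookkeeping is only reliable when every opened tag is closed, which is why the valid-XHTML assumption matters here.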