On Fri, 2018-08-10 at 02:46 +0100, James Read via xml wrote:
> I have a bunch of html files on disk and want to open them and
> extract the contents of the title tag using libxml2. 

By this do you mean the title element in the head?

You can use XPath on an XML document to extract /html/head/title, but
you may need to use libxml2's HTML parser (for example htmlReadFile)
rather than the XML one, as most HTML files are not syntactically
well-formed XML. You can experiment first with

    xmllint --html --xpath /html/head/title foo.html

and see what happens.
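
To illustrate why a forgiving HTML parser matters here, below is a
minimal sketch in Python. Note the assumptions: it uses Python's
standard html.parser module as a stand-in, not libxml2, and the names
TitleParser and extract_title are made up for this example. It only
shows the general idea of pulling the first title out of markup that
an XML parser would reject.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect the text of the first <title> element (hypothetical helper)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.in_title = False   # currently inside <title> ... </title>
        self.done = False       # a complete title has already been seen
        self.parts = []         # text chunks found inside the title

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased, so <TITLE> matches as well.
        if tag == "title" and not self.done:
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self.in_title:
            self.in_title = False
            self.done = True

    def handle_data(self, data):
        if self.in_title:
            self.parts.append(data)


def extract_title(html_text):
    """Return the first title with whitespace collapsed, or None if absent."""
    parser = TitleParser()
    parser.feed(html_text)
    parser.close()
    if not parser.done and not parser.parts:
        return None
    return " ".join("".join(parser.parts).split())
```

Something like extract_title(open("foo.html", encoding="utf-8",
errors="replace").read()) would then give you the title text even when
the rest of the file has unclosed tags. With libxml2 itself the
equivalent would be to parse with the HTML parser and evaluate the
same XPath expression on the resulting document.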

If "a bunch" means tens of thousands of HTML files and you do this
often, consider an XML database such as Berkeley DB XML (dbxml) or
(much easier to get started with, I think) BaseX, so that there's an
element index (or btree) and retrieval might be orders of magnitude
faster.


