On Fri, 2018-08-10 at 02:46 +0100, James Read via xml wrote:
> I have a bunch of html files on disk and want to open them and
> extract the contents of the title tag using libxml2.

By this do you mean the title element in the head?

You can use XPath on an XML document to extract /html/head/title,
but you may need to use the HTML parser, as most HTML files are not
well-formed XML.

You can experiment first with

    xmllint --xpath /html/head/title foo.xml

and see what happens.

If "a bunch" means tens of thousands of HTML files and you do this
often, consider a tree store such as dbxml or (much easier to get
started with, I think) BaseX, so that there's an element index (or
B-tree) and retrieval might be orders of magnitude faster.

Liam

-- 
Liam Quin, https://www.holoweb.net/liam/cv/
Web slave for vintage clipart http://www.fromoldbooks.org/
Available for XML/Document/Information Architecture/
XSL/XQuery/Web/Text Processing/A11Y work & consulting.