Hello,

I would like to implement the following scenario:

- For HTML pages whose URL matches a given pattern, extract the part of the page starting from an XPath
- For other, generic HTML pages, use Boilerpipe
- For other file formats, a simple BodyContentHandler is enough
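
To make the scenario concrete, the selection logic could be sketched roughly as below. This is plain Java with no Tika calls: the class name, the enum, and the example.com URL pattern are all hypothetical, and the Tika classes named in the comments are only candidates for each branch, not a confirmed design.

```java
import java.util.regex.Pattern;

public class ExtractionStrategySelector {

    /** The three extraction strategies from the scenario (names are hypothetical). */
    public enum Strategy { XPATH_FRAGMENT, BOILERPIPE, BODY_CONTENT }

    // Hypothetical URL pattern; the real one is application-specific.
    private static final Pattern SPECIAL_URLS =
            Pattern.compile("^https?://example\\.com/articles/.*");

    /** Picks a strategy from the document URL and its detected media type. */
    public static Strategy select(String url, String mediaType) {
        boolean isHtml = mediaType != null
                && (mediaType.startsWith("text/html")
                    || mediaType.startsWith("application/xhtml+xml"));
        if (!isHtml) {
            // Candidate: org.apache.tika.sax.BodyContentHandler
            return Strategy.BODY_CONTENT;
        }
        if (url != null && SPECIAL_URLS.matcher(url).matches()) {
            // Candidate: a handler restricted to an XPath, e.g. via
            // org.apache.tika.sax.xpath.MatchingContentHandler
            return Strategy.XPATH_FRAGMENT;
        }
        // Candidate: org.apache.tika.parser.html.BoilerpipeContentHandler
        return Strategy.BOILERPIPE;
    }

    public static void main(String[] args) {
        // Print the strategy chosen for one sample of each case.
        System.out.println(select("http://example.com/articles/1", "text/html"));
        System.out.println(select("http://other.org/page.html", "text/html"));
        System.out.println(select("http://other.org/doc.pdf", "application/pdf"));
    }
}
```

Each branch would then build the matching ContentHandler and pass it to the parser; that wiring is the part I am unsure about.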

What's the best way to do this in Tika?

Thanks,
Andrea
