Dynamic content handler

Andrea Asta Tue, 19 May 2015 01:59:31 -0700

Hello,
I would like to implement the following scenario:

- For HTML pages whose URL matches a given pattern, extract only the part of
the page rooted at a given XPath
- For all other HTML pages, use Boilerpipe
- For non-HTML file formats, a simple BodyContentHandler is enough
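
To make the scenario concrete, the selection step could be sketched roughly
as below. This is only an illustration of the decision logic, not Tika API:
the HandlerSelector class, the Strategy enum, and treating the URL pattern as
a regular expression are all my own assumptions, and how each case should be
wired to an actual Tika content handler is the open question.

```java
import java.util.regex.Pattern;

// Hypothetical helper (not part of Tika): picks one of the three
// extraction strategies from the list above for a given document.
final class HandlerSelector {

    // One constant per case in the scenario.
    enum Strategy { XPATH_FRAGMENT, BOILERPIPE, PLAIN_BODY }

    // Assumption: the "given URL pattern" is expressed as a regex.
    private final Pattern urlPattern;

    HandlerSelector(String urlRegex) {
        this.urlPattern = Pattern.compile(urlRegex);
    }

    Strategy select(String url, String mediaType) {
        // Treat HTML and XHTML media types (with or without parameters) as HTML.
        boolean isHtml = mediaType != null
                && (mediaType.startsWith("text/html")
                    || mediaType.startsWith("application/xhtml+xml"));
        if (!isHtml) {
            return Strategy.PLAIN_BODY;      // other formats: plain body text
        }
        if (url != null && urlPattern.matcher(url).matches()) {
            return Strategy.XPATH_FRAGMENT;  // HTML matching the URL pattern
        }
        return Strategy.BOILERPIPE;          // any other HTML page
    }
}
```

The media type is checked first so that a non-HTML file whose URL happens to
match the pattern still falls through to the plain body case.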


What's the best way to do this in Tika?

Thanks
Andrea
