Hello,

Given that we now have pipelines in ManifoldCF, how feasible is it to:
- use Tika's BoilerPipe to get cleaner content from web sites?
- extract specific HTML tags, such as all h1 or h2 elements, and map them to a Solr field?

Thank you very much.
Arcadius.
