Hi All,

We need to remove the header, footer, and menu from the crawled content
before we index it into Solr. I researched online and found references to
doing this via Tika's boilerpipe support (NUTCH-961).

We are currently using Nutch 1.7, but I am looking into upgrading to Nutch
1.10. I am hoping that the newer version of Tika in Nutch 1.10 will do a
better job of removing the extra content.

I would be very grateful if you could let me know the best method and the
steps to achieve this, and how effective it is at removing such content.

Thanks so much,
Madhvi
