Hi All,

We need to remove the header, footer, and menu from crawled content before we index it into Solr. I researched online and found references to doing this via Tika's boilerpipe support (NUTCH-961).
We are currently using Nutch 1.7, but I am looking into upgrading to Nutch 1.10. I am hoping that the newer version of Tika in Nutch 1.10 will do a better job of removing the extra content. I would be very thankful if you could let me know the best method and the steps to achieve this, and how effective it is at removing this content.

Thanks so much,
Madhvi

