Hi everyone. I wonder if anyone can help me.
I am crawling our site with Nutch 1.9 and would like to parse the page content but not the headers, navbar and footer. The reason is that when the pages are posted to Solr, the content field starts with the same text for every page, and a query for text that appears in the navbar, for instance, matches all pages. Is there any way of configuring Nutch to do this?

Kind Regards
Mark

