Hi Markus,

Thanks for the reply. While waiting I found this:
https://issues.apache.org/jira/browse/NUTCH-585

Are you familiar with this patch? How does it compare with your suggestion? There are three attachments on the page. Which is the correct patch?

I have never applied a patch to Nutch before. Could you point me to instructions for applying a patch in my environment?

Jackie

-----Original Message-----
From: Markus Jelsma [mailto:[email protected]]
Sent: Thursday, March 26, 2015 11:33 AM
To: [email protected]
Subject: RE: Ignore navigation during index

Hello - check out NUTCH-961. It adds support for Boilerpipe to Nutch's Tika parser. It's crude but works reasonably.

https://issues.apache.org/jira/browse/NUTCH-961

Markus

-----Original message-----
> From: Richardson, Jacquelyn F. <[email protected]>
> Sent: Thursday 26th March 2015 16:20
> To: [email protected]
> Subject: Ignore navigation during index
>
> Hi,
>
> Is there a way to tell Nutch to ignore the navigation or footer parts of an
> HTML page during the crawl process? Specifically, I do not want the
> information in the navigation or footer to be indexed. My environment is
> Windows 7 with Cygwin, Java 1.7, Nutch 1.9 (binary, not source) and Solr 4.7.
>
> Any assistance will be greatly appreciated.
>
> Thanks,
> Jackie

