Hi Markus,

Thanks for the reply.  While waiting I found this:
https://issues.apache.org/jira/browse/NUTCH-585

Are you familiar with this patch?  How does this compare with your suggestion?

There are three attachments on the page.  Which is the correct patch?

I have never applied a patch to Nutch before. Could you point me to instructions 
for applying a patch in my environment?

Jackie

-----Original Message-----
From: Markus Jelsma [mailto:[email protected]] 
Sent: Thursday, March 26, 2015 11:33 AM
To: [email protected]
Subject: RE: Ignore navigation during index

Hello - check out NUTCH-961. It adds Boilerpipe support to Nutch's Tika 
parser. It's crude but works reasonably well.
https://issues.apache.org/jira/browse/NUTCH-961

Markus
 
 
-----Original message-----
> From:Richardson, Jacquelyn F. <[email protected]>
> Sent: Thursday 26th March 2015 16:20
> To: [email protected]
> Subject: Ignore navigation during index
> 
> Hi,
> 
> Is there a way to tell Nutch to ignore the navigation or footer parts of an 
> HTML page during the crawl process?  Specifically, I do not want the 
> information in the navigation or footer to be indexed.  My environment is 
> Windows 7 with Cygwin, Java 1.7, Nutch 1.9 (binary, not source) and Solr 4.7.
> 
> Any assistance will be greatly appreciated.
> 
> Thanks,
> Jackie
> 
> 
