You will probably need to customize the parse-html plugin for your purpose.
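
As a rough illustration of what such a customization might do, here is a minimal sketch that prunes navigation and footer elements from a parsed page before its text is extracted. It uses only the JDK's org.w3c.dom classes and assumes your customized plugin can reach the page as a DOM tree. The class name NavFooterStripper and the tag and id lists are hypothetical; adjust them to the markup of the site you crawl. It is not part of Nutch itself, and the wiring into the plugin is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper, not a Nutch class: removes nav/footer subtrees from a DOM.
class NavFooterStripper {
    // Assumed tag names and id values that mark boilerplate on the crawled site.
    private static final Set<String> SKIP_TAGS =
            new HashSet<>(Arrays.asList("nav", "footer"));
    private static final Set<String> SKIP_IDS =
            new HashSet<>(Arrays.asList("nav", "navigation", "footer"));

    // Recursively drops every child subtree that looks like navigation or footer.
    static void strip(Node node) {
        // Copy the children first: the NodeList is live, so removing
        // nodes while iterating over it directly would skip siblings.
        NodeList live = node.getChildNodes();
        List<Node> children = new ArrayList<>();
        for (int i = 0; i < live.getLength(); i++) {
            children.add(live.item(i));
        }
        for (Node child : children) {
            if (isBoilerplate(child)) {
                node.removeChild(child);
            } else {
                strip(child);
            }
        }
    }

    // An element counts as boilerplate if its tag name or id is in the skip lists.
    static boolean isBoilerplate(Node node) {
        if (node.getNodeType() != Node.ELEMENT_NODE) {
            return false;
        }
        Element el = (Element) node;
        // Lower-case both values so the match is case-insensitive.
        String tag = el.getNodeName().toLowerCase(Locale.ROOT);
        String id = el.getAttribute("id").toLowerCase(Locale.ROOT);
        return SKIP_TAGS.contains(tag) || SKIP_IDS.contains(id);
    }
}
```

Calling NavFooterStripper.strip(root) on the document root before the text is collected should leave only the content outside those elements. Matching on class attributes instead of ids would follow the same pattern.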
On Mar 26, 2015 4:20 PM, "Richardson, Jacquelyn F." <[email protected]>
wrote:

> Hi,
>
> Is there a way to tell nutch to ignore the navigation or footer parts of
> an html page during the crawl process?  Specifically I do not want the
> information in the navigation or footer to be indexed.  My environment is
> Windows 7 with Cygwin, Java 1.7, nutch 1.9 (binary not source) and solr 4.7.
>
> Any assistance will be greatly appreciated.
>
> Thanks,
> Jackie
>
>
