You will probably need to customize the parse-html plugin for your purpose.

On Mar 26, 2015 4:20 PM, "Richardson, Jacquelyn F." <[email protected]> wrote:
> Hi,
>
> Is there a way to tell nutch to ignore the navigation or footer parts of
> an html page during the crawl process? Specifically I do not want the
> information in the navigation or footer to be indexed. My environment is
> Windows 7 with Cygwin, Java 1.7, nutch 1.9 (binary not source) and solr 4.7.
>
> Any assistance will be greatly appreciated.
>
> Thanks,
> Jackie
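
To make the parse-html suggestion a bit more concrete, here is a rough sketch of the kind of filtering a customized plugin could do: walk the parsed DOM and drop unwanted elements before the text is extracted. This is not Nutch code and does not use any Nutch API. The class name BoilerplateStripper, the SKIP set, and both methods are hypothetical, and it assumes the navigation and footer are marked up as <nav> and <footer> elements. It uses only the JDK's org.w3c.dom classes, so the input here has to be well-formed XHTML; a real plugin would apply the same walk to the DOM its HTML parser produces.

```java
import java.io.StringReader;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of Nutch: removes <nav> and <footer>
// subtrees from a DOM tree so their text never reaches the index.
class BoilerplateStripper {

    // Assumption: the site uses these element names for its boilerplate.
    private static final Set<String> SKIP =
            new HashSet<String>(Arrays.asList("nav", "footer"));

    // Recursively remove every element whose name is in SKIP.
    static void strip(Node node) {
        Node child = node.getFirstChild();
        while (child != null) {
            // Remember the next sibling before the child may be detached.
            Node next = child.getNextSibling();
            if (child.getNodeType() == Node.ELEMENT_NODE
                    && SKIP.contains(child.getNodeName().toLowerCase(Locale.ROOT))) {
                node.removeChild(child);
            } else {
                strip(child);
            }
            child = next;
        }
    }

    // Parse well-formed XHTML, strip the boilerplate, return the remaining text.
    static String extractText(String xhtml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtml)));
        strip(doc.getDocumentElement());
        // Collapse runs of whitespace left behind by the removed elements.
        return doc.getDocumentElement().getTextContent()
                .replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) throws Exception {
        String page = "<html><body>"
                + "<nav>Home About</nav>"
                + "<p>Main content</p>"
                + "<footer>Copyright</footer>"
                + "</body></html>";
        System.out.println(extractText(page)); // prints "Main content"
    }
}
```

If your pages mark the navigation and footer with <div> ids or classes instead, the check in strip() would need to look at attributes rather than element names. Note also that customizing a plugin means building Nutch from source, which the binary distribution alone does not let you do.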

