HtmlParseFilter or IndexingFilter. If you still want to parse the page and extract its outlinks, use an indexing filter to keep the page from being indexed. If you just want to throw away the whole page and its outlinks when it does not contain your terms, implement HtmlParseFilter. See the plugins for examples.
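Whichever interface you pick, the core of the filter is just a term check on the page text. Below is a minimal, self-contained sketch of that check. The class name `TermMatcher` and the method `containsAnyTerm` are hypothetical names for illustration, not part of Nutch, and the match is a plain case-insensitive substring test, which is an assumption about what "contains your terms" should mean.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Locale;

// Hypothetical helper, not part of Nutch: decides whether a page's text
// mentions at least one of the wanted terms.
public class TermMatcher {

    // Returns true if any non-empty term occurs in the text.
    // Matching is a case-insensitive substring test (an assumption),
    // so "nutch" also matches inside "Nutch-based".
    public static boolean containsAnyTerm(String text, Collection<String> terms) {
        if (text == null || terms == null) {
            return false;
        }
        String haystack = text.toLowerCase(Locale.ROOT);
        for (String term : terms) {
            if (term != null && !term.isEmpty()
                    && haystack.contains(term.toLowerCase(Locale.ROOT))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Collection<String> terms = Arrays.asList("nutch", "solr");
        System.out.println(containsAnyTerm("Apache Nutch is a web crawler", terms));
        System.out.println(containsAnyTerm("Nothing relevant here", terms));
    }
}
```

In an indexing filter you would call something like this on the parsed text and, as far as I know for Nutch 1.x, return null from the filter method to skip the document when it returns false. Check the existing plugins for the exact method signatures in your version.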
-----Original message-----
> From: mausmust <[email protected]>
> Sent: Tue 17-Jul-2012 09:53
> To: [email protected]
> Subject: Re: Nutch Content Filtering
>
> Which interface i should use for implementing?
>
>
> On 07/17/2012 10:45 AM, Markus Jelsma wrote:
> > Hi,
> >
> > You can create a simple parse or index filter implementation, check for
> > words in the content and act appropriately.
> >
> > Cheers

