Hi,

Outlinks are added to the ParseData object before it is passed to an HtmlParseFilter. In an HtmlParseFilter plugin you can obtain the outlinks and remove those you don't want.

    Outlink[] outlinks = parseResult.get(content.getUrl()).getData().getOutlinks();

Use the setOutlinks() method to write your processed list back to the ParseData.

Cheers,

-----Original message-----
> From: 刘?? <[email protected]>
> Sent: Fri 03-Aug-2012 15:45
> To: [email protected]
> Subject: Can I only add url in a specified div to the fetch list with nutch?
>
> Such as the title, I want crawl a page with many urls, but only the ones in
> a specified div are meaningful to me. So I want to write a plugin to filter
> it, but I don't know which extension point should I choose.
>
> The htmlparser filter can get the html content, but seems like process
> after the "add to fetch list" operation. And the urlfilter can control the
> fetch list, but I cant get the html content in it.
>
> Look forward to any helpful replies, thx.
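
For illustration, here is a rough sketch of the filtering step itself: collect the hrefs found under one div, then keep only the outlinks that point to them. It is written against the plain JDK DOM API so it runs without Nutch. The class DivOutlinkSketch, the Link class (a stand-in for Nutch's Outlink), the helper names and the div id "main" are all made up for this example. It also assumes the hrefs are already absolute URLs; in a real plugin you would probably need to resolve relative links against the page URL before comparing.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class DivOutlinkSketch {

    /** Hypothetical stand-in for Nutch's Outlink so the sketch runs without Nutch. */
    public static final class Link {
        private final String toUrl;
        private final String anchor;
        public Link(String toUrl, String anchor) { this.toUrl = toUrl; this.anchor = anchor; }
        public String getToUrl() { return toUrl; }
        public String getAnchor() { return anchor; }
    }

    /** Collects the href of every <a> element nested under the <div> with the given id. */
    public static Set<String> hrefsInDiv(Node root, String divId) {
        Set<String> hrefs = new HashSet<>();
        collect(root, divId, false, hrefs);
        return hrefs;
    }

    private static void collect(Node node, String divId, boolean inside, Set<String> hrefs) {
        if (node instanceof Element) {
            Element el = (Element) node;
            String name = el.getTagName();
            // Tag names are compared case-insensitively, since HTML parsers differ on case.
            if ("div".equalsIgnoreCase(name) && divId.equals(el.getAttribute("id"))) {
                inside = true; // everything below this node belongs to the wanted div
            }
            if (inside && "a".equalsIgnoreCase(name) && el.hasAttribute("href")) {
                hrefs.add(el.getAttribute("href")); // assumption: href is already absolute
            }
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            collect(children.item(i), divId, inside, hrefs);
        }
    }

    /** Returns only the links whose target URL is in the allowed set, preserving order. */
    public static Link[] keepAllowed(Link[] outlinks, Set<String> allowed) {
        List<Link> kept = new ArrayList<>();
        for (Link link : outlinks) {
            if (allowed.contains(link.getToUrl())) {
                kept.add(link);
            }
        }
        return kept.toArray(new Link[0]);
    }

    /** Parses well-formed XHTML into a DOM; used here only to build demo input. */
    public static Document parse(String xhtml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse demo markup", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<html><body>"
                + "<a href=\"http://example.com/nav\">nav</a>"
                + "<div id=\"main\"><p><a href=\"http://example.com/a\">A</a></p></div>"
                + "</body></html>");
        Link[] all = {
            new Link("http://example.com/nav", "nav"),
            new Link("http://example.com/a", "A")
        };
        for (Link link : keepAllowed(all, hrefsInDiv(doc, "main"))) {
            System.out.println(link.getToUrl());
        }
    }
}
```

In an actual HtmlParseFilter, the DOM would be the DocumentFragment handed to filter(), the input array would come from getOutlinks() as shown above, and the filtered array would go back in through setOutlinks().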

