Hi Nguyen,

On Fri, Dec 13, 2013 at 4:28 AM, <user-digest-h...@nutch.apache.org> wrote:
> I am crawling a list of home pages to discover new articles; the crawler
> will stop at depth 1. But at depth 1 the crawler still adds many new URLs
> at depth 2, so even though I only crawl up to depth 1, the crawldb still
> has many, many URLs at depth 2. Is there any way to prevent that, or do we
> need to implement a custom plugin?
>
> And I only want to index articles discovered at depth 1, not the seeds. Do
> we have a feature to do that?

Assuming you are still deploying 2.x, please see the generate.max.distance property in nutch-site.xml. This will do what you want. It was a very neat patch committed by Ferdy. Some history can be found here: https://issues.apache.org/jira/browse/NUTCH-1431

Thanks
Lewis
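As a rough sketch, an entry in nutch-site.xml limiting generation to pages at most one hop from the seeds could look like the following. The value 1 and the description text are illustrative assumptions on my part; check NUTCH-1431 and your version's nutch-default.xml for the exact semantics and default:

```xml
<!-- nutch-site.xml (Nutch 2.x), standard Hadoop-style property format.
     Assumed semantics: seeds are distance 0, so a value of 1 lets the
     generator select only seeds and pages one hop away; a negative
     value disables the limit. -->
<property>
  <name>generate.max.distance</name>
  <value>1</value>
  <description>Maximum link distance from a seed URL that the generator
  is allowed to select for fetching; negative means unlimited.</description>
</property>
```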