Although Nutch has no real notion of depth, as you already figured out, you can keep track of it via a scoring filter:
http://grokbase.com/t/nutch/user/1092p10q5g/depth-information-not-being-available-in-crawl-datum
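
To make the idea concrete, here is a rough sketch of the bookkeeping such a filter would do. It is plain Python, not Nutch code: `crawl_with_depth`, `indexable`, and the example.com URLs are all made up for illustration, and a real scoring filter would carry the depth in the crawl datum metadata instead of a dict. Seeds get depth 0, each outlink gets its parent's depth plus one, and outlinks that would exceed the limit are never added to the db.

```python
from collections import deque


def crawl_with_depth(seeds, outlinks, max_depth):
    """Model of depth tracking: return {url: depth} for every URL kept in the db.

    seeds     -- list of seed URLs (depth 0)
    outlinks  -- dict mapping a URL to the list of URLs it links to
    max_depth -- outlinks deeper than this are dropped, not stored
    """
    depth = {url: 0 for url in seeds}
    queue = deque(seeds)
    while queue:
        url = queue.popleft()
        child_depth = depth[url] + 1
        if child_depth > max_depth:
            # Outlinks of this page would exceed the limit: do not store them.
            continue
        for link in outlinks.get(url, []):
            if link not in depth:  # keep the first (shallowest) depth seen
                depth[link] = child_depth
                queue.append(link)
    return depth


def indexable(depth, min_depth=1):
    """URLs that should be indexed: everything except the seeds by default."""
    return sorted(url for url, d in depth.items() if d >= min_depth)


# Hypothetical link graph: one home page, two articles, one deeper page.
links = {
    "http://example.com/": ["http://example.com/a1", "http://example.com/a2"],
    "http://example.com/a1": ["http://example.com/deep"],
}
db = crawl_with_depth(["http://example.com/"], links, max_depth=1)
to_index = indexable(db)
```

With `max_depth=1` the page linked from `a1` never enters `db`, and `to_index` holds only the two articles, which covers both of your questions: no depth 2 URLs in the db, and the seed is skipped at indexing time.
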
-----Original message-----
> From: Nguyen Manh Tien <[email protected]>
> Sent: Friday 13th December 2013 5:30
> To: [email protected]
> Subject: Effective way to crawling seed and discover new urls.
>
> Hi,
>
> I am crawling a list of home pages to discover new articles, and the
> crawler will stop at depth 1. But at depth 1 the crawler still adds many
> new URLs with depth 2, so even though I only crawl up to depth 1, the
> crawldb still has many URLs at depth 2. Is there any way to prevent that,
> or do we need to implement a custom plugin?
>
> Also, I only want to index the articles discovered at depth 1, not the
> seeds. Do we have a feature to do that?
>
> Thanks.
> Tien
>

