What's wrong with using the scoring-depth plugin?
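For reference, here is a minimal sketch of enabling it in nutch-site.xml. It assumes Nutch 1.x with the scoring-depth plugin available. The plugin list shown is only illustrative: keep your existing plugin.includes value and append scoring-depth to it. The depth numbering (seeds counted as depth 1) is an assumption, so check the scoring.depth.max description in nutch-default.xml for your version. If I remember correctly, that description also says the limit can be overridden per seed with _maxdepth_ metadata.

```xml
<!-- nutch-site.xml: sketch only, assuming Nutch 1.x with the scoring-depth plugin -->
<configuration>
  <property>
    <name>plugin.includes</name>
    <!-- Illustrative value: keep your current plugins and add scoring-depth -->
    <value>protocol-http|urlfilter-regex|parse-(html|tika)|index-(basic|anchor)|indexer-solr|scoring-depth|urlnormalizer-(pass|regex|basic)</value>
  </property>
  <property>
    <name>scoring.depth.max</name>
    <!-- Assuming seeds count as depth 1, a value of 2 keeps the seeds and the
         pages they link to, without following outlinks any further -->
    <value>2</value>
  </property>
</configuration>
```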
On 13 December 2013 09:33, Markus Jelsma <[email protected]> wrote:
> Although there is no real notion of depth, as you already figured out, you
> can keep track of it via a scoring filter.
>
> http://grokbase.com/t/nutch/user/1092p10q5g/depth-information-not-being-available-in-crawl-datum
>
>
>
> -----Original message-----
> > From: Nguyen Manh Tien <[email protected]>
> > Sent: Friday 13th December 2013 5:30
> > To: [email protected]
> > Subject: Effective way to crawling seed and discover new urls.
> >
> > Hi,
> >
> > I am crawling a list of home pages to discover new articles, crawler will
> > stop at depth 1.But at depth 1, crawler still add many new urls with depth
> > 2, so event i only crawl up to depth 1 but crawldb still have many, many
> > urls at depth 2. Is there any way to prevent that or we need to implement a
> > custom plugin?
> >
> > And i only want to index discovered article at depth 1, not seed. do we
> > have a feature to do that?
> >
> > Thanks.
> > Tien
> >

-- 
Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
http://twitter.com/digitalpebble

