Hi, Thank you for the reply.
-David

On Mon, Dec 24, 2012 at 4:18 PM, Markus Jelsma <[email protected]> wrote:

> HI
>
> -----Original message-----
> > From:David Philip <[email protected]>
> > Sent: Mon 24-Dec-2012 09:50
> > To: [email protected]
> > Subject: Re: Difference in params - depth and topN
> >
> > Hi Markus,
> > What is the default value for topN when it is not passed through
> > command? I mean simply passing the depth param and no topN - (bin/nutch
> > crawl urls -dir crawl -depth 3)
>
> There is no default, if not specified the generator will select all URL's
> that are eligible for fetch.
>
> >
> > Also, If the depth is number of crawl cycles, can you please brief me on
> > the logic behind it to crawl all the 5 URL when depth param passed is 3
> > (-depth 3)?
>
> Can be multiple reasons:
> - not all outlinks are correct
> - limit on number of url's per host or domain
> - transient error
>
> >
> > Thanks
> > David.
> >
> > On Fri, Dec 21, 2012 at 6:25 PM, Markus Jelsma
> > <[email protected]> wrote:
> >
> > > Hi - Depth means how many crawl cycles are executes and topN means how
> > > many URL's per cycle are selected.
> > >
> > > -----Original message-----
> > > > From:David Philip <[email protected]>
> > > > Sent: Fri 21-Dec-2012 13:50
> > > > To: [email protected]
> > > > Subject: Difference in params - depth and topN
> > > >
> > > > Hello All,
> > > >
> > > > There is a site that has total 5 URLS.
> > > >
> > > > - When this site is crawled with input param for depth 3 , all 5
> > > >   sites are crawled in one shot.
> > > >
> > > > - And when it is crawled with params : depth 1 topN 5 TWO times,
> > > >   for this first time only one URL is crawled and second time rest
> > > >   4 are crawled.
> > > >
> > > > - And when params: depth 1 topN 3 - after 3 times it crawled all
> > > >   the 5 sites.
> > > >
> > > > Didn't understand what does these 2 parameters mean. Can anyone brief
> > > > or redirect to url that explains this? Below are the list of url and
> > > > readdb stats.
> > > >
> > > > *stats:*
> > > > Statistics for CrawlDb: crawl/crawldb
> > > > TOTAL urls: 5
> > > > status 2 (db_fetched): 5
> > > > CrawlDb statistics: done
> > > >
> > > > *URLS : *
> > > > http://liveforyou.blogspot.in/
> > > > http://liveforyou.blogspot.in/2012/12/blogging.html
> > > > http://liveforyou.blogspot.in/2011/09/test.html
> > > > http://liveforyou.blogspot.in/2012_12_01_archive.html
> > > > http://liveforyou.blogspot.in/2011_09_01_archive.html
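
For anyone reading this thread later, the three observations in the original question can be reproduced with a toy model of the cycle Markus describes: depth is the number of generate/fetch/update cycles, and topN caps how many URLs are selected per cycle. This is only a rough sketch in Python, not Nutch's actual code. The crawl() function, the crawldb dict and the LINKS graph are made up for illustration, and the graph assumes the home page links directly to the other four pages, which the thread does not confirm.

```python
# Toy model of crawl cycles. NOT Nutch's implementation; all names are hypothetical.
HOME = "http://liveforyou.blogspot.in/"

# Assumption: the home page links directly to the other four pages.
LINKS = {
    HOME: [
        HOME + "2012/12/blogging.html",
        HOME + "2011/09/test.html",
        HOME + "2012_12_01_archive.html",
        HOME + "2011_09_01_archive.html",
    ],
}


def crawl(crawldb, depth, top_n=None):
    """Run `depth` cycles on crawldb (url -> fetched flag).

    Returns the number of URLs fetched in each cycle.
    """
    fetched_per_cycle = []
    for _ in range(depth):
        # Generate: select unfetched URLs, capped at top_n when it is given.
        eligible = [url for url, done in crawldb.items() if not done]
        batch = eligible if top_n is None else eligible[:top_n]
        for url in batch:
            crawldb[url] = True                  # fetch
            for outlink in LINKS.get(url, []):   # update: record new outlinks
                crawldb.setdefault(outlink, False)
        fetched_per_cycle.append(len(batch))
    return fetched_per_cycle


# One run with depth 3 and no topN: the seed first, then its four outlinks.
print(crawl({HOME: False}, depth=3))

# Two runs with depth 1, topN 5 against the same crawldb.
db = {HOME: False}
print([crawl(db, depth=1, top_n=5)[0] for _ in range(2)])

# Three runs with depth 1, topN 3 against the same crawldb.
db = {HOME: False}
print([crawl(db, depth=1, top_n=3)[0] for _ in range(3)])
```

Under that assumed link graph, the first run can only fetch the seed because the other four URLs are not known until the seed has been parsed, which is why a single cycle never crawls all five no matter how large topN is.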

