Hi,
 
-----Original message-----
> From:David Philip <[email protected]>
> Sent: Mon 24-Dec-2012 09:50
> To: [email protected]
> Subject: Re: Difference in params - depth and topN
> 
> Hi Markus,
>     What is the default value for topN when it is not passed through
> command? I mean simply passing the depth param and no topN - (bin/nutch
> crawl urls -dir crawl -depth 3)

There is no default. If topN is not specified, the generator selects all URLs
that are eligible for fetching.
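
To make the interaction between depth and topN concrete, here is a toy model of the generate/fetch/update cycle. It is only a sketch and not Nutch code: the link graph (the home page linking to the other four pages), the short page names, and the `crawl` function are all hypothetical, and the real generator ranks URLs by score rather than by discovery order.

```python
# Toy model of crawl cycles; NOT Nutch code.
# Assumption: the home page links to the other four pages (hypothetical graph).
LINKS = {
    "home": ["post-a", "post-b", "archive-a", "archive-b"],
    "post-a": ["home"],
    "post-b": ["home"],
    "archive-a": ["home"],
    "archive-b": ["home"],
}


def crawl(db, fetched, depth, top_n=None):
    """Run `depth` cycles and return how many URLs each cycle fetched.

    db      -- ordered list of known URLs (stands in for the crawldb)
    fetched -- set of URLs that were already fetched
    top_n   -- maximum URLs selected per cycle; None means "all eligible"
    """
    per_cycle = []
    for _ in range(depth):
        # Generate: pick the unfetched URLs, capped at top_n if it is given.
        eligible = [url for url in db if url not in fetched]
        batch = eligible if top_n is None else eligible[:top_n]
        # Fetch and update: mark as fetched and add newly found outlinks.
        for url in batch:
            fetched.add(url)
            for outlink in LINKS.get(url, []):
                if outlink not in db:
                    db.append(outlink)
        per_cycle.append(len(batch))
    return per_cycle


# One run with depth 3 and no topN: the seed, then its four outlinks.
print(crawl(["home"], set(), depth=3))  # [1, 4, 0]

# Two separate runs with depth 1 and topN 5 on the same crawldb.
db, fetched = ["home"], set()
print([crawl(db, fetched, depth=1, top_n=5) for _ in range(2)])  # [[1], [4]]

# Three separate runs with depth 1 and topN 3 on the same crawldb.
db, fetched = ["home"], set()
print([crawl(db, fetched, depth=1, top_n=3) for _ in range(3)])  # [[1], [3], [1]]
```

Under that assumed link graph, the first cycle can only fetch the seed URL because nothing else is known yet, which would match the behaviour you describe for all three of your runs.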

> 
> Also, if the depth is the number of crawl cycles, can you please explain the
> logic by which all 5 URLs are crawled when the depth param passed is 3
> (-depth 3)?

There can be multiple reasons:
- not all outlinks are correct
- a limit on the number of URLs per host or domain
- transient errors

> 
> Thanks
> David.
> 
> On Fri, Dec 21, 2012 at 6:25 PM, Markus Jelsma
> <[email protected]> wrote:
> 
> > Hi - Depth means how many crawl cycles are executed and topN means how
> > many URLs per cycle are selected.
> >
> > -----Original message-----
> > > From:David Philip <[email protected]>
> > > Sent: Fri 21-Dec-2012 13:50
> > > To: [email protected]
> > > Subject: Difference in params - depth and topN
> > >
> > > Hello All,
> > >
> > >    There is a site that has total 5 URLS.
> > >
> > >
> > >    - When this site is crawled with input param for depth 3 , all 5 sites
> > >    are crawled in one shot.
> > >
> > >    - And when it is crawled with params: depth 1 topN 5 TWO times, the
> > >    first time only one URL is crawled and the second time the remaining
> > >    4 are crawled.
> > >
> > >    - And when params: depth 1 topN 3 - after 3 times it crawled all the
> > >    5 sites.
> > >
> > > I don't understand what these 2 parameters mean. Can anyone explain
> > > briefly or point me to a URL that explains this? Below are the list of
> > > URLs and the readdb stats.
> > >
> > > *stats:*
> > > Statistics for CrawlDb: crawl/crawldb
> > > TOTAL urls: 5
> > > status 2 (db_fetched): 5
> > > CrawlDb statistics: done
> > >
> > > *URLS : *
> > > http://liveforyou.blogspot.in/
> > > http://liveforyou.blogspot.in/2012/12/blogging.html
> > > http://liveforyou.blogspot.in/2011/09/test.html
> > > http://liveforyou.blogspot.in/2012_12_01_archive.html
> > > http://liveforyou.blogspot.in/2011_09_01_archive.html
> > >
> >
> 
