Hi,

  Thank you for the reply.

-David

On Mon, Dec 24, 2012 at 4:18 PM, Markus Jelsma
<[email protected]>wrote:

> Hi,
>
> -----Original message-----
> > From:David Philip <[email protected]>
> > Sent: Mon 24-Dec-2012 09:50
> > To: [email protected]
> > Subject: Re: Difference in params - depth and topN
> >
> > Hi Markus,
> >     What is the default value for topN when it is not passed on the
> > command line? I mean passing only the depth param and no topN, e.g.
> > bin/nutch crawl urls -dir crawl -depth 3
>
> There is no default. If topN is not specified, the generator selects all
> URLs that are eligible for fetching.
>
> >
> > Also, if the depth is the number of crawl cycles, can you please brief me
> > on the logic by which all 5 URLs were crawled when the depth param passed
> > was 3 (-depth 3)?
>
> Can be multiple reasons:
> - not all outlinks are correct
> - limit on the number of URLs per host or domain
> - transient error
>
> >
> > Thanks
> > David.
> >
> > On Fri, Dec 21, 2012 at 6:25 PM, Markus Jelsma
> > <[email protected]>wrote:
> >
> > > Hi - Depth means how many crawl cycles are executed and topN means how
> > > many URLs per cycle are selected.
> > >
> > > -----Original message-----
> > > > From:David Philip <[email protected]>
> > > > Sent: Fri 21-Dec-2012 13:50
> > > > To: [email protected]
> > > > Subject: Difference in params - depth and topN
> > > >
> > > > Hello All,
> > > >
> > > > >    There is a site that has a total of 5 URLs.
> > > >
> > > >
> > > > >    - When this site is crawled with input param depth 3, all 5 URLs
> > > > >    are crawled in one shot.
> > > > >
> > > > >    - When it is crawled with params depth 1 topN 5 TWO times, the
> > > > >    first time only one URL is crawled and the second time the other 4
> > > > >    are crawled.
> > > > >
> > > > >    - With params depth 1 topN 3, it took 3 runs to crawl all 5
> > > > >    URLs.
> > > >
> > > > > I didn't understand what these 2 parameters mean. Can anyone explain
> > > > > briefly or point me to a URL that explains this? Below are the list of
> > > > > URLs and the readdb stats.
> > > >
> > > > *stats:*
> > > > Statistics for CrawlDb: crawl/crawldb
> > > > TOTAL urls: 5
> > > > status 2 (db_fetched): 5
> > > > CrawlDb statistics: done
> > > >
> > > > *URLS : *
> > > > http://liveforyou.blogspot.in/
> > > > http://liveforyou.blogspot.in/2012/12/blogging.html
> > > > http://liveforyou.blogspot.in/2011/09/test.html
> > > > http://liveforyou.blogspot.in/2012_12_01_archive.html
> > > > http://liveforyou.blogspot.in/2011_09_01_archive.html
> > > >
> > >
> >
>
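
The three observations in the original question are consistent with Markus's definition (depth = number of crawl cycles, topN = cap on URLs selected per cycle). A toy Python model can show this. It is only a sketch, not Nutch code: the function name `crawl`, the in-memory dict standing in for the crawldb, and the link graph (a seed page linking to the four other pages) are all assumptions made for illustration.

```python
def crawl(seeds, links, depth, top_n=None, db=None):
    """Toy model of `depth` generate/fetch/update cycles (not Nutch code).

    db maps url -> "unfetched" | "fetched" and plays the role of the crawldb,
    so passing the returned db back in models a repeated crawl run.
    """
    db = {} if db is None else db
    for url in seeds:
        db.setdefault(url, "unfetched")      # inject seeds, keep known status
    for _ in range(depth):
        # generate: select eligible URLs, capped at top_n when it is given
        batch = [u for u, status in db.items() if status == "unfetched"]
        if top_n is not None:
            batch = batch[:top_n]
        if not batch:
            break                            # nothing left to fetch
        # fetch + update: mark fetched and add newly discovered outlinks
        for url in batch:
            db[url] = "fetched"
            for out in links.get(url, []):
                db.setdefault(out, "unfetched")
    return db


def fetched(db):
    return sum(1 for status in db.values() if status == "fetched")


# Assumed link graph: the seed page links to the four other pages.
links = {"home": ["post1", "post2", "archive1", "archive2"]}

print(fetched(crawl(["home"], links, depth=3)))   # one run, no topN

db = None
for run in range(3):                              # three runs, depth 1 topN 3
    db = crawl(["home"], links, depth=1, top_n=3, db=db)
    print(run + 1, fetched(db))
```

In this model the first cycle can only fetch the seed, because the other pages are not known until the seed's outlinks have been parsed. That is why a single cycle fetches one URL regardless of topN, while depth 3 without topN reaches all five in one run.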
