Yes, that's right.
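
For example, here is a minimal sketch of an override you could put in conf/nutch-site.xml. The cap of 100 is only an illustrative assumption, so pick whatever fits your crawl. Note that the description quoted below says "generator.count.mode", but as far as I know the property is actually named generate.count.mode in nutch-default.xml.

```xml
<!-- Sketch only: the value 100 is an assumed example, not a recommendation. -->
<property>
  <name>generate.max.count</name>
  <value>100</value>
  <description>At most 100 URLs per host (or domain, see
  generate.count.mode) in a single fetchlist.</description>
</property>

<property>
  <name>generate.count.mode</name>
  <value>host</value>
  <description>Count URLs per 'host'; set to 'domain' to count
  per domain instead.</description>
</property>
```

With this in place, each generate step should put no more than 100 URLs from any one host into the fetchlist. As far as I understand, the URLs over the limit are not discarded; they stay in the crawldb and can be picked up in later rounds.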

On Tue, Apr 29, 2014 at 10:53 PM, S.L <[email protected]> wrote:

> Thanks. Will this skip any URLs at each level/fetch if a particular host has
> more than the value we set it to?
>
>
> On Tue, Apr 29, 2014 at 10:48 AM, feng lu <[email protected]> wrote:
>
> > Maybe you can set this property to limit the count of allowed URLs per host
> > / domain. The default is -1.
> >
> > <property>
> >   <name>generate.max.count</name>
> >   <value>-1</value>
> >   <description>The maximum number of urls in a single
> >   fetchlist.  -1 if unlimited. The urls are counted according
> >   to the value of the parameter generator.count.mode.
> >   </description>
> > </property>
> >
> >
> >
> > On Tue, Apr 29, 2014 at 11:14 AM, S.L <[email protected]> wrote:
> >
> > > Hi All,
> > >
> > > I am crawling multiple big websites for which I have the homepage as the
> > > URL in the seed file. The problem I am facing is that one of the websites
> > > is getting crawled at a faster pace than the rest of the websites and as a
> > > result the indexed data contains a disproportionate number of entries for
> > > this one website.
> > >
> > > I suspect that this is happening because the website in question has the
> > > homepage with the maximum number of outlinks.
> > >
> > > My question is: how can I control the behaviour of Nutch so as to crawl
> > > every host/domain in a balanced way?
> > >
> > > I am using Nutch 1.7.
> > >
> > > Thanks.
> > >
> >
> >
> >
> > --
> > Don't Grow Old, Grow Up... :-)
> >
>



-- 
Don't Grow Old, Grow Up... :-)
