Thanks, Feng, but that is not what we want. Do you mean there is no
mechanism by which we can limit how many URLs are fetched per host at each
level and put the rest in the queue, so that we have equal representation
from all hosts while the index is being built up?


On Wed, Apr 30, 2014 at 1:26 AM, feng lu <[email protected]> wrote:

> yes, that's right.
>
>
> On Tue, Apr 29, 2014 at 10:53 PM, S.L <[email protected]> wrote:
>
> > Thanks. Will this skip any URLs at each level/fetch if a particular host
> > has more than the value we set it to?
> >
> >
> > On Tue, Apr 29, 2014 at 10:48 AM, feng lu <[email protected]> wrote:
> >
> > > Maybe you can set this property to limit the count of allowed URLs per
> > > host / domain. The default is -1.
> > >
> > > <property>
> > >   <name>generate.max.count</name>
> > >   <value>-1</value>
> > >   <description>The maximum number of urls in a single
> > >   fetchlist.  -1 if unlimited. The urls are counted according
> > >   to the value of the parameter generator.count.mode.
> > >   </description>
> > > </property>
> > >
> > >
> > >
> > > On Tue, Apr 29, 2014 at 11:14 AM, S.L <[email protected]> wrote:
> > >
> > > > Hi All,
> > > >
> > > > I am crawling multiple big websites, with the homepage of each as
> > > > the URL in the seed file. The problem I am facing is that one of the
> > > > websites is getting crawled at a faster pace than the rest, and as a
> > > > result the indexed data contains a disproportionate number of entries
> > > > for this one website.
> > > >
> > > > I suspect that this is happening because the website in question has
> > > > the homepage with the largest number of outlinks.
> > > >
> > > > My question is: how can I control the behaviour of Nutch so as to
> > > > crawl every host/domain in a balanced way?
> > > >
> > > > I am using Nutch 1.7
> > > >
> > > > Thanks.
> > > >
> > >
> > >
> > >
> > > --
> > > Don't Grow Old, Grow Up... :-)
> > >
> >
>
>
>
> --
> Don't Grow Old, Grow Up... :-)
>
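
For reference, the generate.max.count suggestion quoted above might look
like the following override in conf/nutch-site.xml. This is only a sketch:
the limit of 100 is an arbitrary value chosen for illustration, not a
recommendation from the thread. The quoted description refers to
"generator.count.mode", but as far as I can tell the property is actually
named generate.count.mode in nutch-default.xml, so that is the name assumed
here.

```xml
<!-- conf/nutch-site.xml (sketch; the value 100 is a hypothetical limit) -->
<property>
  <name>generate.max.count</name>
  <!-- at most this many URLs per host in a single fetchlist -->
  <value>100</value>
</property>
<property>
  <name>generate.count.mode</name>
  <!-- count URLs per host; "domain" is the other commonly documented mode -->
  <value>host</value>
</property>
```

As I understand it, URLs beyond the limit are left out of the current
fetchlist rather than removed from the crawldb, so they should remain
eligible for later generate/fetch rounds. That is worth verifying against
your own crawl before relying on it.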
