> I think that should be topN in generate.
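No, as Markus said: generate.max.count does the job
in combination with generate.count.mode. Each generate
cycle then puts at most generate.max.count URLs per host
(or domain) into the fetchlist. Combined with -depth, which
sets the number of generate/fetch/update cycles, this gives
a limited and almost evenly distributed number of pages
per host/domain.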

<property>
  <name>generate.max.count</name>
  <value>-1</value>
  <description>The maximum number of urls in a single
  fetchlist.  -1 if unlimited. The urls are counted according
  to the value of the parameter generate.count.mode.
  </description>
</property>

<property>
  <name>generate.count.mode</name>
  <value>host</value>
  <description>Determines how the URLs are counted for generate.max.count.
  Default value is 'host' but can be 'domain'. Note that we do not count
  per IP in the new version of the Generator.
  </description>
</property>
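
For the original question (N pages per host), a minimal sketch of
an override in conf/nutch-site.xml might look like the following.
The value 10 is only an assumption taken from the "5-10" range
asked about; set generate.count.mode to 'domain' to count per
domain instead of per host.

<configuration>
  <!-- Illustrative value: at most 10 URLs per host in each fetchlist -->
  <property>
    <name>generate.max.count</name>
    <value>10</value>
  </property>

  <!-- Count the limit per host; 'domain' is the other option -->
  <property>
    <name>generate.count.mode</name>
    <value>host</value>
  </property>
</configuration>

Since the limit applies to each fetchlist, the total per host
grows with the number of cycles, so keep -depth small if a hard
cap of roughly N pages per host is wanted.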


On 07/05/2013 08:58 PM, h b wrote:
> I think that should be topN in generate.
> 
> the generate.max.count will fetch those many pages for each fetch, not
> necessarily for each domain.
> The topN will only pull topN urls to be fetched for the fetch pass.
> 
> 
> 
> On Fri, Jul 5, 2013 at 1:13 AM, Markus Jelsma 
> <markus.jel...@openindex.io> wrote:
> 
>> generate.max.count?
>>
>>
>> -----Original message-----
>>> From:Dennis Yurichev <dennis_mailing_li...@conus.info>
>>> Sent: Friday 5th July 2013 5:25
>>> To: user@nutch.apache.org
>>> Subject: limit to fetch only N pages from each host?
>>>
>>> Hi.
>>>
>>> How to limit nutch 2.x to fetch only N (5-10) pages from each host or
>>> domain?
>>> I fail to figure it out from config files.
>>>
>>> TIA!
>>>
>>> --
>>> -- http://www.yurichev.com
>>>
>>
> 
