> I think that should be topN in generate.
No, as Markus said: generate.max.count does the job,
together with generate.count.mode. Combined with -depth,
this gives a limited and almost evenly distributed
number of pages per host/domain.

<property>
  <name>generate.max.count</name>
  <value>-1</value>
  <description>The maximum number of urls in a single
  fetchlist.  -1 if unlimited. The urls are counted according
  to the value of the parameter generate.count.mode.
  </description>
</property>

<property>
  <name>generate.count.mode</name>
  <value>host</value>
  <description>Determines how the URLs are counted for generate.max.count.
  Default value is 'host' but can be 'domain'. Note that we do not count
  per IP in the new version of the Generator.
  </description>
</property>
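
For example, an override along these lines should limit each
fetchlist to 10 URLs per host. This is only a sketch: the value 10
is an illustration taken from the 5-10 range in the original
question, and I am assuming the override goes into
conf/nutch-site.xml inside the usual <configuration> element.

<!-- illustrative value: at most 10 URLs per host in each fetchlist -->
<property>
  <name>generate.max.count</name>
  <value>10</value>
</property>

<!-- count per host; use 'domain' to count per domain instead -->
<property>
  <name>generate.count.mode</name>
  <value>host</value>
</property>

Since the limit applies to each generate/fetch round, running
N rounds (-depth N) should give at most 10 * N pages per host.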


On 07/05/2013 08:58 PM, h b wrote:
> I think that should be topN in generate.
> 
> the generate.max.count will fetch those many pages for each fetch, not
> necessarily for each domain.
> The topN will only pull topN urls to be fetched for the fetch pass.
> 
> 
> 
> On Fri, Jul 5, 2013 at 1:13 AM, Markus Jelsma 
> <[email protected]>wrote:
> 
>> generate.max.count?
>>
>>
>> -----Original message-----
>>> From:Dennis Yurichev <[email protected]>
>>> Sent: Friday 5th July 2013 5:25
>>> To: [email protected]
>>> Subject: limit to fetch only N pages from each host?
>>>
>>> Hi.
>>>
>>> How to limit nutch 2.x to fetch only N (5-10) pages from each host or
>>> domain?
>>> I fail to figure it out from config files.
>>>
>>> TIA!
>>>
>>> --
>>> -- http://www.yurichev.com
>>>
>>
> 
