> I think that should be topN in generate.

No, as Markus said: generate.max.count does the job, in combination with
generate.count.mode. Combined with -depth, it is possible to get a limited
and almost evenly distributed number of pages per host or domain.
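As a rough sketch only, an override for the original question (5-10 pages
per host) might look like the fragment below. It assumes the override goes
into conf/nutch-site.xml, and the value 10 is just an example taken from the
question. It uses the two properties whose defaults are quoted further down.

```xml
<!-- conf/nutch-site.xml (assumed location for local overrides) -->
<property>
  <name>generate.max.count</name>
  <!-- example value: at most 10 URLs per host in each fetchlist -->
  <value>10</value>
</property>

<property>
  <name>generate.count.mode</name>
  <!-- count per host; use 'domain' to count per domain instead -->
  <value>host</value>
</property>
```

Since the limit applies to a single fetchlist, each generate/fetch round can
add up to that many pages per host, so the total per host should be bounded
by roughly the -depth value times generate.max.count.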
<property>
  <name>generate.max.count</name>
  <value>-1</value>
  <description>The maximum number of urls in a single
  fetchlist. -1 if unlimited. The urls are counted according
  to the value of the parameter generator.count.mode.
  </description>
</property>

<property>
  <name>generate.count.mode</name>
  <value>host</value>
  <description>Determines how the URLs are counted for generator.max.count.
  Default value is 'host' but can be 'domain'. Note that we do not count
  per IP in the new version of the Generator.
  </description>
</property>

On 07/05/2013 08:58 PM, h b wrote:
> I think that should be topN in generate.
>
> the generate.max.count will fetch those many pages for each fetch, not
> necessarily for each domain.
> The topN will only pull topN urls to be fetched for the fetch pass.
>
>
>
> On Fri, Jul 5, 2013 at 1:13 AM, Markus Jelsma
> <[email protected]>wrote:
>
>> generate.max.count?
>>
>>
>> -----Original message-----
>>> From:Dennis Yurichev <[email protected]>
>>> Sent: Friday 5th July 2013 5:25
>>> To: [email protected]
>>> Subject: limit to fetch only N pages from each host?
>>>
>>> Hi.
>>>
>>> How to limit nutch 2.x to fetch only N (5-10) pages from each host or
>>> domain?
>>> I fail to figure it out from config files.
>>>
>>> TIA!
>>>
>>> --
>>> -- http://www.yurichev.com
>>>
>>
>

