This is probably suitable:
  <property>
    <name>generate.max.per.host</name>
    <value>-1</value>
    <description>The maximum number of urls per host in a single
      fetchlist. -1 if unlimited.</description>
  </property>

and, from the usage of the generate command:

  [-topN N] - Number of top URLs to be selected

-----Original Message-----
From: MilleBii [mailto:[email protected]]
Sent: August-26-09 5:39 AM
To: [email protected]
Subject: Re: Limiting number of URL from the same site in a fetch cycle

Won't db.max.outlinks.per.page result in missing links? I don't want
that; I just want to balance them onto the next fetch cycle.

2009/8/26 Fuad Efendi <[email protected]>

> You can filter some of the unnecessary "tail" using a UrlFilter; for
> instance, some sites may have long forums which you don't need, or
> shopping-cart / checkout-process pages which they forgot to restrict
> via robots.txt...
>
> Check regex-urlfilter.txt.template in /conf
>
> Another parameter which equalizes 'per-site' URLs is
> db.max.outlinks.per.page=100 (some sites may have 10 links per page,
> others 1,000...)
>
> -Fuad
> http://www.linkedin.com/in/liferay
> http://www.tokenizer.org
>
> -----Original Message-----
> From: MilleBii [mailto:[email protected]]
> Sent: August-25-09 5:48 PM
> To: [email protected]
> Subject: Limiting number of URL from the same site in a fetch cycle
>
> I'm wondering if there is a setting that limits the number of URLs
> per site in a single fetchlist, rather than for the site as a whole.
> That way I could avoid a long tail of URLs all from the same site at
> the end of a fetchlist, which takes damn long (5s per URL); I'd
> rather fetch them on the next cycle.
>
> --
> -MilleBii-

--
-MilleBii-
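
To make the suggestion at the top of this thread concrete, here is a
minimal sketch of the per-host cap as it might appear in
conf/nutch-site.xml; the value 50 is purely illustrative, pick whatever
limit balances your fetchlists:

  <property>
    <name>generate.max.per.host</name>
    <value>50</value>
    <description>Allow at most 50 URLs per host in any single
      fetchlist; a host's remaining URLs stay in the crawldb and can
      be selected again on a later generate/fetch cycle.</description>
  </property>

Combined with -topN on the generate command (for example,
bin/nutch generate crawl/crawldb crawl/segments -topN 1000, with
illustrative paths), this caps both the total fetchlist size and the
share any single host can take of it.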

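For the UrlFilter approach Fuad mentions, the rules live in
conf/regex-urlfilter.txt (start from regex-urlfilter.txt.template). A
minimal sketch; the two exclusion patterns are hypothetical examples
and would need to be adapted to the sites actually being crawled:

  # Each line is '+' or '-' followed by a Java regex; the first
  # pattern that matches a URL decides whether it is kept.

  # skip shopping-cart and checkout pages
  -(cart|checkout)

  # skip forum topic pages
  -viewtopic\.php

  # accept everything else
  +.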