Thanks for your trouble, Sebastian! In http://wiki.apache.org/nutch/Nutch2Crawling, just before the Fetch procedure, it says:
Cons:
  Scoring is not used for selection
  Domains (hosts) at the start of a region (mapper input) have the highest chance to get selected.

I guess that the first line ("Scoring is not used for selection") is wrong and should be updated.

> Date: Thu, 22 May 2014 21:28:10 +0200
> From: [email protected]
> To: [email protected]
> Subject: Re: Importance of Score
>
> Hi Vangelis,
>
> > Does it choose Urls with the highest score
> Yes, it does. Have a look at generatorSortValue(...) in one of the scoring
> filter plugins.
> In case of scoring-opic (activated per default), URLs/docs are simply ranked
> by score taken from CrawlDb. But other scoring filters may use different
> strategies to rank and select URLs for fetching. And of course, you are able
> to adapt it to your own needs by writing a new scoring filter. Finally,
> scoring filters can be combined by chaining: the initSort parameter is the
> value returned by the preceding scoring filter.
>
> Sebastian
>
> On 05/22/2014 05:59 PM, Vangelis karv wrote:
> > (Apache Nutch 2.2.1)
> >
> > Hi again!
> > GeneratorJob marks the best topN sites for fetching. Does it choose Urls
> > with the highest score or random Urls? If it chooses randomly, then what's
> > the point of the score field?
> > Thank you!
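
To illustrate the chaining Sebastian describes, here is a minimal sketch of how I understand it. It is not Nutch code: SortValueFilter, chain, byScore and penalizeQuery are hypothetical names made up for this example, and the idea that a score-based filter multiplies the stored score by initSort is only my assumption. The one thing taken from the thread is that each filter receives the value returned by the preceding filter as initSort.

```java
import java.util.Arrays;
import java.util.List;

public class ScoringChainSketch {

    /** Hypothetical stand-in for a scoring filter; NOT the real Nutch interface. */
    interface SortValueFilter {
        float generatorSortValue(String url, float score, float initSort);
    }

    /** Passes each filter's return value to the next filter as initSort. */
    static float chain(List<SortValueFilter> filters, String url, float score, float initSort) {
        float sort = initSort;
        for (SortValueFilter filter : filters) {
            sort = filter.generatorSortValue(url, score, sort);
        }
        return sort;
    }

    public static void main(String[] args) {
        // Assumption: a score-based filter scales the incoming value by the stored score.
        SortValueFilter byScore = (url, score, initSort) -> score * initSort;
        // Hypothetical second filter: halve the sort value of URLs with a query string.
        SortValueFilter penalizeQuery =
                (url, score, initSort) -> url.contains("?") ? initSort * 0.5f : initSort;

        List<SortValueFilter> filters = Arrays.asList(byScore, penalizeQuery);
        System.out.println(chain(filters, "http://example.org/", 2.0f, 1.0f));
        System.out.println(chain(filters, "http://example.org/?q=1", 2.0f, 1.0f));
    }
}
```

If the generator then keeps the topN URLs with the largest resulting sort value, the score clearly does take part in the selection, which is why the wiki sentence looks outdated to me.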

