Hi Vangelis,

> Cons: Scoring is not used for selection Domains (hosts) at the start of a
> region (mapper input) have the highest chance to get selected.
>
> I guess that the first line is wrong and should be updated.

Afaics, that belongs to the section "Things for future development", or rather
"Suggestions". Unless I missed something, that is not relevant for the
current 2.x.

Sebastian

On 05/23/2014 09:46 AM, Vangelis karv wrote:
> Thanks Sebastian for your trouble!
> In http://wiki.apache.org/nutch/Nutch2Crawling , just before the Fetch
> procedure, it says:
>
> Cons: Scoring is not used for selection Domains (hosts) at the start of a
> region (mapper input) have the highest chance to get selected.
>
> I guess that the first line is wrong and should be updated.
>
>> Date: Thu, 22 May 2014 21:28:10 +0200
>> From: [email protected]
>> To: [email protected]
>> Subject: Re: Importance of Score
>>
>> Hi Vangelis,
>>
>>> Does it choose Urls with the highest score
>> Yes, it does. Have a look at generatorSortValue(...) in one of the scoring
>> filter plugins.
>> In case of scoring-opic (activated per default), URLs/docs are simply ranked
>> by score taken from CrawlDb. But other scoring filters may use different
>> strategies to rank and select URLs for fetching. And of course, you are able
>> to adapt it to your own needs by writing a new scoring filter. Finally,
>> scoring filters can be combined by chaining: the initSort parameter is the
>> value returned by the preceding scoring filter.
>>
>> Sebastian
>>
>> On 05/22/2014 05:59 PM, Vangelis karv wrote:
>>> (Apache Nutch 2.2.1)
>>>
>>> Hi again!
>>> GeneratorJob marks the best topN sites for fetching. Does it choose Urls
>>> with the highest score or random Urls? If it chooses randomly, then what's
>>> the point of the score field??
>>> Thank you!
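
For illustration, the chaining described in the quoted reply (each filter receives the preceding filter's return value as initSort) could be sketched roughly as below. This is a self-contained toy, not the Nutch API: the SortValueFilter interface, the two example filters, the starting value of 1.0 and the selectTopN helper are all assumptions made up for this sketch. Only the name generatorSortValue and the idea of passing initSort along come from the thread.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Self-contained sketch; none of these types are the real Nutch classes.
class ScoringChainSketch {

    /** Hypothetical stand-in for a scoring filter's sort-value hook. */
    public interface SortValueFilter {
        float generatorSortValue(String url, float score, float initSort);
    }

    /** Assumption: a score-based filter scales the incoming value by the stored score. */
    public static final SortValueFilter SCORE_FILTER =
        (url, score, initSort) -> score * initSort;

    /** Hypothetical second filter: doubles the sort value of URLs under /news/. */
    public static final SortValueFilter NEWS_BOOST =
        (url, score, initSort) -> url.contains("/news/") ? initSort * 2.0f : initSort;

    /** Chains the filters: each one gets the previous filter's result as initSort. */
    public static float chainedSortValue(List<SortValueFilter> filters, String url, float score) {
        float sort = 1.0f; // assumed neutral starting value for the first filter
        for (SortValueFilter filter : filters) {
            sort = filter.generatorSortValue(url, score, sort);
        }
        return sort;
    }

    /** Returns up to topN URLs, ordered by descending chained sort value. */
    public static List<String> selectTopN(Map<String, Float> scores,
                                          List<SortValueFilter> filters, int topN) {
        Map<String, Float> sortValues = new HashMap<>();
        for (Map.Entry<String, Float> e : scores.entrySet()) {
            sortValues.put(e.getKey(), chainedSortValue(filters, e.getKey(), e.getValue()));
        }
        List<String> urls = new ArrayList<>(sortValues.keySet());
        urls.sort((a, b) -> Float.compare(sortValues.get(b), sortValues.get(a)));
        return new ArrayList<>(urls.subList(0, Math.min(topN, urls.size())));
    }

    public static void main(String[] args) {
        Map<String, Float> scores = new HashMap<>();
        scores.put("http://a.example/", 0.5f);
        scores.put("http://b.example/news/x", 0.375f);
        scores.put("http://c.example/", 0.25f);

        List<SortValueFilter> chain = new ArrayList<>();
        chain.add(SCORE_FILTER);
        chain.add(NEWS_BOOST);
        System.out.println(selectTopN(scores, chain, 2));
    }
}
```

With only the score-based filter, the selection follows the stored score directly; adding a second filter to the chain can reorder the candidates, which is the point of writing your own scoring filter.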

