Hi Vangelis,

> Cons: Scoring is not used for selection.
> Domains (hosts) at the start of a region (mapper input) have the highest
> chance to get selected.
>
> I guess that the first line is wrong and should be updated.

Afaics, that belongs to the section "Things for future development", resp.
"Suggestions".
If I didn't miss something, that's not relevant for the current 2.x.
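
To make the chaining described in the quoted reply below more concrete, here is a minimal, self-contained sketch. It deliberately does not use the Nutch API: Page, SortFilter, chain, selectTopN and the two example filters are hypothetical names for illustration only, and the start value 1.0 as well as the "multiply by score" rule are assumptions, not a description of what any shipped plugin does internally.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SortValueSketch {

    /** Hypothetical stand-in for a stored page row; not a Nutch class. */
    static final class Page {
        final String url;
        final float score;

        Page(String url, float score) {
            this.url = url;
            this.score = score;
        }
    }

    /** Simplified, hypothetical filter interface; a real plugin interface will differ. */
    interface SortFilter {
        float generatorSortValue(Page page, float initSort);
    }

    /** Assumed score-based filter: rank purely by the stored score. */
    static final SortFilter BY_SCORE = (page, initSort) -> page.score * initSort;

    /** Assumed custom filter: push URLs with a query string down the ranking. */
    static final SortFilter PENALIZE_QUERY =
        (page, initSort) -> page.url.contains("?") ? initSort * 0.25f : initSort;

    /** Chaining: each filter receives the value returned by the preceding one. */
    static float chain(List<SortFilter> filters, Page page, float initSort) {
        float sort = initSort;
        for (SortFilter filter : filters) {
            sort = filter.generatorSortValue(page, sort);
        }
        return sort;
    }

    /** URLs of the topN pages with the highest chained sort value, best first. */
    static List<String> selectTopN(List<Page> pages, List<SortFilter> filters, int topN) {
        List<Page> sorted = new ArrayList<>(pages);
        // The sort value is recomputed per comparison; acceptable for a sketch.
        sorted.sort(Comparator.comparingDouble((Page p) -> chain(filters, p, 1.0f)).reversed());
        int limit = Math.max(0, Math.min(topN, sorted.size()));
        List<String> urls = new ArrayList<>();
        for (Page p : sorted.subList(0, limit)) {
            urls.add(p.url);
        }
        return urls;
    }

    public static void main(String[] args) {
        List<Page> pages = List.of(
            new Page("http://a.example/", 0.5f),
            new Page("http://b.example/", 2.0f),
            new Page("http://c.example/?x=1", 3.0f));

        // Score only: c (3.0) is ranked before b (2.0).
        System.out.println(selectTopN(pages, List.of(BY_SCORE), 2));
        // Score, then penalty: c drops to 0.75, so b (2.0) is ranked first.
        System.out.println(selectTopN(pages, List.of(BY_SCORE, PENALIZE_QUERY), 2));
    }
}
```

With only the score-based filter the selection is simply "highest score first"; adding a second filter to the chain changes the ranking without touching the stored scores.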

Sebastian


On 05/23/2014 09:46 AM, Vangelis karv wrote:
> Thanks Sebastian for your trouble! 
> In http://wiki.apache.org/nutch/Nutch2Crawling , just before the Fetch 
> procedure, it says: 
> 
> Cons: Scoring is not used for selection.
> Domains (hosts) at the start of a region (mapper input) have the highest
> chance to get selected.
> 
> I guess that the first line is wrong and should be updated.
> 
> 
> 
>> Date: Thu, 22 May 2014 21:28:10 +0200
>> From: [email protected]
>> To: [email protected]
>> Subject: Re: Importance of Score
>>
>> Hi Vangelis,
>>
>>> Does it choose Urls with the highest score
>> Yes, it does. Have a look at generatorSortValue(...) in one of the scoring
>> filter plugins.
>> In case of scoring-opic (activated by default), URLs/docs are simply ranked
>> by score taken from CrawlDb. But other scoring filters may use different
>> strategies to rank and select URLs for fetching. And of course, you are able
>> to adapt it to your own needs by writing a new scoring filter. Finally,
>> scoring filters can be combined by chaining: the initSort parameter is the
>> value returned by the preceding scoring filter.
>>
>> Sebastian
>>
>> On 05/22/2014 05:59 PM, Vangelis karv wrote:
>>> (Apache Nutch 2.2.1)
>>>
>>> Hi again!
>>> GeneratorJob marks the best topN sites for fetching. Does it choose URLs
>>> with the highest score or random URLs? If it chooses randomly, then what's
>>> the point of the score field?
>>> Thank you!
>>>
