RE: Importance of Score

Vangelis karv Fri, 23 May 2014 00:47:30 -0700

Thanks Sebastian for your trouble! 
In http://wiki.apache.org/nutch/Nutch2Crawling , just before the Fetch 
procedure, it says:


Cons:
  Scoring is not used for selection
  Domains (hosts) at the start of a region (mapper input) have the highest
  chance to get selected.

I guess that the first line ("Scoring is not used for selection") is wrong and 
should be updated.
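
For anyone reading this in the archive, here is a rough sketch of the chaining 
described in the quoted reply below: each filter receives the sort value 
returned by the preceding one, and the generator keeps the topN URLs with the 
highest final value. This is only an illustration under assumptions. The 
SortFilter interface, the BY_SCORE and PENALIZE_QUERY filters, and selectTopN 
are hypothetical stand-ins written for this example; they are not the Nutch 
classes, and the real generatorSortValue(...) works on the stored row rather 
than a bare float score.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class GeneratorSortSketch {

    /** Hypothetical stand-in for a scoring filter; NOT the Nutch interface. */
    interface SortFilter {
        float generatorSortValue(String url, float score, float initSort);
    }

    /** Ranks by the stored score, scaled by the incoming sort value. */
    static final SortFilter BY_SCORE =
        (url, score, initSort) -> score * initSort;

    /** Hypothetical second filter: halves the sort value of URLs with a query string. */
    static final SortFilter PENALIZE_QUERY =
        (url, score, initSort) -> url.contains("?") ? initSort * 0.5f : initSort;

    /** Chaining: each filter gets the value returned by the preceding filter. */
    static float chain(List<SortFilter> filters, String url, float score) {
        float sort = 1.0f; // assumed neutral starting value for this sketch
        for (SortFilter f : filters) {
            sort = f.generatorSortValue(url, score, sort);
        }
        return sort;
    }

    /** Returns up to topN URLs, highest chained sort value first. */
    static List<String> selectTopN(Map<String, Float> scores,
                                   List<SortFilter> filters, int topN) {
        List<String> urls = new ArrayList<>(scores.keySet());
        urls.sort((a, b) -> Float.compare(
            chain(filters, b, scores.get(b)),
            chain(filters, a, scores.get(a))));
        return new ArrayList<>(urls.subList(0, Math.min(topN, urls.size())));
    }

    public static void main(String[] args) {
        // Made-up URLs and scores, for illustration only.
        Map<String, Float> scores = new LinkedHashMap<>();
        scores.put("http://a.example/", 0.2f);
        scores.put("http://b.example/", 0.9f);
        scores.put("http://c.example/?q=1", 1.0f);
        scores.put("http://d.example/", 0.6f);

        // Score only: the highest-scored URLs are selected.
        System.out.println(selectTopN(scores, Arrays.asList(BY_SCORE), 2));
        // Score, then the query-string penalty applied on top of it.
        System.out.println(
            selectTopN(scores, Arrays.asList(BY_SCORE, PENALIZE_QUERY), 2));
    }
}
```

With only BY_SCORE in the chain the selection is driven purely by score, which 
is why the wiki sentence looks outdated to me. Adding a second filter changes 
the ranking without touching the first one.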



> Date: Thu, 22 May 2014 21:28:10 +0200
> From: [email protected]
> To: [email protected]
> Subject: Re: Importance of Score
> 
> Hi Vangelis,
> 
> > Does it choose Urls with the highest score
> Yes, it does. Have a look at generatorSortValue(...) in one of the scoring 
> filter plugins.
> In case of scoring-opic (activated by default), URLs/docs are simply ranked 
> by score taken from CrawlDb. But other scoring filters may use different 
> strategies to rank and select URLs for fetching. And of course, you are able 
> to adapt it to your own needs by writing a new scoring filter. Finally, 
> scoring filters can be combined by chaining: the initSort parameter is the 
> value returned by the preceding scoring filter.
> 
> Sebastian
> 
> On 05/22/2014 05:59 PM, Vangelis karv wrote:
> > (Apache Nutch 2.2.1)
> > 
> > Hi again!
> > GeneratorJob marks the best topN sites for fetching. Does it choose URLs 
> > with the highest score or random URLs? If it chooses randomly, then what's 
> > the point of the score field? 
> > Thank you!
> > 
> >                                       
> > 
>
