Boosting one crawl over another

Rob Hunter Thu, 04 Nov 2010 12:02:40 -0700

Hi all,


   We've published multiple crawls using the distributed search method
outlined in the Nutch Hadoop Tutorial on the wiki.  Each crawl used a
different subset of URLs, and it's beneficial to us to keep them
separate.  We'd like to preferentially boost some crawls over others,
which I believed could be accomplished with the query.xxx.boost settings
in nutch-site.xml.  Apparently, this is not the case.

   When publishing crawls with different query boosts, there's a
queryNorm factor that seems to normalize the score relative to the input
weights.  Can I disable queryNorm?  What would the adverse effects be?
Where would I look to disable it?  Is there a better way to boost one
crawl relative to another?
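
To illustrate the normalization described above, here is a minimal
sketch, assuming Lucene's classic TF-IDF scoring, where queryNorm is
1 / sqrt(sum of squared query weights) and each query weight is
idf * boost.  The function name, the tuple layout, and the numbers are
hypothetical and not taken from Nutch; the coord factor is left out.

```python
import math

def classic_score(clauses):
    """Score one document under a simplified classic TF-IDF model.

    clauses: list of (tf_factor, idf, boost) tuples, one per matching
    query clause. All values here are illustrative assumptions.
    """
    # Query-side weight of each clause: idf * boost.
    weights = [idf * boost for _, idf, boost in clauses]
    # queryNorm divides out the overall magnitude of the query weights.
    query_norm = 1.0 / math.sqrt(sum(w * w for w in weights))
    # Document-side contribution of each clause is tf_factor * idf.
    return query_norm * sum(tf * idf * w for (tf, idf, _), w in zip(clauses, weights))

# Two clauses with different boosts (hypothetical values).
base = [(1.0, 2.0, 4.0), (0.5, 1.5, 2.0)]
# Every boost multiplied by 10, as if one crawl were "boosted" uniformly.
scaled = [(tf, idf, boost * 10.0) for tf, idf, boost in base]
# Only the first boost changed, so the ratio between clauses differs.
skewed = [(1.0, 2.0, 8.0), (0.5, 1.5, 2.0)]

print(classic_score(base), classic_score(scaled), classic_score(skewed))
```

Under these assumptions, scaling every boost by the same factor leaves
the score unchanged because queryNorm shrinks by that same factor; only
a change in the ratio between boosts moves the score.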

 

Many thanks,

Rob
