Hi all,
We've published multiple crawls using the distributed search method outlined in the Nutch Hadoop Tutorial on the wiki. Each crawl used a different subset of URLs, and it's beneficial to us to keep them separate.

We'd like to preferentially boost some crawls over others, which I believed could be accomplished with the query.xxx.boost settings in nutch-site.xml. Apparently this is not the case: when we publish crawls with different query boosts, a queryNorm factor seems to normalize the score relative to the input weights.

Can I disable queryNorm? What would the adverse effects be? Where would I look to disable it? Is there a better way to boost one crawl relative to another?

Many thanks,
Rob
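
P.S. In case it helps, here is a minimal sketch of the effect I think I'm seeing. It is plain Python, not Lucene or Nutch code. It assumes the default queryNorm is 1 / sqrt(sum of squared weights) and that each clause's raw weight is idf * boost; the idf values and the function names are made up for illustration.

```python
import math


def query_norm(sum_of_squared_weights):
    # Assumed form of the default queryNorm: 1 / sqrt(sum of squared weights).
    return 1.0 / math.sqrt(sum_of_squared_weights)


def normalized_weights(idfs, boosts):
    """Return each clause's query weight after the norm is applied."""
    # Assumption: raw clause weight = idf * boost.
    raw = [idf * boost for idf, boost in zip(idfs, boosts)]
    norm = query_norm(sum(w * w for w in raw))
    return [w * norm for w in raw]


idfs = [2.0, 3.0]  # hypothetical idf values for two query clauses

plain = normalized_weights(idfs, [1.0, 1.0])    # no boosts
boosted = normalized_weights(idfs, [5.0, 5.0])  # every boost scaled by 5
skewed = normalized_weights(idfs, [5.0, 1.0])   # only one boost raised

print(plain)
print(boosted)
print(skewed)
```

If those assumptions hold, scaling every boost in one crawl by the same factor cancels out after normalization, so that crawl's scores do not move relative to the others. Only a change in the ratio between boosts survives.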

