Thanks for the tip on constructing the query, it's a good starting point. Yes, I'm very aggressive about deduping at several levels. Duplicate pages don't seem to be much of a problem at the moment. This is mostly about domains that have used keywords excessively to get rankings. Deduping near-duplicates and spammy pages is another topic.
When the query is 'mazda', it returns many different pages from mazda-parts.tld before returning pages from other domains. This seems to be because they all score higher in Solr than pages from the next domain. Collapsing would help, as there would then be only 2 links for the domain's hosts, www and tld, with the most relevant link for each being displayed. I'll have to work on it a bit. :)

Markus Jelsma-2 wrote
> Hello Alexis, see inline.
>
> Regards,
> Markus
>
> fq={!collapse field=host}

-----
Bee Keeper at IZaBEE.com
--
Sent from: http://lucene.472066.n3.nabble.com/Nutch-User-f603147.html
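
For anyone following along, here is a rough sketch of how the collapse filter quoted above could be attached to a query from Python. The Solr URL and the core name `nutch` are placeholders, not taken from this thread, and it assumes `host` is indexed as a single-valued field that the collapse parser accepts. It only builds the request URL and does not contact Solr.

```python
from urllib.parse import urlencode

# Hypothetical endpoint and core name; adjust to the real installation.
SOLR_SELECT_URL = "http://localhost:8983/solr/nutch/select"


def build_collapsed_query(user_query, rows=10):
    """Build a select URL that keeps only the top-scoring document per host."""
    params = [
        ("q", user_query),
        # The filter from the quoted reply: collapse results on the host field.
        ("fq", "{!collapse field=host}"),
        ("rows", str(rows)),
    ]
    # urlencode percent-escapes the braces, '!' and '=' inside the fq value.
    return SOLR_SELECT_URL + "?" + urlencode(params)


url = build_collapsed_query("mazda")
print(url)
```

With this filter, www.mazda-parts.tld and mazda-parts.tld would each contribute at most one result, since they are different values of `host`.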