How to configure SolrDeDup Job to run per batch Id not entire index?

Tony Mullins Thu, 18 Jul 2013 04:31:04 -0700

Hi,

Currently in Nutch 2.x the SolrDeDup job runs on the entire index.
Is it possible to configure it to run against only the current batch Id?


We are trying to maintain historical data in Solr, crawled by Nutch, on the
basis of the date on which it was crawled.

So in this scenario, when I run the Nutch crawl script, it removes
duplicate docs across all dates (in the entire index). If I remove the
SolrDeDup command from the crawl script and run it with
numberOfRounds >= 2, then I get duplicate docs for each
(generate -> fetch -> parse -> dbupdate -> solrindex) cycle.
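
The difference between the two dedup scopes can be sketched as follows. This is a hypothetical illustration in Python, not Nutch's SolrDeDup code: the field names (id, digest, batch_id, tstamp), the sample documents, and the rule of keeping the newest document in each group are all assumptions made for the example.

```python
from collections import defaultdict


def duplicates_to_delete(docs, scope_by_batch=True):
    """Return the sorted ids a dedup pass would delete.

    Docs with the same content digest count as duplicates. With
    scope_by_batch=True, only docs sharing a batch id are compared,
    so copies from earlier crawls survive. Field names are assumed.
    """
    groups = defaultdict(list)
    for doc in docs:
        if scope_by_batch:
            key = (doc["batch_id"], doc["digest"])
        else:
            key = doc["digest"]
        groups[key].append(doc)

    to_delete = []
    for group in groups.values():
        # Keep the most recently indexed doc in each group (assumed rule).
        keep = max(group, key=lambda d: d["tstamp"])
        to_delete.extend(d["id"] for d in group if d is not keep)
    return sorted(to_delete)


# Hypothetical index: the same page crawled on two dates, indexed twice
# in the second batch.
docs = [
    {"id": "a-0717", "digest": "x1", "batch_id": "b1", "tstamp": 1},
    {"id": "a-0718", "digest": "x1", "batch_id": "b2", "tstamp": 2},
    {"id": "a-0718-dup", "digest": "x1", "batch_id": "b2", "tstamp": 3},
]

whole_index = duplicates_to_delete(docs, scope_by_batch=False)
per_batch = duplicates_to_delete(docs, scope_by_batch=True)
```

With the whole-index scope the copy from the earlier crawl date is deleted along with the in-batch duplicate; with the per-batch scope only the in-batch duplicate goes, which is the behaviour asked about here.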

Thanks,
Tony.
