Hi,

Currently, in Nutch 2.x, the SolrDeDup job runs on the entire index.
Is it possible to configure it to run against only the current batch ID?

We are trying to maintain historical data in Solr, crawled by Nutch, on the
basis of the date on which it was crawled.

So in this scenario, when I run the Nutch crawl script, it removes all
duplicate docs across all dates (in the entire index). If I remove the
SolrDeDup command from the crawl script and run it with numberOfRounds >= 2,
then I get duplicate docs for each (generate -> fetch -> parse ->
dbupdate -> solrindex) cycle.

Thanks,
Tony.
