Hi Sebastian,

Of course, I had just copied and pasted the property. My bad. Thanks for confirming.
Regards,
Suraj Singh

-----Original Message-----
From: Sebastian Nagel <wastl.na...@googlemail.com.INVALID>
Sent: Wednesday, 20 February 2019 13:26
To: firstname.lastname@example.org
Subject: Re: Increasing the number of reducer in Deduplication

Hi Suraj,

the correct syntax would be:

  __bin_nutch dedup -Dmapreduce.job.reduces=32 "$CRAWL_PATH"/crawldb

Hadoop configuration properties must be passed before the remaining arguments, and you need to pass them as -Dname=value.

To confirm: I regularly run the dedup job with 1200 reducers on a CrawlDb with more than 10 billion URLs. Works seamlessly.

Best,
Sebastian

On 2/20/19 12:55 PM, Suraj Singh wrote:
> Hi All,
>
> Can I increase the number of reducers in Deduplication on crawldb? Currently
> it is running with 1 reducer.
> Will it impact the crawling in any way?
>
> Current command in the crawl script:
> __bin_nutch dedup "$CRAWL_PATH"/crawldb
>
> Can I update it to:
> __bin_nutch dedup "$CRAWL_PATH"/crawldb mapreduce.job.reduces=32
>
> Thanks in advance.
>
> Regards,
> Suraj Singh
>
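[Editor's note: besides passing -Dmapreduce.job.reduces=32 on the command line as Sebastian describes, the same property can be set as a standard Hadoop configuration property in conf/nutch-site.xml so that every job picks it up by default. An untested sketch (the value 32 is taken from the thread; adjust to your cluster):

  <property>
    <name>mapreduce.job.reduces</name>
    <value>32</value>
    <description>Default number of reduce tasks per job.</description>
  </property>

A -D flag on the command line overrides the file-based value, so per-job tuning still works.]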