Hi Sebastian,

Of course, I had just copied and pasted the property. My bad.
Thanks for confirming.

Suraj Singh 

-----Original Message-----
From: Sebastian Nagel <wastl.na...@googlemail.com.INVALID> 
Sent: Wednesday, 20 February 2019 13:26
To: user@nutch.apache.org
Subject: Re: Increasing the number of reducer in Deduplication

Hi Suraj,

the correct syntax would be:

  __bin_nutch dedup -Dmapreduce.job.reduces=32 "$CRAWL_PATH"/crawldb

Hadoop configuration properties must come before the remaining arguments, 
and each one must be passed as -Dname=value.

To confirm: I usually run the dedup job with 1200 reducers on a CrawlDb with 
more than 10 billion URLs.  It works seamlessly.
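
To make the argument order concrete, here is a minimal sketch that only prints the dedup argument list, with the -D property placed before the CrawlDb path. The helper name dedup_command and the values of NUM_REDUCERS and CRAWL_PATH are assumptions for illustration; they are not part of the Nutch crawl script.

```shell
#!/bin/sh
# Hypothetical helper (not part of Nutch): prints the dedup arguments with
# the Hadoop property placed before the positional CrawlDb path.
NUM_REDUCERS=32      # assumed value, tune to your cluster
CRAWL_PATH="crawl"   # assumed crawl directory, for illustration only

dedup_command() {
  # $1 = number of reducers, $2 = crawl directory
  printf '%s\n' "dedup -Dmapreduce.job.reduces=$1 $2/crawldb"
}

dedup_command "$NUM_REDUCERS" "$CRAWL_PATH"
```

In the crawl script itself, the same arguments would be handed to __bin_nutch in that order.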


On 2/20/19 12:55 PM, Suraj Singh wrote:
> Hi All,
> Can I increase the number of reducers in Deduplication on crawldb? Currently 
> it is running with 1 reducer.
> Will it impact the crawling in any way?
> Current command in crawl script:
> __bin_nutch dedup "$CRAWL_PATH"/crawldb
> Can I update it to:
> __bin_nutch dedup "$CRAWL_PATH"/crawldb mapreduce.job.reduces=32
> Thanks in advance.
> Regards,
> Suraj Singh
