RE: Increasing the number of reducer in Deduplication

2019-02-20 Thread Suraj Singh
Hi Sebastian,

Of course, I had just copied and pasted the property. My bad. Thanks for confirming.

Regards,
Suraj Singh

-----Original Message-----
From: Sebastian Nagel
Sent: Wednesday, 20 February 2019 13:26
To: user@nutch.apache.org
Subject: Re: Increasing the number of reducer in

Re: Increasing the number of reducer in Deduplication

2019-02-20 Thread Sebastian Nagel
Hi Suraj,

the correct syntax would be:

  __bin_nutch dedup -Dmapreduce.job.reduces=32 "$CRAWL_PATH"/crawldb

Hadoop configuration properties must be passed before the remaining arguments, and you need to pass them as -Dname=value. To confirm: I used to run the dedup job with 1200 reducers on a CrawlDb
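Sebastian's ordering rule can be illustrated with a small sketch (plain Python, illustration only, not Nutch or Hadoop source; the function name split_args is hypothetical): Hadoop-style generic options such as -Dname=value are consumed from the front of the argument list, so a property placed after the positional arguments is simply not picked up.

```python
def split_args(argv):
    """Illustration only: pull leading -Dname=value options off the
    argument list, the way Hadoop's generic option parsing expects
    them to appear before job-specific arguments."""
    props, rest = {}, []
    for i, arg in enumerate(argv):
        if arg.startswith("-D") and "=" in arg:
            name, value = arg[2:].split("=", 1)
            props[name] = value
        else:
            # First non -D argument: everything from here on is a
            # job-specific argument; stop consuming properties.
            rest = argv[i:]
            break
    return props, rest

# Correct order: the property precedes the positional argument.
props, rest = split_args(["-Dmapreduce.job.reduces=32", "crawl/crawldb"])
# props == {"mapreduce.job.reduces": "32"}, rest == ["crawl/crawldb"]

# Wrong order: the property trails the positional argument and is
# never parsed as a configuration property.
props2, rest2 = split_args(["crawl/crawldb", "-Dmapreduce.job.reduces=32"])
# props2 == {}
```

This mirrors why `dedup -Dmapreduce.job.reduces=32 "$CRAWL_PATH"/crawldb` works while putting the -D flag last does not.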

RE: Increasing the number of reducer in Deduplication

2019-02-20 Thread Suraj Singh
Thanks Markus.

Regards,
Suraj Singh

-----Original Message-----
From: Markus Jelsma
Sent: Wednesday, 20 February 2019 13:04
To: user@nutch.apache.org
Subject: RE: Increasing the number of reducer in Deduplication

Hello Suraj, That should be no problem. Duplicates are grouped by their

RE: Increasing the number of reducer in Deduplication

2019-02-20 Thread Markus Jelsma
Hello Suraj,

That should be no problem. Duplicates are grouped by their signature, which means you can have as many reducers as you like.

Regards,
Markus

-----Original message-----
> From: Suraj Singh
> Sent: Wednesday 20th February 2019 12:56
> To: user@nutch.apache.org
> Subject: Incr
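Markus's point, that grouping by signature makes the reducer count a free choice, can be sketched in a few lines (plain Python, illustration only, not Nutch or Hadoop source; the function name reducer_for is hypothetical): MapReduce routes each record to a reducer by hashing its grouping key, so all records sharing a signature land on the same reducer regardless of how many reducers there are.

```python
import hashlib

def reducer_for(signature: str, num_reducers: int) -> int:
    """Sketch of hash partitioning: route a record to a reducer
    index derived from its grouping key (here, the page signature)."""
    h = int(hashlib.md5(signature.encode("utf-8")).hexdigest(), 16)
    return h % num_reducers

# Two duplicate records carry the same signature, so they always map
# to the same reducer index and will meet there to be deduplicated,
# whether you run 1, 32, or 1200 reducers.
sig = "d41d8cd98f00b204e9800998ecf8427e"
for n in (1, 32, 1200):
    assert reducer_for(sig, n) == reducer_for(sig, n)
    assert 0 <= reducer_for(sig, n) < n
```

The partition index changes when the reducer count changes, but duplicates still travel together, which is why increasing mapreduce.job.reduces is safe for the dedup job.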