Hello Abhi,
I hope the information below helps.
mapred.reduce.tasks

  *   Default Value: -1
  *   Added In: 0.1
The default number of reduce tasks per job. Typically set to a prime close to 
the number of available hosts. Ignored when mapred.job.tracker is "local". 
Hadoop sets this to 1 by default, whereas Hive uses -1 as its default value. 
When this property is set to -1, Hive automatically determines the number of 
reducers.
hive.exec.reducers.bytes.per.reducer

  *   Default Value: 1000000000
  *   Added In:
Size per reducer. The default is 1 GB, i.e. if the input size is 10 GB, Hive 
will use 10 reducers.
hive.exec.reducers.max

  *   Default Value: 999
  *   Added In:
The maximum number of reducers that will be used. If the value of the 
configuration parameter mapred.reduce.tasks is negative, Hive uses this 
property as the upper limit when it automatically determines the number of 
reducers.
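
As a rough illustration of how the three properties above interact, here is a 
minimal Python sketch. The function name estimate_reducers and its parameter 
names are hypothetical, and the formula (input size divided by bytes per 
reducer, rounded up, at least 1, capped at the maximum) is an assumption based 
on the descriptions above, not a copy of Hive's own code.

```python
def estimate_reducers(input_bytes, reduce_tasks=-1,
                      bytes_per_reducer=1_000_000_000, max_reducers=999):
    """Approximate the reducer count from the three properties.

    reduce_tasks      ~ mapred.reduce.tasks
    bytes_per_reducer ~ hive.exec.reducers.bytes.per.reducer
    max_reducers      ~ hive.exec.reducers.max
    """
    # Assumption: a non-negative mapred.reduce.tasks is used as-is.
    if reduce_tasks >= 0:
        return reduce_tasks
    # Ceiling division: one reducer per (possibly partial) chunk of input.
    wanted = -(-input_bytes // bytes_per_reducer)
    # Assumption: at least one reducer, never more than the configured cap.
    return max(1, min(max_reducers, wanted))


print(estimate_reducers(10 * 10**9))                  # 10 GB input -> 10
print(estimate_reducers(5 * 10**12))                  # 5 TB input, capped -> 999
print(estimate_reducers(10 * 10**9, reduce_tasks=7))  # fixed by the user -> 7
```

With the defaults, the cap of 999 only takes effect once the input exceeds 
roughly 999 GB.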

Thanks
Ashok S.

From: Abhishek [mailto:abhishek.dod...@gmail.com]
Sent: 26 September 2012 22:27
To: user@hive.apache.org
Cc: <user@hive.apache.org>; <bejoy...@yahoo.com>
Subject: Re: Hive configuration property

Hi Ashok,

Thanks for the reply. Can you please tell me how many reducers I should use 
for 1 GB of intermediate data?

Regards
Abhi

Sent from my iPhone

On Sep 26, 2012, at 12:39 PM, <ashok.sa...@wipro.com> wrote:
Yes Abhishek,
Setting the properties below should give you better results. The right value 
depends on your data size.

Regards
Ashok S.

From: Bejoy KS [mailto:bejoy...@yahoo.com]
Sent: 26 September 2012 21:04
To: user@hive.apache.org
Subject: Re: Hive configuration property

Hi Abhishek

In my experience, you can always set the number of reduce tasks 
(mapred.reduce.tasks) yourself, based on the data volume your query handles. 
Doing so can give you better performance.

Regards,
Bejoy KS

________________________________
From: Abhishek <abhishek.dod...@gmail.com>
To: "user@hive.apache.org" <user@hive.apache.org>
Cc: "user@hive.apache.org" <user@hive.apache.org>
Sent: Wednesday, September 26, 2012 7:04 PM
Subject: Re: Hive configuration property



Thanks Bharath, your points make sense. I'll try the "hive.exec.reducers.max" 
property.

Regards
Abhi



On Sep 26, 2012, at 9:23 AM, bharath vissapragada 
<bharathvissapragada1...@gmail.com> wrote:

I'm no expert in Hive, but here are my 2 cents.

By default, Hive schedules one reducer for every 1 GB of data (you can change 
that value by modifying hive.exec.reducers.bytes.per.reducer). If your input 
data is huge, there will be a large number of reducers, which might be 
unnecessary. (Sometimes a large number of reducers slows down the job because 
they outnumber the total task slots and have to wait for their turn, not to 
mention the initialization overhead for each task, such as JVM startup.)

Overall, I don't think there is a single optimal value for a cluster. It 
depends on the type of queries, the size of your inputs, and the size of the 
map outputs in the jobs (intermediate outputs). So you can try various values 
and see which one works best. In my experience, setting 
"hive.exec.reducers.max" to the total number of reduce slots in your cluster 
gives decent performance, since all the reducers complete in a single wave. 
(This may or may not work for you, but it is worth a try.)
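
To make the "single wave" point concrete, here is a small Python sketch. The 
function name reducer_waves and the example slot counts are hypothetical, and 
it assumes every reducer occupies exactly one reduce slot and that all slots 
are free for this job.

```python
def reducer_waves(num_reducers, reduce_slots):
    """Number of scheduling rounds ("waves") needed to run all reducers."""
    if reduce_slots <= 0:
        raise ValueError("reduce_slots must be positive")
    # Ceiling division: a partially filled last wave still counts as a wave.
    return -(-num_reducers // reduce_slots)


# Hypothetical cluster with 40 reduce slots.
print(reducer_waves(100, 40))  # 3 waves: 40 + 40 + 20
print(reducer_waves(40, 40))   # 1 wave: the cap equals the slot count
```

Capping the reducer count at the slot count keeps the job at one wave, so no 
reducer has to wait for a slot to free up.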


On Wed, Sep 26, 2012 at 5:58 PM, Abhishek <abhishek.dod...@gmail.com> wrote:

Hi all,

I have a question about the properties below: is it good practice to override 
them in Hive?

If so, what are the optimal values for them?

  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>

Regards
Abhi




--
Regards,
Bharath .V
w:http://researchweb.iiit.ac.in/~bharath.v


The information contained in this electronic message and any attachments to 
this message are intended for the exclusive use of the addressee(s) and may 
contain proprietary, confidential or privileged information. If you are not the 
intended recipient, you should not disseminate, distribute or copy this e-mail. 
Please notify the sender immediately and destroy all copies of this message and 
any attachments.

WARNING: Computer viruses can be transmitted via email. The recipient should 
check this email and any attachments for the presence of viruses. The company 
accepts no liability for any damage caused by any virus transmitted by this 
email.

www.wipro.com
