Thanks Bejoy, I will try that.

Regards
Abhi
Sent from my iPhone

On Sep 26, 2012, at 11:34 AM, Bejoy KS <bejoy...@yahoo.com> wrote:

> Hi Abhishek
>
> Based on my experience, you can always set the number of reduce tasks
> (mapred.reduce.tasks) based on the data volume your query handles. It can
> yield better performance numbers.
>
> Regards,
> Bejoy KS
>
> From: Abhishek <abhishek.dod...@gmail.com>
> To: "user@hive.apache.org" <user@hive.apache.org>
> Cc: "user@hive.apache.org" <user@hive.apache.org>
> Sent: Wednesday, September 26, 2012 7:04 PM
> Subject: Re: Hive configuration property
>
> Thanks Bharath, your points make sense. I'll try the
> "hive.exec.reducers.max" property.
>
> Regards
> Abhi
>
> Sent from my iPhone
>
> On Sep 26, 2012, at 9:23 AM, bharath vissapragada
> <bharathvissapragada1...@gmail.com> wrote:
>
>> I'm no expert in Hive, but here are my 2 cents.
>>
>> By default, Hive schedules one reducer for every 1 GB of input data (you
>> can change that value by modifying hive.exec.reducers.bytes.per.reducer).
>> If your input data is huge, there will be a large number of reducers,
>> which might be unnecessary. (Sometimes a large number of reducers slows
>> down the job, because their count exceeds the total task slots and they
>> keep waiting for their turn. Not to forget the initialization overhead
>> for each task: JVM startup, etc.)
>>
>> Overall, I don't think there is a single optimal set of values for a
>> cluster. It depends on the type of queries, the size of your inputs, and
>> the size of the map outputs in the jobs (intermediate outputs). So you
>> can try various values and see which one works best. In my experience,
>> setting "hive.exec.reducers.max" to the total number of reduce slots in
>> your cluster gives decent performance, since all the reducers complete
>> in a single wave. (This may or may not work for you, but it is worth a
>> try.)
>>
>> On Wed, Sep 26, 2012 at 5:58 PM, Abhishek <abhishek.dod...@gmail.com> wrote:
>>
>> Hi all,
>>
>> I have a doubt regarding the properties below: is it good practice to
>> override them in Hive?
>>
>> If yes, what are the optimal values for the following properties?
>>
>> set hive.exec.reducers.bytes.per.reducer=<number>
>> In order to limit the maximum number of reducers:
>> set hive.exec.reducers.max=<number>
>> In order to set a constant number of reducers:
>> set mapred.reduce.tasks=<number>
>>
>> Regards
>> Abhi
>>
>> Sent from my iPhone
>>
>> --
>> Regards,
>> Bharath .V
>> w: http://researchweb.iiit.ac.in/~bharath.v
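[Editor's note] Put together, the three properties from the original question can be set per-session before a query. The values and the query below are illustrative only, not recommendations; note that mapred.reduce.tasks is the pre-YARN property name in use at the time of this thread.

```sql
-- Halve the default load per reducer (1 GB -> 512 MB), so more
-- reducers are scheduled for the same input:
set hive.exec.reducers.bytes.per.reducer=536870912;
-- Cap reducers at the cluster's total reduce slots (20 here is a
-- made-up figure) so they finish in a single wave:
set hive.exec.reducers.max=20;
-- Hypothetical query; 'sales' is not a real table from the thread:
SELECT day, SUM(amount) FROM sales GROUP BY day;
```

Setting mapred.reduce.tasks instead would force an exact reducer count and bypass Hive's estimate entirely, which is why the thread treats it as a last resort rather than a default.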
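[Editor's note] The sizing logic Bharath describes (one reducer per N bytes of input, capped at a maximum) can be sketched in a few lines. This is a minimal illustration of the arithmetic, not Hive's actual internals; the function name and defaults are hypothetical, with 1 GB standing in for the default hive.exec.reducers.bytes.per.reducer.

```python
import math

def estimate_reducers(input_bytes,
                      bytes_per_reducer=1_000_000_000,  # ~1 GB default
                      max_reducers=999):
    """Sketch of the reducer-count estimate described in the thread:
    one reducer per bytes_per_reducer of input, capped at max_reducers,
    and never fewer than one."""
    wanted = math.ceil(input_bytes / bytes_per_reducer)
    return max(1, min(wanted, max_reducers))

# 50 GB of input at 1 GB per reducer asks for 50 reducers.
print(estimate_reducers(50 * 1_000_000_000))
# Capping at the cluster's reduce slots (say 20) keeps the job to a
# single reducer wave, per Bharath's suggestion.
print(estimate_reducers(50 * 1_000_000_000, max_reducers=20))
```

The cap matters because, as noted above, reducers beyond the available slots just queue for a turn while still paying per-task startup overhead.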