Re: Hive configuration property

Abhishek Wed, 26 Sep 2012 09:58:02 -0700

Hi Ashok,

Thanks for the reply. Can you please tell me how many reducers I should use
for 1 GB of intermediate data?


Regards
Abhi

Sent from my iPhone

On Sep 26, 2012, at 12:39 PM, <ashok.sa...@wipro.com> wrote:

> Yes Abhishek,
> By setting the property mentioned below (mapred.reduce.tasks), you'll get
> better results. The number should depend on your data size.
>  
> Regards
> Ashok S.
>  
> From: Bejoy KS [mailto:bejoy...@yahoo.com] 
> Sent: 26 September 2012 21:04
> To: user@hive.apache.org
> Subject: Re: Hive configuration property
>  
> Hi Abhishek
>  
> In my experience, you can always set the number of reduce tasks
> (mapred.reduce.tasks) explicitly according to the data volume your query
> handles. This can yield better performance.
>  
> Regards,
> Bejoy KS
>  
> From: Abhishek <abhishek.dod...@gmail.com>
> To: "user@hive.apache.org" <user@hive.apache.org> 
> Cc: "user@hive.apache.org" <user@hive.apache.org> 
> Sent: Wednesday, September 26, 2012 7:04 PM
> Subject: Re: Hive configuration property
> 
> 
> Thanks Bharath, your points make sense. I'll try the "hive.exec.reducers.max"
> property.
>  
> Regards
> Abhi
>  
> 
> On Sep 26, 2012, at 9:23 AM, bharath vissapragada 
> <bharathvissapragada1...@gmail.com> wrote:
> 
>  
> I'm no expert in Hive, but here are my two cents.
>  
> By default, Hive schedules one reducer per 1 GB of input data (you can change
> that value by modifying hive.exec.reducers.bytes.per.reducer). If your input
> data is huge, there will be a large number of reducers, which may be
> unnecessary. (A large number of reducers sometimes slows the job down because
> their number exceeds the total number of reduce task slots, so they keep
> waiting for their turn. Don't forget the initialization overhead of each
> task, such as JVM startup.)
>  
> Overall, I don't think there is a single optimal value for a cluster. It
> depends on the type of queries, the size of your inputs, and the size of the
> map outputs in the jobs (intermediate outputs). So you can try various values
> and see which one works best. In my experience, setting
> "hive.exec.reducers.max" to the total number of reduce slots in your cluster
> gives decent performance, since all the reducers complete in a single wave.
> (This may or may not work for you, but it's worth a try.)
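
The "single wave" argument is ceiling division: reducers that don't fit into the available slots wait for a later wave. Here is a minimal sketch, assuming a hypothetical cluster of 10 nodes with 4 reduce slots each. The function name reduce_waves is hypothetical, and the model ignores stragglers and uneven reducer run times:

```python
def reduce_waves(num_reducers: int, reduce_slots: int) -> int:
    """Scheduling waves needed when at most reduce_slots reducers run at once."""
    if reduce_slots <= 0:
        raise ValueError("reduce_slots must be positive")
    # Ceiling division: the last, partly filled wave still has to finish
    # before the job does.
    return (num_reducers + reduce_slots - 1) // reduce_slots


if __name__ == "__main__":
    # Hypothetical cluster: 10 nodes x 4 reduce slots each = 40 slots.
    slots = 10 * 4
    print(reduce_waves(100, slots))              # 3
    print(reduce_waves(41, slots))               # 2
    # Capping hive.exec.reducers.max at the slot count keeps it to one wave.
    print(reduce_waves(min(100, slots), slots))  # 1
```

The trade-off is that capping the count makes each reducer handle more data, so one wave is not automatically faster. That is why it is worth measuring.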
>  
>  
> On Wed, Sep 26, 2012 at 5:58 PM, Abhishek <abhishek.dod...@gmail.com> wrote:
>  
> Hi all,
>  
> I have a question about the properties below. Is it good practice to
> override them in Hive?
>  
> If so, what are the optimal values for them?
> 
> In order to change the average load for a reducer (in bytes):
>   set hive.exec.reducers.bytes.per.reducer=<number>
> In order to limit the maximum number of reducers:
>   set hive.exec.reducers.max=<number>
> In order to set a constant number of reducers:
>   set mapred.reduce.tasks=<number>
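
The three settings interact roughly as follows. Hive's default for mapred.reduce.tasks is -1, which means "let Hive estimate". The rest of this sketch is an assumption based on my reading of Hive's behaviour of that period: a positive mapred.reduce.tasks is used as a constant and is not capped by hive.exec.reducers.max. The function name resolve_reducers and the 0.9-era defaults are hypothetical, and plans that force a single reducer (such as a global ORDER BY) are not modelled:

```python
def resolve_reducers(mapred_reduce_tasks: int,
                     total_input_bytes: int,
                     bytes_per_reducer: int = 1_000_000_000,
                     max_reducers: int = 999) -> int:
    """Sketch of how the three reducer settings interact (assumed precedence)."""
    # A positive mapred.reduce.tasks is taken as a constant reducer count;
    # in this sketch hive.exec.reducers.max does not cap it.
    if mapred_reduce_tasks > 0:
        return mapred_reduce_tasks
    # Otherwise (Hive's default is -1) estimate from input size:
    # one reducer per bytes_per_reducer, clamped to [1, max_reducers].
    estimate = (total_input_bytes + bytes_per_reducer - 1) // bytes_per_reducer
    return min(max_reducers, max(1, estimate))


if __name__ == "__main__":
    ten_gb = 10 * 1_000_000_000
    print(resolve_reducers(-1, ten_gb))                  # 10 (estimated)
    print(resolve_reducers(-1, ten_gb, max_reducers=4))  # 4 (capped)
    print(resolve_reducers(20, ten_gb, max_reducers=4))  # 20 (constant wins)
```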
>  
> Regards
> Abhi
> 
> 
>  
> -- 
> Regards,
> Bharath .V
> w:http://researchweb.iiit.ac.in/~bharath.v
>  
