Hi Ashok,

Thanks for the reply. Can you please tell me how many reducers should be used for 1 GB of intermediate data?

Regards
Abhi

Sent from my iPhone

On Sep 26, 2012, at 12:39 PM, <ashok.sa...@wipro.com> wrote:

> Yes Abshiek,
> By setting the property below you'll get better results. The number should
> depend on your data size.
>
> Regards
> Ashok S.
>
> From: Bejoy KS [mailto:bejoy...@yahoo.com]
> Sent: 26 September 2012 21:04
> To: user@hive.apache.org
> Subject: Re: Hive configuration property
>
> Hi Abshiek
>
> Based on my experience you can always provide the number of reduce tasks
> (mapred.reduce.tasks) based on the data volume your query handles. It can
> yield you better performance numbers.
>
> Regards,
> Bejoy KS
>
> From: Abhishek <abhishek.dod...@gmail.com>
> To: "user@hive.apache.org" <user@hive.apache.org>
> Cc: "user@hive.apache.org" <user@hive.apache.org>
> Sent: Wednesday, September 26, 2012 7:04 PM
> Subject: Re: Hive configuration property
>
> Thanks Bharath, your points make sense. I'll try this
> "hive.exec.reducers.max" property.
>
> Regards
> Abhi
>
> Sent from my iPhone
>
> On Sep 26, 2012, at 9:23 AM, bharath vissapragada
> <bharathvissapragada1...@gmail.com> wrote:
>
> I'm no expert in Hive, but here are my 2 cents.
>
> By default Hive schedules one reducer per 1 GB of data (change that
> value by modifying hive.exec.reducers.bytes.per.reducer). If your input
> data is huge, there will be a large number of reducers, which might be
> unnecessary. (Sometimes a large number of reducers slows down the job
> because their number exceeds the total task slots and they keep waiting
> for their turn. Not to forget the initialization overheads for each
> task: JVM startup, etc.)
>
> Overall, I think there cannot be any optimum values for a cluster. It
> depends on the type of queries, the size of your inputs, and the size of
> the map outputs in the jobs (intermediate outputs). So you can check
> various values and see which one is the best. From my experience, setting
> "hive.exec.reducers.max" to the total number of reduce slots in your
> cluster gives you decent performance, since all the reducers are
> completed in a single wave. (This may or may not work for you; worth
> giving it a try.)
>
> On Wed, Sep 26, 2012 at 5:58 PM, Abhishek <abhishek.dod...@gmail.com> wrote:
>
> Hi all,
>
> I have a doubt regarding the properties below: is it good practice to
> override them in Hive?
>
> If yes, what are the optimal values for the following properties?
>
> set hive.exec.reducers.bytes.per.reducer=<number>
> In order to limit the maximum number of reducers:
> set hive.exec.reducers.max=<number>
> In order to set a constant number of reducers:
> set mapred.reduce.tasks=<number>
>
> Regards
> Abhi
>
> Sent from my iPhone
>
> --
> Regards,
> Bharath .V
> w:http://researchweb.iiit.ac.in/~bharath.v
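
A rough way to reason about the numbers discussed in this thread is to sketch the sizing rule in a few lines of Python. This is only an illustration of the heuristic as described here (one reducer per hive.exec.reducers.bytes.per.reducer bytes, capped by hive.exec.reducers.max), not Hive's actual code. The function names, the 1 GB default, and the cap of 999 are assumptions made for the example; check the values configured on your own cluster.

```python
def estimate_reducers(data_bytes, bytes_per_reducer=1_000_000_000, max_reducers=999):
    """Hypothetical helper: estimate the reducer count for a given data volume.

    bytes_per_reducer mirrors hive.exec.reducers.bytes.per.reducer and
    max_reducers mirrors hive.exec.reducers.max. Both defaults are assumed
    values for illustration only.
    """
    if bytes_per_reducer <= 0 or max_reducers <= 0:
        raise ValueError("bytes_per_reducer and max_reducers must be positive")
    # Integer ceiling division: one reducer per started chunk of data.
    wanted = (data_bytes + bytes_per_reducer - 1) // bytes_per_reducer
    # Always at least one reducer, never more than the configured cap.
    return max(1, min(max_reducers, wanted))


def reduce_waves(reducers, reduce_slots):
    """Hypothetical helper: number of waves needed to run all reducers
    when only reduce_slots of them can run at the same time."""
    if reduce_slots <= 0:
        raise ValueError("reduce_slots must be positive")
    return (reducers + reduce_slots - 1) // reduce_slots


if __name__ == "__main__":
    one_gb = 1_000_000_000
    print(estimate_reducers(one_gb))                    # 1 GB at 1 GB per reducer
    print(estimate_reducers(one_gb, 256_000_000))       # smaller chunk per reducer
    print(reduce_waves(estimate_reducers(50 * one_gb), 20))
```

Under these assumptions, 1 GB of data maps to a single reducer unless bytes_per_reducer is lowered, and capping the reducer count at the number of reduce slots keeps the job to one wave, which is the point made in the quoted reply. Treat the result as a starting value to test against, not as an optimum.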