Thanks Bejoy, I will try that.

Regards
Abhi
Sent from my iPhone

On Sep 26, 2012, at 11:34 AM, Bejoy KS <bejoy...@yahoo.com> wrote:

> Hi Abhishek
>
> Based on my experience, you can always set the number of reduce tasks
> (mapred.reduce.tasks) based on the data volume your query handles. It can
> yield better performance numbers.
>
> Regards,
> Bejoy KS
>
> From: Abhishek <abhishek.dod...@gmail.com>
> To: "user@hive.apache.org" <user@hive.apache.org>
> Cc: "user@hive.apache.org" <user@hive.apache.org>
> Sent: Wednesday, September 26, 2012 7:04 PM
> Subject: Re: Hive configuration property
>
> Thanks Bharath, your points make sense. I'll try the
> "hive.exec.reducers.max" property.
>
> Regards
> Abhi
>
> Sent from my iPhone
>
> On Sep 26, 2012, at 9:23 AM, bharath vissapragada
> <bharathvissapragada1...@gmail.com> wrote:
>
>> I'm no expert in Hive, but here are my 2 cents.
>>
>> By default, Hive schedules one reducer for every 1 GB of input data (you
>> can change that value by modifying hive.exec.reducers.bytes.per.reducer).
>> If your input data is huge, there will be a large number of reducers,
>> which might be unnecessary. (Sometimes a large number of reducers slows
>> down the job, because their count exceeds the total task slots and they
>> keep waiting for their turn. Not to forget the initialization overhead
>> for each task: JVM startup, etc.)
>>
>> Overall, I don't think there is a single optimal set of values for a
>> cluster. It depends on the type of queries, the size of your inputs, and
>> the size of the map outputs in the jobs (intermediate outputs). So you
>> can try various values and see which one works best. In my experience,
>> setting "hive.exec.reducers.max" to the total number of reduce slots in
>> your cluster gives decent performance, since all the reducers complete
>> in a single wave. (This may or may not work for you, but it is worth a
>> try.)
>>
>> On Wed, Sep 26, 2012 at 5:58 PM, Abhishek <abhishek.dod...@gmail.com> wrote:
>>
>> Hi all,
>>
>> I have a doubt regarding the properties below: is it good practice to
>> override them in Hive?
>>
>> If yes, what are the optimal values for the following properties?
>>
>> set hive.exec.reducers.bytes.per.reducer=<number>
>> In order to limit the maximum number of reducers:
>> set hive.exec.reducers.max=<number>
>> In order to set a constant number of reducers:
>> set mapred.reduce.tasks=<number>
>>
>> Regards
>> Abhi
>>
>> Sent from my iPhone
>>
>> --
>> Regards,
>> Bharath .V
>> w: http://researchweb.iiit.ac.in/~bharath.v
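[Editor's note] Put together, the three properties from the original question can be set per-session before a query. The values and the query below are illustrative only, not recommendations; note that mapred.reduce.tasks is the pre-YARN property name in use at the time of this thread.

```sql
-- Halve the default load per reducer (1 GB -> 512 MB), so more
-- reducers are scheduled for the same input:
set hive.exec.reducers.bytes.per.reducer=536870912;
-- Cap reducers at the cluster's total reduce slots (20 here is a
-- made-up figure) so they finish in a single wave:
set hive.exec.reducers.max=20;
-- Hypothetical query; 'sales' is not a real table from the thread:
SELECT day, SUM(amount) FROM sales GROUP BY day;
```

Setting mapred.reduce.tasks instead would force an exact reducer count and bypass Hive's estimate entirely, which is why the thread treats it as a last resort rather than a default.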
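[Editor's note] The sizing logic Bharath describes (one reducer per N bytes of input, capped at a maximum) can be sketched in a few lines. This is a minimal illustration of the arithmetic, not Hive's actual internals; the function name and defaults are hypothetical, with 1 GB standing in for the default hive.exec.reducers.bytes.per.reducer.

```python
import math

def estimate_reducers(input_bytes,
                      bytes_per_reducer=1_000_000_000,  # ~1 GB default
                      max_reducers=999):
    """Sketch of the reducer-count estimate described in the thread:
    one reducer per bytes_per_reducer of input, capped at max_reducers,
    and never fewer than one."""
    wanted = math.ceil(input_bytes / bytes_per_reducer)
    return max(1, min(wanted, max_reducers))

# 50 GB of input at 1 GB per reducer asks for 50 reducers.
print(estimate_reducers(50 * 1_000_000_000))
# Capping at the cluster's reduce slots (say 20) keeps the job to a
# single reducer wave, per Bharath's suggestion.
print(estimate_reducers(50 * 1_000_000_000, max_reducers=20))
```

The cap matters because, as noted above, reducers beyond the available slots just queue for a turn while still paying per-task startup overhead.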