Re: How to optimize a group by query

Bejoy KS Wed, 26 Sep 2012 10:43:01 -0700

Hi Abhishek

From the MapReduce logs you can see whether the data processed by one reducer 
is much larger than that processed by the other reducers. In short, one reducer 
takes noticeably longer to complete than the others.
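
As a rough cross-check from the Hive side, you can also look at how the rows 
are distributed over the group by keys. The query below is only a sketch: 
my_table and group_col are placeholder names, not taken from your query. If a 
few keys account for most of the rows, the reducers that receive those keys 
will do most of the work.

```sql
-- Placeholder names: replace my_table and group_col with your own
-- table and GROUP BY column(s).
-- Lists the 20 most frequent keys; a handful of very large counts
-- next to many small ones suggests skew.
SELECT group_col, COUNT(*) AS row_cnt
FROM my_table
GROUP BY group_col
ORDER BY row_cnt DESC
LIMIT 20;
```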

Adding to my previous mail, one more optimization is possible for GROUP BY if 
your table is bucketed, or bucketed and sorted. This optimization applies when 
the GROUP BY columns are the same as the bucketing columns, or when the GROUP BY 
columns are a subset of the sorted bucketing columns. It is controlled by 
'hive.optimize.groupby', which is true by default.
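
To illustrate the kind of table this applies to, here is a minimal sketch. 
The table page_views, its columns, and the bucket count of 32 are all made up 
for the example; only the property name comes from the text above.

```sql
-- Hypothetical table, bucketed and sorted on user_id.
CREATE TABLE page_views (
  user_id   BIGINT,
  url       STRING,
  view_time BIGINT
)
CLUSTERED BY (user_id) SORTED BY (user_id) INTO 32 BUCKETS;

-- Already true by default; shown here only to make the setting explicit.
SET hive.optimize.groupby=true;

-- The GROUP BY column matches the bucketing column, so this query is a
-- candidate for the optimization.
SELECT user_id, COUNT(*) AS views
FROM page_views
GROUP BY user_id;
```

Note that the data has to be actually written in bucketed and sorted form for 
this to help. On Hive versions of this era that usually means setting 
hive.enforce.bucketing and hive.enforce.sorting to true before loading the 
table with INSERT.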

Regards,
Bejoy KS

________________________________
 From: Abhishek <[email protected]>
To: "[email protected]" <[email protected]> 
Cc: "[email protected]" <[email protected]> 
Sent: Wednesday, September 26, 2012 10:59 PM
Subject: Re: How to optimize a group by query

Hi Bejoy,

Thanks for the reply. How can I tell whether there is data skew among the reducers?

Regards
Abhi

Sent from my iPhone

On Sep 26, 2012, at 1:20 PM, Bejoy KS <[email protected]> wrote:

Hi Abhishek
>
>
>Group by performance can be improved by the following:
>1) Enabling map-side aggregation. In recent versions it is enabled by default.
>SET hive.map.aggr = true;
>
>
>
>2) Is data skew observed in some of the reducers?
>If so, better performance can be achieved by setting the following property:
>SET hive.groupby.skewindata=true;
>
>
> 
>
>Regards,
>Bejoy KS
>
>
>
>________________________________
> From: Abhishek <[email protected]>
>To: Hive <[email protected]> 
>Sent: Wednesday, September 26, 2012 10:31 PM
>Subject: How to optimize a group by query 
> 
>Hi all,
>
>I have written a query with a group by clause, and it is taking a lot of time. 
>Is there any way to optimize it, such as a configuration property?
>
>Regards 
>Abhi
>
>
>Sent from my iPhone
>
>
