Re: How to optimize a group by query

Abhishek Wed, 26 Sep 2012 10:58:46 -0700

Thanks, Bejoy.

Regards
Abhi


Sent from my iPhone

On Sep 26, 2012, at 1:42 PM, Bejoy KS <[email protected]> wrote:

> Hi Abhishek
> 
> The MapReduce logs show whether one reducer processes much more data than 
> the others. In short, look for a reducer that takes considerably longer to 
> complete than the rest.
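> 
> As a complementary check alongside the logs, counting rows per GROUP BY key 
> shows whether a few keys dominate the input. This is only a sketch: the table 
> name orders and the column customer_id are hypothetical placeholders, not 
> names from this thread.
> 
> ```sql
> -- Hypothetical table and column; substitute the ones from your own query.
> -- Lists the 20 most frequent grouping keys with their row counts.
> SELECT customer_id, COUNT(*) AS row_count
> FROM orders
> GROUP BY customer_id
> ORDER BY row_count DESC
> LIMIT 20;
> ```
> 
> If the top few keys hold a large share of all rows, the reducers that receive 
> those keys are likely the slow ones.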
> 
> Adding to my previous mail, one more optimization is possible for GROUP BY if 
> your table is bucketed, or bucketed and sorted. It applies when the GROUP BY 
> columns are the same as the bucketed columns, or are a subset of the sorted 
> bucketed columns. The optimization is controlled by 
> 'hive.optimize.groupby', which is true by default.
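> 
> A minimal sketch of the bucketed case might look like the following. The 
> table names, columns, and bucket count are assumptions made for illustration, 
> and the two hive.enforce.* settings are needed only on older Hive releases, 
> so check them against your version.
> 
> ```sql
> -- Hypothetical bucketed and sorted table; all names and the bucket count are assumed.
> CREATE TABLE orders_bucketed (
>   customer_id BIGINT,
>   amount      DOUBLE
> )
> CLUSTERED BY (customer_id) SORTED BY (customer_id) INTO 32 BUCKETS;
> 
> -- Older Hive releases need these so the insert actually writes bucketed, sorted files.
> SET hive.enforce.bucketing = true;
> SET hive.enforce.sorting = true;
> 
> -- Populate from a hypothetical unbucketed source table.
> INSERT OVERWRITE TABLE orders_bucketed
> SELECT customer_id, amount FROM orders;
> 
> -- The GROUP BY column matches the bucketing column, so the optimization can apply.
> SET hive.optimize.groupby = true;
> SELECT customer_id, SUM(amount) AS total_amount
> FROM orders_bucketed
> GROUP BY customer_id;
> ```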
>  
> Regards,
> Bejoy KS
> 
> From: Abhishek <[email protected]>
> To: "[email protected]" <[email protected]> 
> Cc: "[email protected]" <[email protected]> 
> Sent: Wednesday, September 26, 2012 10:59 PM
> Subject: Re: How to optimize a group by query
> 
> Hi Bejoy,
> 
> Thanks for the reply. How can I tell whether there is data skew among the reducers?
> 
> Regards
> Abhi
> 
> Sent from my iPhone
> 
> On Sep 26, 2012, at 1:20 PM, Bejoy KS <[email protected]> wrote:
> 
>> Hi Abhishek
>> 
>> GROUP BY performance can be improved in the following ways:
>> 1) Enable map-side aggregation. In the latest versions it is enabled by default.
>> SET hive.map.aggr = true;
>> 
>> 2) Is data skew observed in some of the reducers?
>> If so, setting the following property can give better performance:
>> SET hive.groupby.skewindata=true;
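>> 
>> Put together, a session using both settings could look like the sketch 
>> below. The table orders and column customer_id are hypothetical. Running 
>> EXPLAIN first lets you compare the plan with and without the skew setting; 
>> with hive.groupby.skewindata enabled, Hive plans the aggregation as two 
>> MapReduce jobs, the first spreading rows across reducers for partial 
>> aggregation and the second combining the partial results by key.
>> 
>> ```sql
>> -- Session-level settings from the two points above.
>> SET hive.map.aggr = true;
>> SET hive.groupby.skewindata = true;
>> 
>> -- Hypothetical query; inspect the plan before running it.
>> EXPLAIN
>> SELECT customer_id, COUNT(*) AS order_count
>> FROM orders
>> GROUP BY customer_id;
>> 
>> -- Then run the query itself with the same settings in effect.
>> SELECT customer_id, COUNT(*) AS order_count
>> FROM orders
>> GROUP BY customer_id;
>> ```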
>> 
>>  
>> Regards,
>> Bejoy KS
>> 
>> From: Abhishek <[email protected]>
>> To: Hive <[email protected]> 
>> Sent: Wednesday, September 26, 2012 10:31 PM
>> Subject: How to optimize a group by query 
>> 
>> Hi all,
>> 
>> I have written a query with a GROUP BY clause and it is taking a lot of time. 
>> Is there any way to optimize it, such as a configuration property?
>> 
>> Regards 
>> Abhi
>> 
>> 
>> Sent from my iPhone
> 
>
