Re: Re: skewjoin problem

[email protected] Mon, 11 May 2015 01:02:13 -0700

My SQL has no GROUP BY.

The SQL that causes the problem:


 from dw.fct_traffic_navpage_path_detl t 
left outer join dw.univ_parnt_tranx_comb_detl o 
on t.ordr_code = o.parnt_ordr_code 
and t.cart_prod_id = o.comb_prod_id 
and o.ds = '{$label}'

select ordr_code, count(*) as a from dw.fct_traffic_navpage_path_detl where ds = '2015-05-10' group by ordr_code having a>10000 ;
        151722135
       fct_traffic_navpage_path_detl

select cart_prod_id, count(*) as a from dw.fct_traffic_navpage_path_detl where ds = '2015-05-10' group by cart_prod_id having a>10000 ;

NULL    127233335
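
The counts above suggest that most rows share one join key (a NULL cart_prod_id), so they all land on the same reducer. A commonly used workaround, sketched here and not tested on this cluster, is to give those rows a random join key: a NULL cart_prod_id can never satisfy the equality in the ON clause, so the left outer join still returns them unmatched, but they get spread across reducers. The alias t0, the column join_ordr_code, and the prefix 'null_salt_' are made up for illustration, and the sketch assumes the order codes can be compared as strings and that no real parnt_ordr_code starts with that prefix.

```sql
-- Sketch only: same join as above, with a salted key for rows that cannot match.
 from (
        select t0.*,
               case
                 -- NULL cart_prod_id never matches, so any unique-ish key is safe here
                 when t0.cart_prod_id is null
                   then concat('null_salt_', cast(rand() as string))
                 else cast(t0.ordr_code as string)
               end as join_ordr_code   -- hypothetical helper column
          from dw.fct_traffic_navpage_path_detl t0
      ) t
left outer join dw.univ_parnt_tranx_comb_detl o
on t.join_ordr_code = cast(o.parnt_ordr_code as string)
and t.cart_prod_id = o.comb_prod_id
and o.ds = '{$label}'
```

The select list is left out, as in the original fragment. If it used t.*, it would now also return join_ordr_code, so the needed columns should be listed explicitly.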



[email protected]
 
From: Jitendra Yadav
Date: 2015-05-11 16:25
To: user
Subject: Re: skewjoin problem
Maybe one of your reducers is overloaded because of skewed GROUP BY keys. If you
are using GROUP BY, try the property below and see whether the reducer data gets
distributed more evenly.

set hive.groupby.skewindata=true;

Thanks
Jitendra

On Mon, May 11, 2015 at 12:35 PM, [email protected] <[email protected]> wrote:
Status: Running (Executing on YARN cluster with App id application_1419300485749_1493279)

--------------------------------------------------------------------------------
VERTICES          STATUS     TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
--------------------------------------------------------------------------------
Map 1 ..........  SUCCEEDED    200        200        0        0       0       0
Map 4 ..........  SUCCEEDED      3          3        0        0       0       0
Map 5 ..........  SUCCEEDED    152        152        0        0       0       0
Reducer 2 .....   RUNNING       20         19        1        0       0       0
Reducer 3         RUNNING       23          0       23        0       0       0
--------------------------------------------------------------------------------
VERTICES: 03/05 [========================>>--] 93% ELAPSED TIME: 791.14 s

One reducer has been running for a long time.

I tried:

set hive.exec.reducers.bytes.per.reducer = 4000000000;
set hive.skewjoin.key = 1000000000;
set hive.optimize.skewjoin = true;

but nothing helped. Only the number of reducers decreased.



[email protected]
