My SQL has no GROUP BY.
The SQL that causes the problem:
from dw.fct_traffic_navpage_path_detl t
left outer join dw.univ_parnt_tranx_comb_detl o
on t.ordr_code = o.parnt_ordr_code
and t.cart_prod_id = o.comb_prod_id
and o.ds = '{$label}'
select ordr_code,count(*) as a from dw.fct_traffic_navpage_path_detl where ds
= '2015-05-10' group by ordr_code having a>10000 ;
151722135
fct_traffic_navpage_path_detl
select cart_prod_id,count(*) as a from dw.fct_traffic_navpage_path_detl where
ds = '2015-05-10' group by cart_prod_id having a>10000 ;
NULL 127233335
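The two counts above suggest that most rows share a single ordr_code value (printed blank) and a NULL cart_prod_id, so they would all hash to the same reducer during the join. A common workaround, sketched here and not tested on this data, is to replace the NULL key with a random string that can never match, so those rows spread across reducers and still come out with NULL columns from o. The sketch assumes cart_prod_id and comb_prod_id can be compared as strings and that no real comb_prod_id starts with 'skew_'. The select list, the ds filter on t, and the join_prod_id alias are made up for illustration:

```sql
-- Sketch only: scatter NULL join keys so they do not pile up on one reducer.
-- Assumptions: keys are string-comparable; the 'skew_' prefix never occurs in
-- comb_prod_id; select list, ds filter on t and join_prod_id are hypothetical.
select t.ordr_code,
       t.cart_prod_id,
       o.comb_prod_id
from (
  select ordr_code,
         cart_prod_id,
         -- NULL keys get a random, non-matching value; other keys are unchanged
         case when cart_prod_id is null
              then concat('skew_', cast(rand() as string))
              else cast(cart_prod_id as string)
         end as join_prod_id
  from dw.fct_traffic_navpage_path_detl
  where ds = '2015-05-10'
) t
left outer join dw.univ_parnt_tranx_comb_detl o
  on  t.ordr_code    = o.parnt_ordr_code
  and t.join_prod_id = cast(o.comb_prod_id as string)
  and o.ds = '{$label}';
```

Because NULL never equals anything in a join condition, the result should match the original left outer join; only the distribution of rows across reducers changes.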
[email protected]
From: Jitendra Yadav
Date: 2015-05-11 16:25
To: user
Subject: Re: skewjoin problem
Maybe one of your reducers is overloaded because of skewed GROUP BY keys. If you
are using GROUP BY, try the property below and see whether the data gets
distributed more evenly across the reducers.
set hive.groupby.skewindata=true;
Thanks
Jitendra
On Mon, May 11, 2015 at 12:35 PM, [email protected] <[email protected]> wrote:
Status: Running (Executing on YARN cluster with App id
application_1419300485749_1493279)
--------------------------------------------------------------------------------
VERTICES STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED
--------------------------------------------------------------------------------
Map 1 .......... SUCCEEDED 200 200 0 0 0 0
Map 4 .......... SUCCEEDED 3 3 0 0 0 0
Map 5 .......... SUCCEEDED 152 152 0 0 0 0
Reducer 2 ..... RUNNING 20 19 1 0 0 0
Reducer 3 RUNNING 23 0 23 0 0 0
--------------------------------------------------------------------------------
VERTICES: 03/05 [========================>>--] 93% ELAPSED TIME: 791.14 s
One reducer runs for a long time.
I tried:
set hive.exec.reducers.bytes.per.reducer = 4000000000;
set hive.skewjoin.key = 1000000000;
set hive.optimize.skewjoin = true;
but nothing helped. Only the number of reducers decreased.
[email protected]