Re: skewjoin problem

2015-05-11 Thread Jitendra Yadav
Maybe one of your reducers is overloaded because of the GROUP BY keys. If you are using GROUP BY, try the property below and see whether the reducer data gets distributed:
set hive.groupby.skewindata=true;
Thanks,
Jitendra
On Mon, May 11, 2015 at 12:35 PM, r7raul1...@163.com wrote: Status: Running
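To see why one hot GROUP BY key can pin a single reducer, here is a minimal Python sketch (not Hive code). `single_stage` hash-partitions rows by key, so every row of a hot key lands on the same reducer. `two_stage` follows the two-stage idea commonly described for hive.groupby.skewindata=true: rows are first spread randomly across reducers for partial aggregation, then the partial results are grouped by key. The function names, the reducer count, and the use of `zlib.crc32` as the partitioning hash are illustrative assumptions.

```python
import random
import zlib
from collections import Counter


def partition(key, num_reducers):
    # Stable stand-in for a shuffle partitioner (assumption: crc32, not Hive's hash).
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def single_stage(keys, num_reducers):
    """One shuffle by group key: all rows of a key go to the same reducer."""
    loads = Counter(partition(k, num_reducers) for k in keys)
    counts = Counter(keys)
    return dict(counts), max(loads.values())


def two_stage(keys, num_reducers, seed=0):
    """Stage 1: random distribution plus partial counts. Stage 2: merge partials by key."""
    rng = random.Random(seed)
    partials = [Counter() for _ in range(num_reducers)]
    for k in keys:
        partials[rng.randrange(num_reducers)][k] += 1
    stage1_max = max(sum(p.values()) for p in partials)
    # Stage 2 sees at most num_reducers partial rows per key, even for the hot key.
    final = Counter()
    for p in partials:
        final.update(p)
    return dict(final), stage1_max


if __name__ == "__main__":
    keys = ["hot"] * 9000 + [f"k{i}" for i in range(1000)]
    # Largest per-reducer load: one shuffle vs. the random first stage.
    print(single_stage(keys, 8)[1], two_stage(keys, 8)[1])
```

Both paths produce the same counts; the trade-off is an extra shuffle stage in exchange for a bounded per-reducer load. The sketch only models row counts per reducer, not runtime.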

skewjoin problem

2015-05-11 Thread r7raul1...@163.com
Status: Running (Executing on YARN cluster with App id application_1419300485749_1493279)
VERTICES STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED

Re: Re: skewjoin problem

2015-05-11 Thread r7raul1...@163.com
My SQL has no GROUP BY. The SQL that causes the problem:
from dw.fct_traffic_navpage_path_detl t
left outer join dw.univ_parnt_tranx_comb_detl o
  on t.ordr_code = o.parnt_ordr_code
  and t.cart_prod_id = o.comb_prod_id
  and o.ds = '{$label}'
select ordr_code,count(*) as a from
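Because this join is shuffled on (ordr_code, cart_prod_id), a single very frequent key value sends all of its rows to one reducer. A quick check is to count rows per join key on the big side. The Python sketch below does that on in-memory rows; `hot_join_keys`, the 10% threshold, and the sample data (an empty order code standing in for rows without an order) are hypothetical and not taken from the thread.

```python
from collections import Counter


def hot_join_keys(rows, key_cols, threshold=0.1):
    """Return (key, count) pairs whose share of all rows exceeds threshold, largest first."""
    counts = Counter(tuple(row[c] for c in key_cols) for row in rows)
    total = sum(counts.values())
    return [(key, n) for key, n in counts.most_common() if n / total > threshold]


if __name__ == "__main__":
    # Hypothetical sample: 80 rows share one key, 20 rows have distinct keys.
    rows = [{"ordr_code": "", "cart_prod_id": 0}] * 80
    rows += [{"ordr_code": f"O{i}", "cart_prod_id": i} for i in range(20)]
    print(hot_join_keys(rows, ["ordr_code", "cart_prod_id"]))  # [(('', 0), 80)]
```

In Hive the equivalent check is a GROUP BY on the join columns ordered by count; any key that dominates the output is a likely cause of a single long-running reducer.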

Re: Re: hive sql on tez run forever

2015-05-11 Thread r7raul1...@163.com
I see only one reducer running forever. Is this a skew join?
r7raul1...@163.com
From: Eugene Koifman Date: 2015-05-12 01:43 To: user CC: r7raul1...@163.com Subject: Re: hive sql on tez run forever
This isn’t a valid rewrite. If a(x,y) has 1 row (1,2) and b(x,z) has 1 row (1,1), then the 1st query will

Storage Based Authorization

2015-05-11 Thread Udit Mehta
Hi, I have enabled storage based authorization in the hive metastore by adding the following configs to hive-site:
<property>
  <name>hive.security.authorization.enabled</name>
  <value>true</value>
</property>
<property>
  <name>hive.security.authorization.manager</name>

Re: Question about bushy join in hive CBO

2015-05-11 Thread Ashutosh Chauhan
Hi Rossi, Historically, we used Calcite's LoptOptimizeJoinRule to do join reordering. It does a greedy search over the join-order search space to find a join order that is at least as good as the original join order of the query. Goodness is measured in terms of estimated cost, and the result is not globally optimal because of
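As a rough illustration of what a greedy join-order search does, here is a toy Python version: start from the smallest input, then repeatedly join the connected table that gives the smallest estimated intermediate result. This is not Calcite's LoptOptimizeJoinRule; the cardinalities, selectivities, and cost model (sum of estimated intermediate row counts) are made-up assumptions, and the sketch only builds left-deep orders, never bushy trees.

```python
def greedy_join_order(cards, sel):
    """cards: table -> row count; sel: frozenset({t1, t2}) -> join selectivity.

    Pairs without an entry have no join predicate (cross product, selectivity 1.0).
    Returns (left-deep join order, cost as the sum of estimated intermediate sizes).
    """
    remaining = set(cards)
    first = min(remaining, key=lambda t: (cards[t], t))
    order, size, cost = [first], float(cards[first]), 0.0
    remaining.remove(first)

    def estimate(t):
        # Estimated rows after joining t onto the current intermediate result.
        est = size * cards[t]
        for joined in order:
            est *= sel.get(frozenset((joined, t)), 1.0)
        return est

    while remaining:
        # Prefer tables that share a join predicate with what is already joined.
        connected = [t for t in remaining
                     if any(frozenset((j, t)) in sel for j in order)]
        candidates = connected or sorted(remaining)
        nxt = min(candidates, key=lambda t: (estimate(t), t))
        size = estimate(nxt)
        cost += size
        order.append(nxt)
        remaining.remove(nxt)
    return order, cost


if __name__ == "__main__":
    cards = {"fact": 1_000_000, "dim_a": 100, "dim_b": 1_000}
    sel = {frozenset(("fact", "dim_a")): 0.01, frozenset(("fact", "dim_b")): 0.001}
    print(greedy_join_order(cards, sel)[0])  # ['dim_a', 'fact', 'dim_b']
```

Each step commits to the locally cheapest choice, so the final order is cheap by the estimate but not guaranteed to be the global optimum.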

Re: Question about bushy join in hive CBO

2015-05-11 Thread Ruoxi Sun
Thank you, Ashutosh. That's very informative. I appreciate that! *Rossi* 2015-05-12 9:08 GMT+08:00 Ashutosh Chauhan hashut...@apache.org: Hi Rossi, Historically, we used LoptOptimizeJoinRule of Calcite to do join reordering. This does a greedy search on join order search space to find a

Re: hive sql on tez run forever

2015-05-11 Thread Gopal Vijayaraghavan
Hi, you’re correct: that is not a valid rewrite. Both tables have to be shuffled in full, because the OR clause allows no reduction of either side before the join.
Cheers,
Gopal
On 5/11/15, 10:43 AM, Eugene Koifman ekoif...@hortonworks.com wrote: This isn’t a valid rewrite. If a(x,y) has 1 row (1,2) and b(x,z) has 1 row (1,1)

hive on tez not convert map join to broadcast join

2015-05-11 Thread r7raul1...@163.com
In MR, the query plan is:
Map Join Operator
  condition map:
    Left Outer Join0 to 1
  keys:
    0 ordr_code (type: string), cart_prod_id (type: bigint)
    1 parnt_ordr_code (type: string), comb_prod_id (type: bigint)
  outputColumnNames: _col1, _col2, _col3, _col5, _col10, _col11, _col15, _col16,
But in tez
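For reference, a map join (a broadcast join on Tez) builds an in-memory hash table from the small side and streams the large side past it, so the large table is never shuffled. For a LEFT OUTER JOIN, only the right-hand (non-preserved) table can be the hashed side. The Python sketch below uses the join columns from the plan above; the rows are hypothetical, and NULL keys are treated as never matching, as in SQL.

```python
def broadcast_left_outer_join(big_rows, small_rows):
    """Hash the small (right) side, then stream the big (left) side against it."""
    # Build phase: hash table keyed on the right-hand join columns.
    table = {}
    for right_row in small_rows:
        key = (right_row["parnt_ordr_code"], right_row["comb_prod_id"])
        if None not in key:  # NULL keys can never match in SQL
            table.setdefault(key, []).append(right_row)

    # Probe phase: every left row is kept; unmatched rows get None (NULL) on the right.
    out = []
    for left_row in big_rows:
        key = (left_row["ordr_code"], left_row["cart_prod_id"])
        matches = table.get(key, []) if None not in key else []
        if matches:
            out.extend((left_row, right_row) for right_row in matches)
        else:
            out.append((left_row, None))
    return out


if __name__ == "__main__":
    t = [{"ordr_code": "A", "cart_prod_id": 1},
         {"ordr_code": "B", "cart_prod_id": 2},
         {"ordr_code": None, "cart_prod_id": 3}]
    o = [{"parnt_ordr_code": "A", "comb_prod_id": 1},
         {"parnt_ordr_code": None, "comb_prod_id": 3}]
    for left_row, right_row in broadcast_left_outer_join(t, o):
        print(left_row["ordr_code"], right_row is not None)
```

The design only works while the hashed side fits in memory, which is why the planner decides per join whether to broadcast or fall back to a shuffle join.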

Re: hive sql on tez run forever

2015-05-11 Thread Gopal Vijayaraghavan
Hi, I changed the SQL where condition to (where t.update_time = '2015-05-04'), and the SQL returns a result after a while, because t.update_time = '2015-05-04' filters out many rows during the table scan. But why, when I change the where condition to (where t.update_time = '2015-05-04' or length(t8.end_user_id) > 0)
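The behaviour described here comes down to where the predicate can be evaluated. A predicate that references only t can be applied while scanning t, shrinking the join input; a predicate that ORs a t column with a t8 column cannot be applied to either input alone, so every row of both tables reaches the join. The Python sketch below counts join input rows both ways; the join key `id`, the inner-join semantics, and the data are hypothetical, since the thread does not show the full query.

```python
def join_then_filter(t_rows, t8_rows, pred):
    """Inner join on the hypothetical key 'id', then keep pairs where pred(t_row, t8_row) holds.

    Returns (result, number of input rows that reached the join).
    """
    index = {}
    for r in t8_rows:
        index.setdefault(r["id"], []).append(r)
    result = [(tr, r) for tr in t_rows for r in index.get(tr["id"], []) if pred(tr, r)]
    return result, len(t_rows) + len(t8_rows)


def filter_then_join(t_rows, t8_rows, t_pred):
    """Apply a predicate that only reads t during the scan, before the join."""
    kept = [tr for tr in t_rows if t_pred(tr)]
    return join_then_filter(kept, t8_rows, lambda tr, r: True)


def only_t(tr):
    return tr["update_time"] == "2015-05-04"


def t_or_t8(tr, r):
    return tr["update_time"] == "2015-05-04" or len(r["end_user_id"]) > 0


if __name__ == "__main__":
    t = [{"id": i, "update_time": "2015-05-04" if i < 10 else "2015-05-01"} for i in range(1000)]
    t8 = [{"id": i, "end_user_id": f"u{i}" if i % 2 else ""} for i in range(1000)]
    print(filter_then_join(t, t8, only_t)[1])                    # 1010 rows reach the join
    print(join_then_filter(t, t8, lambda tr, r: only_t(tr))[1])  # 2000, same result rows
    print(join_then_filter(t, t8, t_or_t8)[1])                   # 2000, nothing can be pushed
```

With the single-table predicate both strategies return identical rows, so pushing it down is free; with the OR there is no equivalent pushed-down form, and the full inputs must be joined first.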

Re: hive sql on tez run forever

2015-05-11 Thread Eugene Koifman
This isn’t a valid rewrite. If a(x,y) has 1 row (1,2) and b(x,z) has 1 row (1,1), then the 1st query will produce 1 row, but the 2nd query with subselects will not.
On 5/11/15, 10:13 AM, Gopal Vijayaraghavan gop...@apache.org wrote: Hi, I change the sql where condition to (where t.update_time =
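This counterexample can be checked mechanically. The snippet does not show the two queries, so the Python sketch below assumes a typical shape: the original joins a and b on x and filters with a.y = 2 OR b.z = 2, while the rewrite pushes each OR branch into its own subselect before the join. With the rows from the message, the first form returns one row and the second returns none.

```python
a = [{"x": 1, "y": 2}]  # a(x, y) with the single row (1, 2)
b = [{"x": 1, "z": 1}]  # b(x, z) with the single row (1, 1)


def original(a_rows, b_rows):
    # Join on x first, then evaluate the OR over columns of both tables.
    return [(r, s) for r in a_rows for s in b_rows
            if r["x"] == s["x"] and (r["y"] == 2 or s["z"] == 2)]


def subselect_rewrite(a_rows, b_rows):
    # Assumed rewrite: each OR branch filters its own table before the join.
    a2 = [r for r in a_rows if r["y"] == 2]
    b2 = [s for s in b_rows if s["z"] == 2]
    return [(r, s) for r in a2 for s in b2 if r["x"] == s["x"]]


print(len(original(a, b)), len(subselect_rewrite(a, b)))  # 1 0
```

For an inner join, single-table predicates combined with AND can be pushed below the join, but an OR that mixes columns from both tables cannot be split this way.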

RE: hive sql on tez run forever

2015-05-11 Thread Mich Talebzadeh
The other option is to try UNION ALL or UNION, depending on the nature of the result set:
SELECT rs.col1, rs.col2, …
FROM (
  SELECT t.col1, t.col2, …
  FROM t
  WHERE t.update_time '2015-05-04'
  UNION ALL
  SELECT t8.col1, t8.col2, …
  FROM t8
  WHERE length(t8.end_user_id) > 0
) rs
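The choice between UNION ALL and UNION matters because neither is automatically equivalent to the OR. UNION ALL returns a row twice when it satisfies both branches; UNION removes those double counts but also collapses genuine duplicate rows. One common fix is to exclude the first branch's rows from the second branch. The Python sketch below shows this on a single simplified table (the thread's OR spans t and t8); the rows and predicates are hypothetical, and SQL NULL semantics (where NOT p1 is unknown for NULL) are ignored.

```python
from collections import Counter

# Hypothetical rows: (update_time, end_user_id); the first two are genuine duplicates.
rows = [("2015-05-04", "u1"), ("2015-05-04", "u1"), ("2015-05-03", ""), ("2015-05-03", "u2")]


def p1(r):
    return r[0] == "2015-05-04"


def p2(r):
    return len(r[1]) > 0


with_or = [r for r in rows if p1(r) or p2(r)]
union_all = [r for r in rows if p1(r)] + [r for r in rows if p2(r)]
union = list(set(union_all))  # UNION deduplicates the combined result
guarded = [r for r in rows if p1(r)] + [r for r in rows if p2(r) and not p1(r)]
same_multiset = Counter(guarded) == Counter(with_or)

print(len(with_or), len(union_all), len(union), len(guarded), same_multiset)  # 3 5 2 3 True
```

Only the guarded UNION ALL matches the OR row for row; plain UNION ALL is correct when no row can satisfy both branches, and UNION is correct when the result should have no duplicates anyway.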