Re: Why there are two different stages on the same query when i use hive on spark.

Xuefu Zhang Thu, 03 Dec 2015 06:18:22 -0800

Can you also attach the EXPLAIN output for the queries? What's your data format?

--Xuefu
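
For reference, a minimal sketch of how the requested information could be gathered from the Hive CLI. It reuses the first query from the quoted message below; the `hive.execution.engine` line is an assumption (the session may already be configured for Spark), and `describe formatted` is just one way to check a partition's storage format.

```sql
-- Assumption: same session settings as the original run.
set hive.execution.engine=spark;
set mapred.reduce.tasks=100;
use u_wsd;

-- Prints the stage plan without executing the query. Repeat with the
-- second query's partition values (20151201 / 20151130) and compare.
explain
insert overwrite table t_sd_ucm_cominfo_incremental partition (ds=20151202)
select t1.uin, t1.clientip
from (select uin, clientip from t_sd_ucm_cominfo_FinalResult where ds=20151202) t1
left outer join
     (select uin, clientip from t_sd_ucm_cominfo_FinalResult where ds=20151201) t2
  on t1.uin = t2.uin
where t2.clientip is NULL;

-- Shows the partition's InputFormat, SerDe and location (the data format).
describe formatted t_sd_ucm_cominfo_FinalResult partition (ds=20151202);
```

Comparing the two plans side by side should show whether the join was planned differently for the two partition pairs.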


On Thu, Dec 3, 2015 at 12:09 AM, Jone Zhang <[email protected]> wrote:

> Hive 1.2.1 on Spark 1.4.1
>
> *The first query is:*
> set mapred.reduce.tasks=100;
> use u_wsd;
> insert overwrite table t_sd_ucm_cominfo_incremental partition (ds=20151202)
> select t1.uin,t1.clientip from
> (select uin,clientip from t_sd_ucm_cominfo_FinalResult where ds=20151202) t1
> left outer join (select uin,clientip from t_sd_ucm_cominfo_FinalResult where ds=20151201) t2
> on t1.uin=t2.uin
> where t2.clientip is NULL;
>
> *The second query is:*
> set mapred.reduce.tasks=100;
> use u_wsd;
> insert overwrite table t_sd_ucm_cominfo_incremental partition (ds=20151201)
> select t1.uin,t1.clientip from
> (select uin,clientip from t_sd_ucm_cominfo_FinalResult where ds=20151201) t1
> left outer join (select uin,clientip from t_sd_ucm_cominfo_FinalResult where ds=20151130) t2
> on t1.uin=t2.uin
> where t2.clientip is NULL;
>
> *The attachment shows the stages of the two queries.*
> *Here is the partition info:*
> 104.3 M  /user/hive/warehouse/u_wsd.db/t_sd_ucm_cominfo_finalresult/ds=20151202
> 110.0 M  /user/hive/warehouse/u_wsd.db/t_sd_ucm_cominfo_finalresult/ds=20151201
> 112.6 M  /user/hive/warehouse/u_wsd.db/t_sd_ucm_cominfo_finalresult/ds=20151130
>
>
>
> *Why are there two different stages?*
> *Stage 1 in the first query is very slow.*
>
> *Thanks.*
> *Best wishes.*
>
