Hi Sanjiv, Normally when it comes to this, I will try to find the section of the code which cause the largest lag
SELECT > sb_gu_key, m_d_key, t_ev_st_dt, > LAG( t_ev_st_dt ) OVER ( PARTITION BY m_d_key , sb_gu_key ORDER BY > t_ev_st_dt ) AS LAG_START_DT, > a_z_key, > c_dt, > e_p_dt, > sq_nbr, > CASE WHEN LAG( t_ev_st_dt ) OVER ( PARTITION BY m_d_key , sb_gu_key > ORDER BY t_ev_st_dt ) IS NULL OR a_z_key <> LAG( a_z_key , 1 , -999 ) > OVER ( PARTITION BY m_d_key , sb_gu_key ORDER BY t_ev_st_dt ) THEN 'S' > ELSE NULL END AS ST_FLAG > FROM `PRDDB`.tuning_dd_key ; >From the above query which part is the most time consuming? For example is the LAG function the most consuming section that takers the lion's hare of the query? Just execute the code and comment out LAG(t_ev_st_td) ..... first I suspect CASE WHEN LAG( t_ev_st_dt ) OVER ( PARTITION BY m_d_key , sb_gu_key ORDER BY t_ev_st_dt ) IS NULL OR a_z_key <> LAG( a_z_key , 1 , -999 ) OVER ( PARTITION BY is the other possible candidate as well with that OR than can cause the issue For example you can do the following to measure the timing select from_unixtime(unix_timestamp(), 'dd/MM/yyyy HH:mm:ss.ss') AS StartTime; SELECT COUNT(1) FROM PRDDB`.tuning_dd_key WHERE (LAG( t_ev_st_dt ) OVER ( PARTITION BY m_d_key , sb_gu_key ORDER BY t_ev_st_dt ) IS NULL) OR a_z_key <> LAG( a_z_key , 1 , -999 ) OVER ( PARTITION BY m_d_key , sb_gu_key ORDER BY t_ev_st_dt ) select from_unixtime(unix_timestamp(), 'dd/MM/yyyy HH:mm:ss.ss') AS EndTime; HTH Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>* http://talebzadehmich.wordpress.com *Disclaimer:* Any and all responsibility for any loss, damage or destruction of data or any other property which may arise from relying on this email's technical content is explicitly disclaimed. The author will in no case be liable for any monetary damages arising from such loss, damage or destruction. On 24 June 2016 at 22:34, @Sanjiv Singh <sanjiv.is...@gmail.com> wrote: > Hi Vijay, > > Please help me on this....let me know you need other info. > > > > Regards > Sanjiv Singh > Mob : +091 9990-447-339 > > On Thu, Jun 23, 2016 at 12:41 PM, @Sanjiv Singh <sanjiv.is...@gmail.com> > wrote: > >> Hi Gopal, >> >> I am using Tez as execution engine. >> >> DAG : >> >> +--------------------------------------------------------+--+ >> | >> Explain >> | >> +---------------------------------------------------------+--+ >> | Plan not optimized by CBO. >> | >> | >> | >> | Vertex dependency in root stage >> | >> | Reducer 2 <- Map 1 (SIMPLE_EDGE) >> | >> | >> | >> | Stage-0 >> | >> | Fetch Operator >> | >> | limit:-1 >> | >> | Stage-1 >> | >> | Reducer 2 >> | >> | File Output Operator [FS_55596] >> | >> | compressed:false >> | >> | Statistics:Num rows: 6357592675 Data size: 54076899328 >> Basic stats: COMPLETE Column stats: NONE | >> | >> table:{"serde:":"org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe","input >> format:":"org.apache.hadoop.mapred.TextInputFormat","output >> format:":"org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat"} | >> | Select Operator [SEL_55594] >> | >> | >> >> outputColumnNames:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"] >> | >> | Statistics:Num rows: 6357592675 Data size: 54076899328 >> Basic stats: COMPLETE Column stats: NONE | >> | PTF Operator [PTF_55593] >> | >> | Function definitions:[{"Input >> definition":{"type:":"WINDOWING"}},{"partition by:":"_col0, >> _col1","name:":"windowingtablefunction","order by:":"_col2"}] | >> | Statistics:Num rows: 6357592675 Data size: >> 54076899328 Basic stats: COMPLETE Column stats: NONE | >> | Select Operator [SEL_55592] >> | >> | | >> outputColumnNames:["_col0","_col1","_col2","_col3","_col4","_col5","_col6"] >> | >> | | Statistics:Num rows: 6357592675 Data size: >> 54076899328 Basic stats: COMPLETE Column stats: NONE | >> | |<-Map 1 [SIMPLE_EDGE] vectorized >> | >> | Reduce Output Operator [RS_55597] >> | >> | key expressions:m_d_key (type: smallint), >> sb_gu_key (type: bigint), t_ev_st_dt (type: date) | >> | Map-reduce partition columns:m_d_key (type: >> smallint), sb_gu_key (type: bigint) | >> | sort order:+++ >> | >> | Statistics:Num rows: 6357592675 Data size: >> 54076899328 Basic stats: COMPLETE Column stats: NONE >> | >> | value expressions:ad_zn_key (type: int), c_dt >> (type: date), e_p_dt (type: date), sq_nbr (type: int) | >> | TableScan [TS_55590] >> | >> | ACID table:true >> | >> | alias:tuning_dd_key >> | >> | Statistics:Num rows: 6357592675 Data size: >> 54076899328 Basic stats: COMPLETE Column stats: NONE | >> | >> >> | >> >> +-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--+ >> >> Regards >> Sanjiv Singh >> Mob : +091 9990-447-339 >> >> On Thu, Jun 23, 2016 at 2:45 AM, Gopal Vijayaraghavan <gop...@apache.org> >> wrote: >> >>> >>> > Long running query : >>> >>> Are you running this on MapReduce or Tez? >>> >>> Please post the output of explain - if you are seeing > 1 shuffle edge in >>> your query while having only one window for OVER(), that might be the >>> reason. >>> >>> OVER ( PARTITION BY m_d_key , sb_gu_key ORDER BY t_ev_st_dt) >>> >>> >>> The multiple PTF operators should have been collapsed by the reduce >>> sink-deduplication. >>> >>> Cheers, >>> Gopal >>> >>> >>> >> >