Correction for item 1: Change "FAA" => "FCD1".
On Tue, Apr 21, 2015 at 3:11 PM, Hao Zhu <[email protected]> wrote: > Hi Team, > > Besides the 300+ seconds planning time, here are the performance > differences in execution phase also. > > In general, this SQL contains 2 fact table joins, and with another 10+ > dimension table joins. > The 2 fact tables are: > > - fact_agent_activity_detail_12m_partparq AS FAA > - fact_contact_detail_12m_partparq AS FCD1 > > So here are the 2 major time differences: > *1. PARQUET_ROW_GROUP_SCAN for FAA.* > Impala spent 3s248ms to read FAA. > Operator #Hosts Avg Time Max Time #Rows Est. #Rows > Peak Mem Est. Peak Mem Detail > 15:HASH JOIN 4 2s728ms 2s774ms 145.81K 89.61K > 531.03 MB 2.68 KB RIGHT OUTER JOIN, PARTITIONED > |--30:EXCHANGE 4 3.135ms 12.277ms 145.81K 192 > 0 0 HASH(FAA.Data_Source_Key,FA... > | 00:SCAN HDFS 1 3s248ms 3s248ms 145.81K 192 > 107.93 MB 880.00 MB default.fact_agent_activity... > 29:EXCHANGE 4 9.8ms 9.826ms 399.94K 89.61K > 0 0 HASH(FCD1.data_source_key,F... > 01:SCAN HDFS 4 2s648ms 3s452ms 399.94K 89.61K > 55.96 MB 960.00 MB default.fact_contact_detail... > > Drill spent 26.934 seconds(Avg Process time) to read FAA. > 02-xx-05 PARQUET_ROW_GROUP_SCAN 26.934 > 02-05 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath > [path=hdfs:// > 10.153.25.119:8020/user/root/fact_contact_detail_12m_partparq/2014/12/1_0_0.parquet], > ReadEntryWithPath [path=hdfs:// > 10.153.25.119:8020/user/root/fact_contact_detail_12m_partparq/2014/12/1_10_0.parquet], > ReadEntryWithPath [path=hdfs:// > 10.153.25.119:8020/user/root/fact_contact_detail_12m_partparq/2014/12/1_11_0.parquet], > ReadEntryWithPath [path=hdfs:// > 10.153.25.119:8020/user/root/fact_contact_detail_12m_partparq/2014/12/1_1_0.parquet], > ReadEntryWithPath [path=hdfs:// > 10.153.25.119:8020/user/root/fact_contact_detail_12m_partparq/2014/12/1_2_0.parquet], > ReadEntryWithPath [path=hdfs:// > 10.153.25.119:8020/user/root/fact_contact_detail_12m_partparq/2014/12/1_3_0.parquet], > ReadEntryWithPath [path=hdfs:// > 10.153.25.119:8020/user/root/fact_contact_detail_12m_partparq/2014/12/1_4_0.parquet], > ReadEntryWithPath [path=hdfs:// > 10.153.25.119:8020/user/root/fact_contact_detail_12m_partparq/2014/12/1_5_0.parquet], > ReadEntryWithPath [path=hdfs:// > 10.153.25.119:8020/user/root/fact_contact_detail_12m_partparq/2014/12/1_6_0.parquet], > ReadEntryWithPath [path=hdfs:// > 10.153.25.119:8020/user/root/fact_contact_detail_12m_partparq/2014/12/1_7_0.parquet], > ReadEntryWithPath [path=hdfs:// > 10.153.25.119:8020/user/root/fact_contact_detail_12m_partparq/2014/12/1_8_0.parquet], > ReadEntryWithPath [path=hdfs:// > 10.153.25.119:8020/user/root/fact_contact_detail_12m_partparq/2014/12/1_9_0.parquet] > <http://10.153.25.119:8020/user/root/fact_contact_detail_12m_partparq/2014/12/1_9_0.parquet%5D>], > selectionRoot=/user/root/fact_contact_detail_12m_partparq, numFiles=12, > columns=[`*`]]]) : rowType = (DrillRecordRow[*, Call_Start_Date_Time, dir0, > dir1]): rowcount = 8960530.0, cumulative cost = {8960530.0 rows, 3.584212E7 > cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 606703 > > Could you compare the performance of below 2 SQLs: > * Impala:* > select count(*) from > (select * from fact_contact_detail_12m_partparq AS FCD1 where > Call_Start_Date_Time between '2014-12-03 00:00:00' AND '2014-12-03 > 13:00:00' AND FCD1.Year_key = 20140000 and FCD1.month_key = 20141200) tmp; > * Drill:* > select count(*) from > (select * from fact_contact_detail_12m_partparq AS FCD1 where > Call_Start_Date_Time between '2014-12-03 00:00:00' AND '2014-12-03 > 13:00:00' AND FCD1.dir0 = 2014 and FCD1.dir1 = 12) tmp; > > *2. Drill is using broadcast join for the 2 fact table while impala is > using shuffle join.* > > Impala's shuffle join for 2 fact tables took 2s728ms. > 15:HASH JOIN 4 2s728ms 2s774ms 145.81K 89.61K > 531.03 MB 2.68 KB RIGHT OUTER JOIN, PARTITIONED > > Drill's broadcast join for 2 fact tables caused 16.035 seconds avg waiting > time. > 01-xx-46 UNORDERED_RECEIVER 16.035 > > 01-46 BroadcastExchange : rowType = RecordType(ANY $f1, ANY $f2, ANY > $f3, ANY ITEM, ANY ITEM4, ANY ITEM5, ANY ITEM6, ANY ITEM7, ANY ITEM8, ANY > ITEM9, ANY ITEM10, ANY ITEM11, ANY ITEM12, ANY ITEM13, ANY ITEM14, ANY > ITEM15): rowcount = 50402.98125, cumulative cost = {2.7032798943749998E7 > rows, 1.0056522825625E8 cpu, 0.0 io, 1.32128391168E10 network, 0.0 > memory}, id = 606708 > > The reason why Drill chose broadcast is because it estimates row count is > only 50K(Below planner.broadcast_threshold, default 1000K). However the > actual row count to broadcast is much higher than estimation. > > Needs to check why the estimation is not that accurate in this case. > > Thanks, > Hao > > On Tue, Apr 21, 2015 at 2:32 PM, Jinfeng Ni <[email protected]> wrote: > >> The query you used is a 15-table join query. We know that Drill's cost >> based optimizer will see performance overhead increase significantly with >> increased # of tables joined, due to the increased search space. I'm not >> surprised to see that you had 306.812 seconds for planning for such 15 >> table join. >> >> We plan to implement a heuristic planner in Drill, which will provide >> another option, for large # of table join query. Heuristic planner should >> help in this case. >> >> >> On Tue, Apr 21, 2015 at 1:48 PM, Sivasubramaniam, Latha < >> [email protected]> wrote: >> >> > Hao, >> > >> > I have copied both query profiles to the link >> > >> https://drive.google.com/folderview?id=0ByB1-EsAGxA8fkRCSDhVcGtQS0NGTjRuSlpGelVLMkxIVVFxYXMtU2JtQ3FaN2t3UTZpUUE&usp=sharing >> > >> > Please let me know if you cannot access. >> > >> > Appreciate any help. >> > >> > Thanks, >> > Latha >> > >> > -----Original Message----- >> > From: Hao Zhu [mailto:[email protected]] >> > Sent: Tuesday, April 21, 2015 11:31 AM >> > To: [email protected] >> > Subject: Re: Query performance Comparison - Drill and Impala >> > >> > Hi Latha, >> > >> > Do you have the complete SQL profiles for the same query on impala and >> > Drill? >> > For Impala, you can run "profile;" command after the SQL finished. >> > For Drill, you can go to the web GUI, and copy/paste the complete "Full >> > JSON Profile". >> > >> > I want to compare what is the major performance difference between them >> > two. >> > >> > Thanks, >> > Hao >> > >> > On Tue, Apr 21, 2015 at 10:43 AM, Sivasubramaniam, Latha < >> > [email protected]> wrote: >> > >> > > Neeraja, I removed some columns from select list and shortened aliases >> > > to get the query manageable and got the explain plan working finally. >> > > The errors were really random. >> > > >> > > Jacques, >> > > >> > > Most of it seems to be Hashjoin and yes the planning time took 306 >> > > seconds. Is there a way to improve the query performance.? Below is >> > > the snapshot of what I see in the query plan and I have copied the >> > > query also to the end of the email. >> > > >> > > >> > > Resource_Group_Desc=[$7], Call_Action_Desc=[$20], ITEM7=[$9], >> > > ITEM8=[$10], ITEM9=[$11], ITEM10=[$12], ITEM11=[$13], ITEM12=[$14], >> > > ITEM13=[$15], ITEM0=[$16], ITEM14=[$17], ITEM15=[$18]) >> > > 01-29 >> > > HashJoin(condition=[=($19, $8)], joinType=[left]) >> > > 01-31 >> > > Project(Call_Id_Source=[$0], Agent_Key=[$1], Site_Key=[$2], >> > > Workgroup_Key=[$3], Status_Key=[$4], Reason_Key=[$5], >> > > Call_Type_Desc=[$6], Resource_Group_Desc=[$20], ITEM6=[$8], >> > > ITEM7=[$9], ITEM8=[$10], ITEM9=[$11], ITEM10=[$12], ITEM11=[$13], >> > > ITEM12=[$14], ITEM13=[$15], ITEM0=[$16], ITEM14=[$17], ITEM15=[$18]) >> > > 01-32 >> > > HashJoin(condition=[=($7, $19)], joinType=[left]) >> > > 01-34 >> > > Project(Call_Id_Source=[$0], Agent_Key=[$1], Site_Key=[$2], >> > > Workgroup_Key=[$3], Status_Key=[$4], Reason_Key=[$5], >> > > Call_Type_Desc=[$20], ITEM5=[$7], ITEM6=[$8], ITEM7=[$9], ITEM8=[$10], >> > > ITEM9=[$11], ITEM10=[$12], ITEM11=[$13], ITEM12=[$14], ITEM13=[$15], >> > > ITEM0=[$16], ITEM14=[$17], >> > > ITEM15=[$18]) >> > > 01-35 >> > > HashJoin(condition=[=($6, $19)], joinType=[left]) >> > > 01-37 >> > > Project(Call_Id_Source=[$0], Agent_Key=[$1], Site_Key=[$2], >> > > Workgroup_Key=[$3], Status_Key=[$4], Reason_Key=[$5], ITEM4=[$7], >> > > ITEM5=[$8], ITEM6=[$9], ITEM7=[$10], ITEM8=[$11], ITEM9=[$12], >> > > ITEM10=[$13], ITEM11=[$14], ITEM12=[$15], ITEM13=[$16], ITEM0=[$17], >> > > ITEM14=[$18], ITEM15=[$19]) >> > > 01-38 >> > > HashJoin(condition=[=($6, $20)], joinType=[left]) >> > > 01-40 >> > > Project(Call_Id_Source=[$0], Agent_Key=[$2], Site_Key=[$3], >> > > Workgroup_Key=[$4], Status_Key=[$5], Reason_Key=[$6], ITEM=[$7], >> > > ITEM4=[$8], ITEM5=[$9], ITEM6=[$10], ITEM7=[$11], ITEM8=[$12], >> > > ITEM9=[$13], ITEM10=[$14], ITEM11=[$15], ITEM12=[$16], ITEM13=[$17], >> > > ITEM0=[$7], ITEM14=[$18], ITEM15=[$19]) >> > > 01-41 >> > > HashJoin(condition=[=($20, $1)], joinType=[left]) >> > > 01-43 >> > > Project(Call_Id_Source=[$1], Service_Key=[$3], Agent_Key=[$4], >> > > Site_Key=[$5], Workgroup_Key=[$6], Status_Key=[$7], Reason_Key=[$8], >> > > ITEM=[$15], ITEM4=[$16], ITEM5=[$17], ITEM6=[$18], ITEM7=[$19], >> > > ITEM8=[$20], ITEM9=[$21], ITEM10=[$22], ITEM11=[$23], ITEM12=[$24], >> > > ITEM13=[$25], ITEM14=[$26], ITEM15=[$27]) >> > > 01-45 >> > > HashJoin(condition=[AND(=($0, $12), =($1, $13), =($2, $14))], >> > > joinType=[left]) >> > > 01-47 >> > > SelectionVectorRemover >> > > 01-48 >> > > Filter(condition=[AND(>=($9, '2014-12-03 00:00:00'), <=($9, >> > > '2014-12-03 13:00:00'), =($10, 2014), =($11, 12))]) >> > > 01-49 >> > > Project(Data_Source_Key=[$9], Call_Id_Source=[$4], >> > > Sequence_Number=[$8], Service_Key=[$11], Agent_Key=[$10], >> > > Site_Key=[$2], Workgroup_Key=[$5], Status_Key=[$3], Reason_Key=[$7], >> > > Start_Date_Time=[$6], dir0=[$1], >> > > dir1=[$0]) >> > > 01-50 >> > > Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath >> > > [path=hdfs:// >> > > >> 10.153.25.119:8020/user/root/fact_agent_activity_detail_12m_partparq/2 >> > > 014/12/1_0_0.parquet], >> > > ReadEntryWithPath [path=hdfs:// >> > > >> 10.153.25.119:8020/user/root/fact_agent_activity_detail_12m_partparq/2 >> > > 014/12/1_1_0.parquet], >> > > ReadEntryWithPath [path=hdfs:// >> > > >> 10.153.25.119:8020/user/root/fact_agent_activity_detail_12m_partparq/2 >> > > 014/12/1_2_0.parquet], >> > > ReadEntryWithPath [path=hdfs:// >> > > >> 10.153.25.119:8020/user/root/fact_agent_activity_detail_12m_partparq/2 >> > > 014/12/1_3_0.parquet], >> > > ReadEntryWithPath [path=hdfs:// >> > > >> 10.153.25.119:8020/user/root/fact_agent_activity_detail_12m_partparq/2 >> > > 014/12/1_4_0.parquet], >> > > ReadEntryWithPath [path=hdfs:// >> > > >> 10.153.25.119:8020/user/root/fact_agent_activity_detail_12m_partparq/2 >> > > 014/12/1_5_0.parquet], >> > > ReadEntryWithPath [path=hdfs:// >> > > >> 10.153.25.119:8020/user/root/fact_agent_activity_detail_12m_partparq/2 >> > > 014/12/1_6_0.parquet], >> > > ReadEntryWithPath [path=hdfs:// >> > > >> 10.153.25.119:8020/user/root/fact_agent_activity_detail_12m_partparq/2 >> > > 014/12/1_7_0.parquet], >> > > ReadEntryWithPath [path=hdfs:// >> > > >> 10.153.25.119:8020/user/root/fact_agent_activity_detail_12m_partparq/2 >> > > 014/12/1_8_0.parquet], >> > > ReadEntryWithPath [path=hdfs:// >> > > >> 10.153.25.119:8020/user/root/fact_agent_activity_detail_12m_partparq/2 >> > > 014/12/1_9_0.parquet]], >> > > selectionRoot=/user/root/fact_agent_activity_detail_12m_partparq, >> > > numFiles=10, columns=[`Data_Source_Key`, `Call_Id_Source`, >> > > `Sequence_Number`, `Service_Key`, `Agent_Key`, `Site_Key`, >> > > `Workgroup_Key`, `Status_Key`, `Reason_Key`, `Start_Date_Time`, >> > > `dir0`, `dir1`]]]) >> > > 01-46 >> > > BroadcastExchange >> > > 02-01 >> > > Project($f1=[ITEM($0, 'Data_Source_Key')], $f2=[ITEM($0, >> > > 'Call_Id_Source')], $f3=[ITEM($0, 'Sequence_Number')], ITEM=[ITEM($0, >> > > 'Service_Key')], ITEM4=[ITEM($0, 'Call_Type_Key')], ITEM5=[ITEM($0, >> > > 'Resource_Group_Key')], ITEM6=[ITEM($0, 'Call_Action_Key')], >> > > ITEM7=[ITEM($0, 'Agent_Disp_Key')], ITEM8=[ITEM($0, >> > > 'Second_Party_Agent_Key')], ITEM9=[ITEM($0, >> > > 'Call_Action_Reason_Key')], ITEM10=[ITEM($0, 'ANI')], ITEM11=[ITEM($0, >> > > 'DNIS')], ITEM12=[ITEM($0, 'Begin_Date_Time')], ITEM13=[ITEM($0, >> > > 'Sequence_Number')], ITEM14=[ITEM($0, 'Dur_Short_Call')], >> > > ITEM15=[ITEM($0, 'Call_Category_Key')]) >> > > 02-02 >> > > SelectionVectorRemover >> > > 02-03 >> > > Filter(condition=[AND(>=($1, '2014-12-03 00:00:00'), <=($1, >> > > '2014-12-03 13:00:00'), =($2, 2014), =($3, 12))]) >> > > 02-04 >> > > Project(T8¦¦*=[$0], Call_Start_Da | >> > > +------------+------------+ >> > > 1 row selected (306.812 seconds) >> > > 0: jdbc:drill:zk=rtr-poc-imp1:2181> >> > > >> > > >> > > ************ Query ******************** SELECT DSQ.Site_Desc, >> > > DWAE.Workgroup_Desc, DASAE.Agent_Status_Key, DCTCD.Call_Type_Desc, >> > > DCA.Call_Action_Desc, FCD.ANI, FCD.DNIS, FCD.Begin_Date_Time, >> > > DD.Disposition_Desc, DAAE.Full_Name , >> > > >> > > DRGAE.Resource_Group_Desc, FCD.Sequence_Number, FCD.Service_Key, >> > > FCD.Dur_Short_Call, FAA.Call_Id_Source, FCD.Call_Category_Key FROM >> > > fact_agent_activity_detail_12m_partparq AS FAA LEFT OUTER JOIN (select >> > > * from fact_contact_detail_12m_partparq AS FCD1 where >> > > Call_Start_Date_Time between '2014-12-03 00:00:00' AND '2014-12-03 >> > > 13:00:00' AND FCD1.dir0 = 2014 and FCD1.dir1 = 12 >> > > >> > > ) AS FCD >> > > ON FAA.Data_Source_Key = FCD.Data_Source_Key AND FAA.Call_Id_Source = >> > > FCD.Call_Id_Source AND FAA.Sequence_Number = FCD.Sequence_Number LEFT >> > > OUTER JOIN DIM_Services_Parq AS DSAAD ON DSAAD.Service_Key = >> > > FAA.Service_Key LEFT OUTER JOIN DIM_Services_Parq AS DSCD ON >> > > FCD.Service_Key = DSCD.Service_Key LEFT OUTER JOIN DIM_Call_Types_Parq >> > > AS DCTCD ON FCD.Call_Type_Key = DCTCD.Call_Type_Key LEFT OUTER JOIN >> > > DIM_Resource_Groups_parq AS DRGAE ON FCD.Resource_Group_Key = >> > > DRGAE.Resource_Group_Key LEFT OUTER JOIN DIM_Call_Actions_parq AS DCA >> > > ON DCA.Call_Action_Key = FCD.Call_Action_Key INNER JOIN >> > > DIM_Agents_Parquet AS DAAE ON FAA.Agent_Key = DAAE.Agent_Key INNER >> > > JOIN DIM_Sites_Parq AS DSQ ON FAA.Site_Key = DSQ.Site_Key INNER JOIN >> > > DIM_Workgroups_parq AS DWAE ON FAA.Workgroup_Key = DWAE.Workgroup_Key >> > > LEFT OUTER JOIN DIM_Dispositions_parq AS DD ON FCD.Agent_Disp_Key = >> > > DD.Disposion_Key >> > > LEFT OUTER JOIN DIM_Agents_Parquet AS DACD ON >> > > FCD.Second_Party_Agent_Key = DACD.Agent_Key LEFT OUTER JOIN >> > > DIM_Call_Action_Reasons_parq AS DAAR ON FCD.Call_Action_Reason_Key = >> > > DAAR.Call_Action_Reason_Key LEFT OUTER JOIN DIM_Agent_Status_parq AS >> > > DASAE ON FAA.Status_Key = DASAE.Agent_Status_Key LEFT OUTER JOIN >> > > DIM_Agent_Status_Reasons_parq AS DARAE ON FAA.Reason_Key = >> > > DARAE.Agent_Status_Reason_Key WHERE (DWAE.Workgroup_Key IN ( 3,1) >> > > >> > > AND >> > > (FAA.Start_Date_Time between '2014-12-03 00:00:00' AND '2014-12-03 >> > > 13:00:00' AND FAA.dir0 = 2014 and FAA.dir1 = 12)); >> > > >> > > >> > > Appreciate your help. >> > > >> > > Thanks, >> > > Latha >> > > >> > > >> > > >> > > >> > > >> > > -----Original Message----- >> > > From: Neeraja Rentachintala [mailto:[email protected]] >> > > Sent: Monday, April 20, 2015 9:22 PM >> > > To: [email protected] >> > > Subject: Re: Query performance Comparison - Drill and Impala >> > > >> > > Latha >> > > What error are you getting when doing explain plan. >> > > >> > > On Mon, Apr 20, 2015 at 4:40 PM, Sivasubramaniam, Latha < >> > > [email protected]> wrote: >> > > >> > > > I am unable to do an explain plan for this query. Is there a way to >> > > > pass a sql file for the Explain plan command. Query runs fine from >> > > > the file with the !run command though. Will try to fix the errors >> > > > and see if I can get the query plan. >> > > > >> > > > Thanks, >> > > > latha >> > > > >> > > > -----Original Message----- >> > > > From: Jacques Nadeau [mailto:[email protected]] >> > > > Sent: Monday, April 20, 2015 4:12 PM >> > > > To: [email protected] >> > > > Subject: Re: Query performance Comparison - Drill and Impala >> > > > >> > > > Can you confirm you are getting broadcast joins for all the >> > > > dimension tables? Additionally, can you confirm the planning time? >> > > > I'm guessing both might be an issue given the times you are seeing. >> > > > On Apr 20, 2015 6:46 PM, "Sivasubramaniam, Latha" < >> > > > [email protected]> wrote: >> > > > >> > > > > Hi, >> > > > > >> > > > > I am running performance tests for our use cases on Impala and >> > > > > Drill to make a choice and pick one for our Data warehouse >> > > > > implementation on top of Hadoop. I used the same cluster for both >> > the tests. >> > > > > Select Query that takes ~20 seconds in impala, takes around 440 >> > > > > seconds in drill on same hardware cluster (nearly 20 times more). >> > > > > I would like to make sure that I am not missing anything in tuning >> > > > > or >> > > configuration. >> > > > > Below is our configuration, appreciate comments on if these >> > > > > performance numbers are as expected or if any tuning can be done >> > > > > to >> > > > improve drill performance. >> > > > > >> > > > > Hadoop Cluster configuration and data set: >> > > > > >> > > > > >> > > > > * 5 nodes cluster, 1 name node and 4 data nodes, 32GB >> memory >> > , >> > > 4 >> > > > > cpus on each node. >> > > > > >> > > > > * There are two main fact tables. 98 million records in >> one >> > and >> > > > 87 >> > > > > million records in another. There are around 10 dimension tables. >> > > > > For now, we are running the tests with normalized schema from out >> > > > > data >> > > > warehouse. >> > > > > >> > > > > * Running the same query against Impala and Drill except >> for >> > > the >> > > > > syntax differences everything is same. >> > > > > >> > > > > Impala Configuration >> > > > > >> > > > > >> > > > > * CDH 5.3.2 with Impala 2.1.2 version >> > > > > >> > > > > * 4 impala daemons running on 4 data nodes. >> > > > > >> > > > > * Impala daemons memory set to 25GB. >> > > > > >> > > > > * Partitioned parquet tables were created from text format >> > hdfs >> > > > > files using impala itself (CTAS command). >> > > > > >> > > > > * Parquet block size 256MB >> > > > > >> > > > > Drill Configuration >> > > > > >> > > > > >> > > > > * Drill 0.8 version, added drill on the CDH cluster setup >> > > > > >> > > > > * 4 drillbits running on 4 data nodes >> > > > > >> > > > > * Drillbits memory set to 16GB and heap memory set to >> 8GB. I >> > > ran >> > > > > with 8GB memory and 4GB for heap, there was no change in the query >> > > > timings. >> > > > > >> > > > > * Partitioned parquet files. >> > > > > >> > > > > SQL Query >> > > > > >> > > > > SELECT >> > > > > >> > > > > DIM_Sites_parq.Site_Desc, >> > > > > >> > > > > DIM_Workgroups_AgentEventDetails.Workgroup_Desc, >> > > > > DIM_Agents_AgentEventDetails.Last_Name, >> > > > > >> > > > > DIM_Agents_AgentEventDetails.First_Name, >> > > > > >> > > > > >> > > > > >> DIM_Agent_Status_AgentEventDetails.Agent_Status_Key, >> > > > > DIM_CallTypes_ContactDetails.Call_Type_Desc, >> > > > > >> > > > > >> > > > > >> > > > > DIM_Call_Actions.Call_Action_Desc, >> > > > > >> > > > > FACT_Contact_Detail.ANI, >> > > > > FACT_Contact_Detail.DNIS, >> > > > > >> > > > > FACT_Contact_Detail.Begin_Date_Time, >> > > > > FACT_Contact_Detail.Call_Start_Date_Time, >> > > > > >> > > > > FACT_Contact_Detail.Orig_DNIS, >> > > > > FACT_Agent_Activity_Detail.Start_Date_Time, >> > > > > FACT_Agent_Activity_Detail.End_Date_Time, >> > > > > FACT_Agent_Activity_Detail.Dur_Time, >> > > > > >> > > > > FACT_Agent_Activity_Detail.Sequence_Number, >> > > > > >> > > > > FACT_Contact_Detail.Begin_Date_Time, >> > > > > FACT_Contact_Detail.Call_Start_Date_Time, >> > > > > FACT_Agent_Activity_Detail.Start_Date_Time, >> > > > > FACT_Agent_Activity_Detail.End_Date_Time, >> > > > > >> > > > > >> > > > > DIM_Dispositions.Disposition_Desc, >> > > > > DIM_Agents_AgentEventDetails.Full_Name , >> > > > > >> > > > > >> > > > > DIM_Resource_Groups_AgentEventDetails.Resource_Group_Desc, >> > > > > >> > > > > FACT_Contact_Detail.Sequence_Number, >> > > > > >> > > > > FACT_Contact_Detail.Service_Key, >> > > > > FACT_Contact_Detail.Station, >> > > > > FACT_Contact_Detail.Near_Off_Hook_Date_Time, >> > > > > FACT_Contact_Detail.Far_Off_Hook_Date_Time, >> > > > > FACT_Contact_Detail.Near_On_Hook_Date_Time, >> > > > > FACT_Contact_Detail.Far_On_Hook_Date_Time, >> > > > > FACT_Contact_Detail.Begin_Greeting_Date_Time, >> > > > > >> > > > > FACT_Contact_Detail.End_Guard_Date_Time, >> > > > > FACT_Contact_Detail.Resource_Group_Type, >> > > > > FACT_Contact_Detail.Info_Digits, >> > > > > FACT_Contact_Detail.Agent_Site_Key, >> > > > > FACT_Contact_Detail.Agent_Switch_Key, >> > > > > >> > > > > FACT_Contact_Detail.Line_Number, >> > > > > FACT_Contact_Detail.Queue_Start_Date_Time, >> > > > > >> > > > > FACT_Contact_Detail.CallQ_Start_Date_Time, >> > > > > FACT_Contact_Detail.Queue_End_Date_Time, >> > > > > FACT_Contact_Detail.Dur_Queue_Time, >> > > > > FACT_Contact_Detail.Conn_Clear_Date_Time, >> > > > > FACT_Contact_Detail.Wrap_End_Date_Time, >> > > > > >> > > > > FACT_Contact_Detail.Answer_Flag, >> > > > > >> > > > > FACT_Contact_Detail.Call_End_Date_Time, >> > > > > FACT_Contact_Detail.Script_Id_Source, >> > > > > >> > > > > FACT_Contact_Detail.IPNIQ_ASBR_Flag, >> > > > > FACT_Contact_Detail.Reserved_Start_Date_Time, >> > > > > >> > > > > FACT_Contact_Detail.Reserved_End_Date_Time, >> > > > > FACT_Contact_Detail.Dur_IPNIQ_Q_Time, >> > > > > FACT_Contact_Detail.SeqLeg_Number, >> > > > > >> > > > > FACT_Contact_Detail.Num_Accept_Responses, >> > > > > FACT_Contact_Detail.Num_Reject_Responses, >> > > > > FACT_Contact_Detail.Preview_Start_Date_Time, >> > > > > >> > > > > FACT_Contact_Detail.Preview_End_Date_Time, >> > > > > >> > > > > FACT_Contact_Detail.Detection_Date_Time, >> > > > > FACT_Contact_Detail.Answer_Date_Time, >> > > > > FACT_Contact_Detail.Record_Number, >> > > > > FACT_Contact_Detail.Detection_Type, >> > > > > FACT_Contact_Detail.Calling_Party_Name, >> > > > > FACT_Contact_Detail.Num_Phone_Rings, >> > > > > >> > > > > FACT_Contact_Detail.End_Greeting_Date_Time, >> > > > > FACT_Contact_Detail.Phone_Start_Ringing_Date_Time, >> > > > > >> > > > > FACT_Contact_Detail.Begin_Greeting_Date_Time, >> > > > > FACT_Contact_Detail.Begin_Msg_Playback_Date_Time, >> > > > > FACT_Contact_Detail.Xfer_Command_Date_Time, >> > > > > FACT_Contact_Detail.Connect_Date_Time, >> > > > > FACT_Contact_Detail.Second_Party_Park_Flag, >> > > > > FACT_Contact_Detail.Dur_Short_Call, >> > > > > FACT_Agent_Activity_Detail.Call_Id_Source, >> > > > > >> > > > > FACT_Contact_Detail.Call_Category_Key, >> > > > > >> > > > > DIM_Agents_AgentEventDetailsSecondParty.User_Id_Source >> > > > > >> > > > > FROM >> > > > > hdfs.rootparq.`fact_agent_activity_detail_12m_partparq` AS >> > > > > FACT_Agent_Activity_Detail >> > > > > >> > > > > LEFT OUTER JOIN >> > > > > >> > > > > (select * from hdfs.rootparq.`fact_contact_detail_12m_partparq` >> > > > > where Call_Start_Date_Time >> > > > > >> > > > > between >> > > > > >> > > > > '2014-12-03 00:00:00' AND '2014-12-03 13:00:00' AND >> > > > > fact_contact_detail_12m_partparq.dir0 = 2014 and >> > > > > fact_contact_detail_12M_partParq.dir1 = 12 ) AS >> > > > > FACT_Contact_Detail ON >> > > > > FACT_Agent_Activity_Detail.Data_Source_Key = >> > > > > FACT_Contact_Detail.Data_Source_Key >> > > > > AND >> > > > > FACT_Agent_Activity_Detail.Call_Id_Source = >> > > > > FACT_Contact_Detail.Call_Id_Source >> > > > > AND >> > > > > FACT_Agent_Activity_Detail.Sequence_Number = >> > > > > FACT_Contact_Detail.Sequence_Number >> > > > > >> > > > > >> > > > > LEFT OUTER JOIN >> > > > > DIM_Services_Parq AS >> > > > > DIM_Services_Agent_Activity_Detail >> > > > > ON DIM_Services_Agent_Activity_Detail.Service_Key = >> > > > > FACT_Agent_Activity_Detail.Service_Key >> > > > > >> > > > > LEFT OUTER JOIN >> > > > > DIM_Services_Parq AS DIM_Services_Contact_Detail ON >> > > > > FACT_Contact_Detail.Service_Key = >> > > > > DIM_Services_Contact_Detail.Service_Key >> > > > > >> > > > > LEFT OUTER JOIN >> > > > > DIM_Call_Types_Parq AS DIM_CallTypes_ContactDetails ON >> > > > > FACT_Contact_Detail.Call_Type_Key = >> > > > > DIM_CallTypes_ContactDetails.Call_Type_Key >> > > > > >> > > > > LEFT OUTER JOIN >> > > > > DIM_Resource_Groups_parq AS >> > > > > DIM_Resource_Groups_AgentEventDetails >> > > ON >> > > > > FACT_Contact_Detail.Resource_Group_Key = >> > > > > DIM_Resource_Groups_AgentEventDetails.Resource_Group_Key >> > > > > >> > > > > >> > > > > LEFT OUTER JOIN >> > > > > DIM_Call_Actions_parq AS DIM_Call_Actions ON >> > > > > DIM_Call_Actions.Call_Action_Key = >> > > > > FACT_Contact_Detail.Call_Action_Key >> > > > > >> > > > > INNER JOIN >> > > > > DIM_Agents_Parquet AS DIM_Agents_AgentEventDetails ON >> > > > > FACT_Agent_Activity_Detail.Agent_Key = >> > > > > DIM_Agents_AgentEventDetails.Agent_Key >> > > > > >> > > > > INNER JOIN >> > > > > DIM_Sites_Parq ON FACT_Agent_Activity_Detail.Site_Key = >> > > > > DIM_Sites_Parq.Site_Key >> > > > > >> > > > > INNER JOIN >> > > > > DIM_Workgroups_parq AS DIM_Workgroups_AgentEventDetails ON >> > > > > FACT_Agent_Activity_Detail.Workgroup_Key = >> > > > > DIM_Workgroups_AgentEventDetails.Workgroup_Key >> > > > > >> > > > > LEFT OUTER JOIN >> > > > > DIM_Dispositions_parq AS DIM_Dispositions ON >> > > > > FACT_Contact_Detail.Agent_Disp_Key = >> > > > > DIM_Dispositions.Disposion_Key >> > > > > >> > > > > LEFT OUTER JOIN >> > > > > DIM_Agents_Parquet AS DIM_Agents_AgentEventDetailsSecondParty >> ON >> > > > > FACT_Contact_Detail.Second_Party_Agent_Key = >> > > > > DIM_Agents_AgentEventDetailsSecondParty.Agent_Key >> > > > > >> > > > > LEFT OUTER JOIN >> > > > > DIM_Call_Action_Reasons_parq AS DIM_Call_Action_Reasons ON >> > > > > FACT_Contact_Detail.Call_Action_Reason_Key = >> > > > > DIM_Call_Action_Reasons.Call_Action_Reason_Key >> > > > > >> > > > > LEFT OUTER JOIN >> > > > > DIM_Agent_Status_parq AS DIM_Agent_Status_AgentEventDetails ON >> > > > > FACT_Agent_Activity_Detail.Status_Key = >> > > > > DIM_Agent_Status_AgentEventDetails.Agent_Status_Key >> > > > > >> > > > > LEFT OUTER JOIN >> > > > > DIM_Agent_Status_Reasons_parq AS >> > > > > DIM_Agent_Reason_AgentEventDetails >> > > > ON >> > > > > FACT_Agent_Activity_Detail.Reason_Key = >> > > > > DIM_Agent_Reason_AgentEventDetails.Agent_Status_Reason_Key >> > > > > >> > > > > >> > > > > WHERE >> > > > > ( >> > > > > >> > > > > DIM_Workgroups_AgentEventDetails.Workgroup_Key IN ( 3,1) >> > > > > >> > > > > AND >> > > > > >> > > > > >> > > > > ( FACT_Agent_Activity_Detail.Start_Date_Time >> > > > > >> > > > > between >> > > > > '2014-12-03 00:00:00' AND '2014-12-03 13:00:00' AND >> > > > > FACT_Agent_Activity_Detail.dir0 = 2014 and >> > > > > FACT_Agent_Activity_Detail.dir1 = >> > > > > >> > > > > 12 >> > > > > ) >> > > > > >> > > > > ); >> > > > > >> > > > > >> > > > > Thanks, >> > > > > Latha >> > > > > >> > > > > >> > > > > This email (including any attachments) is proprietary to Aspect >> > > > > Software, Inc. and may contain information that is confidential. >> > > > > If you have received this message in error, please do not read, >> > > > > copy or >> > > > forward this message. >> > > > > Please notify the sender immediately, delete it from your system >> > > > > and destroy any copies. You may not further disclose or distribute >> > > > > this email or its attachments. >> > > > > >> > > > This email (including any attachments) is proprietary to Aspect >> > > > Software, Inc. and may contain information that is confidential. If >> > > > you have received this message in error, please do not read, copy or >> > > forward this message. >> > > > Please notify the sender immediately, delete it from your system and >> > > > destroy any copies. You may not further disclose or distribute this >> > > > email or its attachments. >> > > > >> > > This email (including any attachments) is proprietary to Aspect >> > > Software, Inc. and may contain information that is confidential. If >> > > you have received this message in error, please do not read, copy or >> > forward this message. >> > > Please notify the sender immediately, delete it from your system and >> > > destroy any copies. You may not further disclose or distribute this >> > > email or its attachments. >> > > >> > >> > >
