Latha What error are you getting when doing explain plan. On Mon, Apr 20, 2015 at 4:40 PM, Sivasubramaniam, Latha < [email protected]> wrote:
> I am unable to do an explain plan for this query. Is there a way to pass > a sql file for the Explain plan command. Query runs fine from the file with > the !run command though. Will try to fix the errors and see if I can get > the query plan. > > Thanks, > latha > > -----Original Message----- > From: Jacques Nadeau [mailto:[email protected]] > Sent: Monday, April 20, 2015 4:12 PM > To: [email protected] > Subject: Re: Query performance Comparison - Drill and Impala > > Can you confirm you are getting broadcast joins for all the dimension > tables? Additionally, can you confirm the planning time? I'm guessing both > might be an issue given the times you are seeing. > On Apr 20, 2015 6:46 PM, "Sivasubramaniam, Latha" < > [email protected]> wrote: > > > Hi, > > > > I am running performance tests for our use cases on Impala and Drill > > to make a choice and pick one for our Data warehouse implementation on > > top of Hadoop. I used the same cluster for both the tests. Select > > Query that takes ~20 seconds in impala, takes around 440 seconds in > > drill on same hardware cluster (nearly 20 times more). I would like to > > make sure that I am not missing anything in tuning or configuration. > > Below is our configuration, appreciate comments on if these > > performance numbers are as expected or if any tuning can be done to > improve drill performance. > > > > Hadoop Cluster configuration and data set: > > > > > > * 5 nodes cluster, 1 name node and 4 data nodes, 32GB memory , 4 > > cpus on each node. > > > > * There are two main fact tables. 98 million records in one and > 87 > > million records in another. There are around 10 dimension tables. For > > now, we are running the tests with normalized schema from out data > warehouse. > > > > * Running the same query against Impala and Drill except for the > > syntax differences everything is same. > > > > Impala Configuration > > > > > > * CDH 5.3.2 with Impala 2.1.2 version > > > > * 4 impala daemons running on 4 data nodes. > > > > * Impala daemons memory set to 25GB. > > > > * Partitioned parquet tables were created from text format hdfs > > files using impala itself (CTAS command). > > > > * Parquet block size 256MB > > > > Drill Configuration > > > > > > * Drill 0.8 version, added drill on the CDH cluster setup > > > > * 4 drillbits running on 4 data nodes > > > > * Drillbits memory set to 16GB and heap memory set to 8GB. I ran > > with 8GB memory and 4GB for heap, there was no change in the query > timings. > > > > * Partitioned parquet files. > > > > SQL Query > > > > SELECT > > > > DIM_Sites_parq.Site_Desc, > > > > DIM_Workgroups_AgentEventDetails.Workgroup_Desc, > > DIM_Agents_AgentEventDetails.Last_Name, > > > > DIM_Agents_AgentEventDetails.First_Name, > > > > > > DIM_Agent_Status_AgentEventDetails.Agent_Status_Key, > > DIM_CallTypes_ContactDetails.Call_Type_Desc, > > > > > > > > DIM_Call_Actions.Call_Action_Desc, > > > > FACT_Contact_Detail.ANI, > > FACT_Contact_Detail.DNIS, > > > > FACT_Contact_Detail.Begin_Date_Time, > > FACT_Contact_Detail.Call_Start_Date_Time, > > > > FACT_Contact_Detail.Orig_DNIS, > > FACT_Agent_Activity_Detail.Start_Date_Time, > > FACT_Agent_Activity_Detail.End_Date_Time, > > FACT_Agent_Activity_Detail.Dur_Time, > > > > FACT_Agent_Activity_Detail.Sequence_Number, > > > > FACT_Contact_Detail.Begin_Date_Time, > > FACT_Contact_Detail.Call_Start_Date_Time, > > FACT_Agent_Activity_Detail.Start_Date_Time, > > FACT_Agent_Activity_Detail.End_Date_Time, > > > > > > DIM_Dispositions.Disposition_Desc, > > DIM_Agents_AgentEventDetails.Full_Name , > > > > > > DIM_Resource_Groups_AgentEventDetails.Resource_Group_Desc, > > > > FACT_Contact_Detail.Sequence_Number, > > > > FACT_Contact_Detail.Service_Key, > > FACT_Contact_Detail.Station, > > FACT_Contact_Detail.Near_Off_Hook_Date_Time, > > FACT_Contact_Detail.Far_Off_Hook_Date_Time, > > FACT_Contact_Detail.Near_On_Hook_Date_Time, > > FACT_Contact_Detail.Far_On_Hook_Date_Time, > > FACT_Contact_Detail.Begin_Greeting_Date_Time, > > > > FACT_Contact_Detail.End_Guard_Date_Time, > > FACT_Contact_Detail.Resource_Group_Type, > > FACT_Contact_Detail.Info_Digits, > > FACT_Contact_Detail.Agent_Site_Key, > > FACT_Contact_Detail.Agent_Switch_Key, > > > > FACT_Contact_Detail.Line_Number, > > FACT_Contact_Detail.Queue_Start_Date_Time, > > > > FACT_Contact_Detail.CallQ_Start_Date_Time, > > FACT_Contact_Detail.Queue_End_Date_Time, > > FACT_Contact_Detail.Dur_Queue_Time, > > FACT_Contact_Detail.Conn_Clear_Date_Time, > > FACT_Contact_Detail.Wrap_End_Date_Time, > > > > FACT_Contact_Detail.Answer_Flag, > > > > FACT_Contact_Detail.Call_End_Date_Time, > > FACT_Contact_Detail.Script_Id_Source, > > > > FACT_Contact_Detail.IPNIQ_ASBR_Flag, > > FACT_Contact_Detail.Reserved_Start_Date_Time, > > > > FACT_Contact_Detail.Reserved_End_Date_Time, > > FACT_Contact_Detail.Dur_IPNIQ_Q_Time, > > FACT_Contact_Detail.SeqLeg_Number, > > > > FACT_Contact_Detail.Num_Accept_Responses, > > FACT_Contact_Detail.Num_Reject_Responses, > > FACT_Contact_Detail.Preview_Start_Date_Time, > > > > FACT_Contact_Detail.Preview_End_Date_Time, > > > > FACT_Contact_Detail.Detection_Date_Time, > > FACT_Contact_Detail.Answer_Date_Time, > > FACT_Contact_Detail.Record_Number, > > FACT_Contact_Detail.Detection_Type, > > FACT_Contact_Detail.Calling_Party_Name, > > FACT_Contact_Detail.Num_Phone_Rings, > > > > FACT_Contact_Detail.End_Greeting_Date_Time, > > FACT_Contact_Detail.Phone_Start_Ringing_Date_Time, > > > > FACT_Contact_Detail.Begin_Greeting_Date_Time, > > FACT_Contact_Detail.Begin_Msg_Playback_Date_Time, > > FACT_Contact_Detail.Xfer_Command_Date_Time, > > FACT_Contact_Detail.Connect_Date_Time, > > FACT_Contact_Detail.Second_Party_Park_Flag, > > FACT_Contact_Detail.Dur_Short_Call, > > FACT_Agent_Activity_Detail.Call_Id_Source, > > > > FACT_Contact_Detail.Call_Category_Key, > > > > DIM_Agents_AgentEventDetailsSecondParty.User_Id_Source > > > > FROM > > hdfs.rootparq.`fact_agent_activity_detail_12m_partparq` AS > > FACT_Agent_Activity_Detail > > > > LEFT OUTER JOIN > > > > (select * from hdfs.rootparq.`fact_contact_detail_12m_partparq` where > > Call_Start_Date_Time > > > > between > > > > '2014-12-03 00:00:00' AND '2014-12-03 13:00:00' AND > > fact_contact_detail_12m_partparq.dir0 = 2014 and > > fact_contact_detail_12M_partParq.dir1 = 12 ) AS FACT_Contact_Detail > > ON > > FACT_Agent_Activity_Detail.Data_Source_Key = > > FACT_Contact_Detail.Data_Source_Key > > AND > > FACT_Agent_Activity_Detail.Call_Id_Source = > > FACT_Contact_Detail.Call_Id_Source > > AND > > FACT_Agent_Activity_Detail.Sequence_Number = > > FACT_Contact_Detail.Sequence_Number > > > > > > LEFT OUTER JOIN > > DIM_Services_Parq AS > > DIM_Services_Agent_Activity_Detail > > ON DIM_Services_Agent_Activity_Detail.Service_Key = > > FACT_Agent_Activity_Detail.Service_Key > > > > LEFT OUTER JOIN > > DIM_Services_Parq AS DIM_Services_Contact_Detail ON > > FACT_Contact_Detail.Service_Key = > > DIM_Services_Contact_Detail.Service_Key > > > > LEFT OUTER JOIN > > DIM_Call_Types_Parq AS DIM_CallTypes_ContactDetails ON > > FACT_Contact_Detail.Call_Type_Key = > > DIM_CallTypes_ContactDetails.Call_Type_Key > > > > LEFT OUTER JOIN > > DIM_Resource_Groups_parq AS DIM_Resource_Groups_AgentEventDetails ON > > FACT_Contact_Detail.Resource_Group_Key = > > DIM_Resource_Groups_AgentEventDetails.Resource_Group_Key > > > > > > LEFT OUTER JOIN > > DIM_Call_Actions_parq AS DIM_Call_Actions ON > > DIM_Call_Actions.Call_Action_Key = FACT_Contact_Detail.Call_Action_Key > > > > INNER JOIN > > DIM_Agents_Parquet AS DIM_Agents_AgentEventDetails ON > > FACT_Agent_Activity_Detail.Agent_Key = > > DIM_Agents_AgentEventDetails.Agent_Key > > > > INNER JOIN > > DIM_Sites_Parq ON FACT_Agent_Activity_Detail.Site_Key = > > DIM_Sites_Parq.Site_Key > > > > INNER JOIN > > DIM_Workgroups_parq AS DIM_Workgroups_AgentEventDetails ON > > FACT_Agent_Activity_Detail.Workgroup_Key = > > DIM_Workgroups_AgentEventDetails.Workgroup_Key > > > > LEFT OUTER JOIN > > DIM_Dispositions_parq AS DIM_Dispositions ON > > FACT_Contact_Detail.Agent_Disp_Key = DIM_Dispositions.Disposion_Key > > > > LEFT OUTER JOIN > > DIM_Agents_Parquet AS DIM_Agents_AgentEventDetailsSecondParty ON > > FACT_Contact_Detail.Second_Party_Agent_Key = > > DIM_Agents_AgentEventDetailsSecondParty.Agent_Key > > > > LEFT OUTER JOIN > > DIM_Call_Action_Reasons_parq AS DIM_Call_Action_Reasons ON > > FACT_Contact_Detail.Call_Action_Reason_Key = > > DIM_Call_Action_Reasons.Call_Action_Reason_Key > > > > LEFT OUTER JOIN > > DIM_Agent_Status_parq AS DIM_Agent_Status_AgentEventDetails ON > > FACT_Agent_Activity_Detail.Status_Key = > > DIM_Agent_Status_AgentEventDetails.Agent_Status_Key > > > > LEFT OUTER JOIN > > DIM_Agent_Status_Reasons_parq AS DIM_Agent_Reason_AgentEventDetails > ON > > FACT_Agent_Activity_Detail.Reason_Key = > > DIM_Agent_Reason_AgentEventDetails.Agent_Status_Reason_Key > > > > > > WHERE > > ( > > > > DIM_Workgroups_AgentEventDetails.Workgroup_Key IN ( 3,1) > > > > AND > > > > > > ( FACT_Agent_Activity_Detail.Start_Date_Time > > > > between > > '2014-12-03 00:00:00' AND '2014-12-03 13:00:00' AND > > FACT_Agent_Activity_Detail.dir0 = 2014 and > > FACT_Agent_Activity_Detail.dir1 = > > > > 12 > > ) > > > > ); > > > > > > Thanks, > > Latha > > > > > > This email (including any attachments) is proprietary to Aspect > > Software, Inc. and may contain information that is confidential. If > > you have received this message in error, please do not read, copy or > forward this message. > > Please notify the sender immediately, delete it from your system and > > destroy any copies. You may not further disclose or distribute this > > email or its attachments. > > > This email (including any attachments) is proprietary to Aspect Software, > Inc. and may contain information that is confidential. If you have received > this message in error, please do not read, copy or forward this message. > Please notify the sender immediately, delete it from your system and > destroy any copies. You may not further disclose or distribute this email > or its attachments. >
