Hi,
I am running performance tests for our use cases on Impala and Drill to make a
choice and pick one for our Data warehouse implementation on top of Hadoop. I
used the same cluster for both the tests. Select Query that takes ~20 seconds
in impala, takes around 440 seconds in drill on same hardware cluster (nearly
20 times more). I would like to make sure that I am not missing anything in
tuning or configuration. Below is our configuration, appreciate comments on if
these performance numbers are as expected or if any tuning can be done to
improve drill performance.
Hadoop Cluster configuration and data set:
* 5 nodes cluster, 1 name node and 4 data nodes, 32GB memory , 4 cpus
on each node.
* There are two main fact tables. 98 million records in one and 87
million records in another. There are around 10 dimension tables. For now, we
are running the tests with normalized schema from out data warehouse.
* Running the same query against Impala and Drill except for the syntax
differences everything is same.
Impala Configuration
* CDH 5.3.2 with Impala 2.1.2 version
* 4 impala daemons running on 4 data nodes.
* Impala daemons memory set to 25GB.
* Partitioned parquet tables were created from text format hdfs files
using impala itself (CTAS command).
* Parquet block size 256MB
Drill Configuration
* Drill 0.8 version, added drill on the CDH cluster setup
* 4 drillbits running on 4 data nodes
* Drillbits memory set to 16GB and heap memory set to 8GB. I ran with
8GB memory and 4GB for heap, there was no change in the query timings.
* Partitioned parquet files.
SQL Query
SELECT
DIM_Sites_parq.Site_Desc,
DIM_Workgroups_AgentEventDetails.Workgroup_Desc,
DIM_Agents_AgentEventDetails.Last_Name,
DIM_Agents_AgentEventDetails.First_Name,
DIM_Agent_Status_AgentEventDetails.Agent_Status_Key,
DIM_CallTypes_ContactDetails.Call_Type_Desc,
DIM_Call_Actions.Call_Action_Desc,
FACT_Contact_Detail.ANI,
FACT_Contact_Detail.DNIS,
FACT_Contact_Detail.Begin_Date_Time,
FACT_Contact_Detail.Call_Start_Date_Time,
FACT_Contact_Detail.Orig_DNIS,
FACT_Agent_Activity_Detail.Start_Date_Time,
FACT_Agent_Activity_Detail.End_Date_Time,
FACT_Agent_Activity_Detail.Dur_Time,
FACT_Agent_Activity_Detail.Sequence_Number,
FACT_Contact_Detail.Begin_Date_Time,
FACT_Contact_Detail.Call_Start_Date_Time,
FACT_Agent_Activity_Detail.Start_Date_Time,
FACT_Agent_Activity_Detail.End_Date_Time,
DIM_Dispositions.Disposition_Desc,
DIM_Agents_AgentEventDetails.Full_Name ,
DIM_Resource_Groups_AgentEventDetails.Resource_Group_Desc,
FACT_Contact_Detail.Sequence_Number,
FACT_Contact_Detail.Service_Key,
FACT_Contact_Detail.Station,
FACT_Contact_Detail.Near_Off_Hook_Date_Time,
FACT_Contact_Detail.Far_Off_Hook_Date_Time,
FACT_Contact_Detail.Near_On_Hook_Date_Time,
FACT_Contact_Detail.Far_On_Hook_Date_Time,
FACT_Contact_Detail.Begin_Greeting_Date_Time,
FACT_Contact_Detail.End_Guard_Date_Time,
FACT_Contact_Detail.Resource_Group_Type,
FACT_Contact_Detail.Info_Digits,
FACT_Contact_Detail.Agent_Site_Key,
FACT_Contact_Detail.Agent_Switch_Key,
FACT_Contact_Detail.Line_Number,
FACT_Contact_Detail.Queue_Start_Date_Time,
FACT_Contact_Detail.CallQ_Start_Date_Time,
FACT_Contact_Detail.Queue_End_Date_Time,
FACT_Contact_Detail.Dur_Queue_Time,
FACT_Contact_Detail.Conn_Clear_Date_Time,
FACT_Contact_Detail.Wrap_End_Date_Time,
FACT_Contact_Detail.Answer_Flag,
FACT_Contact_Detail.Call_End_Date_Time,
FACT_Contact_Detail.Script_Id_Source,
FACT_Contact_Detail.IPNIQ_ASBR_Flag,
FACT_Contact_Detail.Reserved_Start_Date_Time,
FACT_Contact_Detail.Reserved_End_Date_Time,
FACT_Contact_Detail.Dur_IPNIQ_Q_Time,
FACT_Contact_Detail.SeqLeg_Number,
FACT_Contact_Detail.Num_Accept_Responses,
FACT_Contact_Detail.Num_Reject_Responses,
FACT_Contact_Detail.Preview_Start_Date_Time,
FACT_Contact_Detail.Preview_End_Date_Time,
FACT_Contact_Detail.Detection_Date_Time,
FACT_Contact_Detail.Answer_Date_Time,
FACT_Contact_Detail.Record_Number,
FACT_Contact_Detail.Detection_Type,
FACT_Contact_Detail.Calling_Party_Name,
FACT_Contact_Detail.Num_Phone_Rings,
FACT_Contact_Detail.End_Greeting_Date_Time,
FACT_Contact_Detail.Phone_Start_Ringing_Date_Time,
FACT_Contact_Detail.Begin_Greeting_Date_Time,
FACT_Contact_Detail.Begin_Msg_Playback_Date_Time,
FACT_Contact_Detail.Xfer_Command_Date_Time,
FACT_Contact_Detail.Connect_Date_Time,
FACT_Contact_Detail.Second_Party_Park_Flag,
FACT_Contact_Detail.Dur_Short_Call,
FACT_Agent_Activity_Detail.Call_Id_Source,
FACT_Contact_Detail.Call_Category_Key,
DIM_Agents_AgentEventDetailsSecondParty.User_Id_Source
FROM
hdfs.rootparq.`fact_agent_activity_detail_12m_partparq` AS
FACT_Agent_Activity_Detail
LEFT OUTER JOIN
(select * from hdfs.rootparq.`fact_contact_detail_12m_partparq` where
Call_Start_Date_Time
between
'2014-12-03 00:00:00' AND '2014-12-03 13:00:00' AND
fact_contact_detail_12m_partparq.dir0 = 2014 and
fact_contact_detail_12M_partParq.dir1 = 12 ) AS FACT_Contact_Detail
ON
FACT_Agent_Activity_Detail.Data_Source_Key =
FACT_Contact_Detail.Data_Source_Key
AND
FACT_Agent_Activity_Detail.Call_Id_Source =
FACT_Contact_Detail.Call_Id_Source
AND
FACT_Agent_Activity_Detail.Sequence_Number =
FACT_Contact_Detail.Sequence_Number
LEFT OUTER JOIN
DIM_Services_Parq AS DIM_Services_Agent_Activity_Detail ON
DIM_Services_Agent_Activity_Detail.Service_Key =
FACT_Agent_Activity_Detail.Service_Key
LEFT OUTER JOIN
DIM_Services_Parq AS DIM_Services_Contact_Detail ON
FACT_Contact_Detail.Service_Key = DIM_Services_Contact_Detail.Service_Key
LEFT OUTER JOIN
DIM_Call_Types_Parq AS DIM_CallTypes_ContactDetails ON
FACT_Contact_Detail.Call_Type_Key =
DIM_CallTypes_ContactDetails.Call_Type_Key
LEFT OUTER JOIN
DIM_Resource_Groups_parq AS DIM_Resource_Groups_AgentEventDetails ON
FACT_Contact_Detail.Resource_Group_Key =
DIM_Resource_Groups_AgentEventDetails.Resource_Group_Key
LEFT OUTER JOIN
DIM_Call_Actions_parq AS DIM_Call_Actions ON
DIM_Call_Actions.Call_Action_Key = FACT_Contact_Detail.Call_Action_Key
INNER JOIN
DIM_Agents_Parquet AS DIM_Agents_AgentEventDetails ON
FACT_Agent_Activity_Detail.Agent_Key = DIM_Agents_AgentEventDetails.Agent_Key
INNER JOIN
DIM_Sites_Parq ON FACT_Agent_Activity_Detail.Site_Key =
DIM_Sites_Parq.Site_Key
INNER JOIN
DIM_Workgroups_parq AS DIM_Workgroups_AgentEventDetails ON
FACT_Agent_Activity_Detail.Workgroup_Key =
DIM_Workgroups_AgentEventDetails.Workgroup_Key
LEFT OUTER JOIN
DIM_Dispositions_parq AS DIM_Dispositions ON
FACT_Contact_Detail.Agent_Disp_Key = DIM_Dispositions.Disposion_Key
LEFT OUTER JOIN
DIM_Agents_Parquet AS DIM_Agents_AgentEventDetailsSecondParty ON
FACT_Contact_Detail.Second_Party_Agent_Key =
DIM_Agents_AgentEventDetailsSecondParty.Agent_Key
LEFT OUTER JOIN
DIM_Call_Action_Reasons_parq AS DIM_Call_Action_Reasons ON
FACT_Contact_Detail.Call_Action_Reason_Key =
DIM_Call_Action_Reasons.Call_Action_Reason_Key
LEFT OUTER JOIN
DIM_Agent_Status_parq AS DIM_Agent_Status_AgentEventDetails ON
FACT_Agent_Activity_Detail.Status_Key =
DIM_Agent_Status_AgentEventDetails.Agent_Status_Key
LEFT OUTER JOIN
DIM_Agent_Status_Reasons_parq AS DIM_Agent_Reason_AgentEventDetails ON
FACT_Agent_Activity_Detail.Reason_Key =
DIM_Agent_Reason_AgentEventDetails.Agent_Status_Reason_Key
WHERE
(
DIM_Workgroups_AgentEventDetails.Workgroup_Key IN ( 3,1)
AND
( FACT_Agent_Activity_Detail.Start_Date_Time
between
'2014-12-03 00:00:00' AND '2014-12-03 13:00:00' AND
FACT_Agent_Activity_Detail.dir0 = 2014 and FACT_Agent_Activity_Detail.dir1 =
12
)
);
Thanks,
Latha
This email (including any attachments) is proprietary to Aspect Software, Inc.
and may contain information that is confidential. If you have received this
message in error, please do not read, copy or forward this message. Please
notify the sender immediately, delete it from your system and destroy any
copies. You may not further disclose or distribute this email or its
attachments.