Hi,

I am running performance tests for our use cases on Impala and Drill to make a 
choice and pick one for our Data warehouse implementation on top of Hadoop.  I 
used the same cluster for both the tests. Select Query that takes ~20 seconds 
in impala, takes around 440 seconds in drill on same hardware cluster (nearly 
20 times more). I would like to make sure that I am not missing anything in 
tuning or configuration.  Below is our configuration, appreciate comments on if 
these performance numbers are as expected or if any tuning can be done to 
improve drill performance.

Hadoop Cluster configuration and data set:


*         5 nodes cluster, 1 name node and 4 data nodes, 32GB memory , 4 cpus 
on each node.

*         There are two main fact tables. 98 million records in one and 87 
million records in another. There are around 10 dimension tables. For now, we 
are running the tests with normalized schema from out data warehouse.

*         Running the same query against Impala and Drill except for the syntax 
differences everything is same.

Impala Configuration


*         CDH 5.3.2 with Impala 2.1.2 version

*         4 impala daemons running on 4 data nodes.

*          Impala daemons memory set to 25GB.

*         Partitioned parquet tables were created from text format hdfs files 
using impala itself (CTAS command).

*         Parquet block size 256MB

Drill Configuration


*         Drill 0.8 version, added drill on the CDH cluster setup

*         4 drillbits running on 4 data nodes

*         Drillbits memory set to 16GB and heap memory set to 8GB.  I ran with 
8GB memory and 4GB for heap, there was no change in the query timings.

*         Partitioned parquet files.

SQL Query

SELECT

                DIM_Sites_parq.Site_Desc,

                DIM_Workgroups_AgentEventDetails.Workgroup_Desc,
                DIM_Agents_AgentEventDetails.Last_Name,

                DIM_Agents_AgentEventDetails.First_Name,


                DIM_Agent_Status_AgentEventDetails.Agent_Status_Key,
                DIM_CallTypes_ContactDetails.Call_Type_Desc,



                DIM_Call_Actions.Call_Action_Desc,

                FACT_Contact_Detail.ANI,
                FACT_Contact_Detail.DNIS,

                FACT_Contact_Detail.Begin_Date_Time,
                FACT_Contact_Detail.Call_Start_Date_Time,

                FACT_Contact_Detail.Orig_DNIS,
                FACT_Agent_Activity_Detail.Start_Date_Time,
                FACT_Agent_Activity_Detail.End_Date_Time,
                FACT_Agent_Activity_Detail.Dur_Time,

                FACT_Agent_Activity_Detail.Sequence_Number,

                FACT_Contact_Detail.Begin_Date_Time,
                FACT_Contact_Detail.Call_Start_Date_Time,
                FACT_Agent_Activity_Detail.Start_Date_Time,
                FACT_Agent_Activity_Detail.End_Date_Time,


                DIM_Dispositions.Disposition_Desc,
                DIM_Agents_AgentEventDetails.Full_Name ,

                DIM_Resource_Groups_AgentEventDetails.Resource_Group_Desc,

                FACT_Contact_Detail.Sequence_Number,

                FACT_Contact_Detail.Service_Key,
                FACT_Contact_Detail.Station,
                FACT_Contact_Detail.Near_Off_Hook_Date_Time,
                FACT_Contact_Detail.Far_Off_Hook_Date_Time,
                FACT_Contact_Detail.Near_On_Hook_Date_Time,
                FACT_Contact_Detail.Far_On_Hook_Date_Time,
                FACT_Contact_Detail.Begin_Greeting_Date_Time,

                FACT_Contact_Detail.End_Guard_Date_Time,
                FACT_Contact_Detail.Resource_Group_Type,
                FACT_Contact_Detail.Info_Digits,
                FACT_Contact_Detail.Agent_Site_Key,
                FACT_Contact_Detail.Agent_Switch_Key,

                FACT_Contact_Detail.Line_Number,
                FACT_Contact_Detail.Queue_Start_Date_Time,

                FACT_Contact_Detail.CallQ_Start_Date_Time,
                FACT_Contact_Detail.Queue_End_Date_Time,
                FACT_Contact_Detail.Dur_Queue_Time,
                FACT_Contact_Detail.Conn_Clear_Date_Time,
                FACT_Contact_Detail.Wrap_End_Date_Time,

FACT_Contact_Detail.Answer_Flag,

FACT_Contact_Detail.Call_End_Date_Time,
 FACT_Contact_Detail.Script_Id_Source,

  FACT_Contact_Detail.IPNIQ_ASBR_Flag,
 FACT_Contact_Detail.Reserved_Start_Date_Time,

  FACT_Contact_Detail.Reserved_End_Date_Time,
 FACT_Contact_Detail.Dur_IPNIQ_Q_Time,
 FACT_Contact_Detail.SeqLeg_Number,

FACT_Contact_Detail.Num_Accept_Responses,
 FACT_Contact_Detail.Num_Reject_Responses,
 FACT_Contact_Detail.Preview_Start_Date_Time,

  FACT_Contact_Detail.Preview_End_Date_Time,

FACT_Contact_Detail.Detection_Date_Time,
 FACT_Contact_Detail.Answer_Date_Time,
 FACT_Contact_Detail.Record_Number,
 FACT_Contact_Detail.Detection_Type,
 FACT_Contact_Detail.Calling_Party_Name,
 FACT_Contact_Detail.Num_Phone_Rings,

  FACT_Contact_Detail.End_Greeting_Date_Time,
 FACT_Contact_Detail.Phone_Start_Ringing_Date_Time,

  FACT_Contact_Detail.Begin_Greeting_Date_Time,
 FACT_Contact_Detail.Begin_Msg_Playback_Date_Time,
 FACT_Contact_Detail.Xfer_Command_Date_Time,
 FACT_Contact_Detail.Connect_Date_Time,
 FACT_Contact_Detail.Second_Party_Park_Flag,
 FACT_Contact_Detail.Dur_Short_Call,
 FACT_Agent_Activity_Detail.Call_Id_Source,

  FACT_Contact_Detail.Call_Category_Key,

  DIM_Agents_AgentEventDetailsSecondParty.User_Id_Source

  FROM
    hdfs.rootparq.`fact_agent_activity_detail_12m_partparq` AS
FACT_Agent_Activity_Detail

                LEFT OUTER JOIN

 (select * from hdfs.rootparq.`fact_contact_detail_12m_partparq` where
Call_Start_Date_Time

between

'2014-12-03 00:00:00' AND '2014-12-03 13:00:00' AND
fact_contact_detail_12m_partparq.dir0 = 2014 and
fact_contact_detail_12M_partParq.dir1 = 12 ) AS FACT_Contact_Detail
 ON
       FACT_Agent_Activity_Detail.Data_Source_Key = 
FACT_Contact_Detail.Data_Source_Key
                AND
       FACT_Agent_Activity_Detail.Call_Id_Source = 
FACT_Contact_Detail.Call_Id_Source
                AND
       FACT_Agent_Activity_Detail.Sequence_Number = 
FACT_Contact_Detail.Sequence_Number


                LEFT OUTER JOIN
                   DIM_Services_Parq AS DIM_Services_Agent_Activity_Detail ON 
DIM_Services_Agent_Activity_Detail.Service_Key = 
FACT_Agent_Activity_Detail.Service_Key

                LEFT OUTER JOIN
    DIM_Services_Parq AS DIM_Services_Contact_Detail ON 
FACT_Contact_Detail.Service_Key = DIM_Services_Contact_Detail.Service_Key

                LEFT OUTER JOIN
    DIM_Call_Types_Parq AS DIM_CallTypes_ContactDetails ON
       FACT_Contact_Detail.Call_Type_Key = 
DIM_CallTypes_ContactDetails.Call_Type_Key

                LEFT OUTER JOIN
    DIM_Resource_Groups_parq AS DIM_Resource_Groups_AgentEventDetails ON
       FACT_Contact_Detail.Resource_Group_Key = 
DIM_Resource_Groups_AgentEventDetails.Resource_Group_Key


                LEFT OUTER JOIN
    DIM_Call_Actions_parq AS DIM_Call_Actions ON 
DIM_Call_Actions.Call_Action_Key = FACT_Contact_Detail.Call_Action_Key

                INNER JOIN
        DIM_Agents_Parquet AS DIM_Agents_AgentEventDetails ON 
FACT_Agent_Activity_Detail.Agent_Key = DIM_Agents_AgentEventDetails.Agent_Key

                INNER JOIN
        DIM_Sites_Parq ON FACT_Agent_Activity_Detail.Site_Key = 
DIM_Sites_Parq.Site_Key

                INNER JOIN
        DIM_Workgroups_parq AS DIM_Workgroups_AgentEventDetails ON
       FACT_Agent_Activity_Detail.Workgroup_Key = 
DIM_Workgroups_AgentEventDetails.Workgroup_Key

                LEFT OUTER JOIN
    DIM_Dispositions_parq AS DIM_Dispositions ON 
FACT_Contact_Detail.Agent_Disp_Key = DIM_Dispositions.Disposion_Key

                LEFT OUTER JOIN
    DIM_Agents_Parquet AS DIM_Agents_AgentEventDetailsSecondParty ON
       FACT_Contact_Detail.Second_Party_Agent_Key = 
DIM_Agents_AgentEventDetailsSecondParty.Agent_Key

                LEFT OUTER JOIN
    DIM_Call_Action_Reasons_parq AS DIM_Call_Action_Reasons ON 
FACT_Contact_Detail.Call_Action_Reason_Key = 
DIM_Call_Action_Reasons.Call_Action_Reason_Key

                LEFT OUTER JOIN
    DIM_Agent_Status_parq AS DIM_Agent_Status_AgentEventDetails ON
       FACT_Agent_Activity_Detail.Status_Key = 
DIM_Agent_Status_AgentEventDetails.Agent_Status_Key

                LEFT OUTER JOIN
    DIM_Agent_Status_Reasons_parq AS DIM_Agent_Reason_AgentEventDetails ON
       FACT_Agent_Activity_Detail.Reason_Key = 
DIM_Agent_Reason_AgentEventDetails.Agent_Status_Reason_Key


WHERE
 (

DIM_Workgroups_AgentEventDetails.Workgroup_Key  IN  ( 3,1)

  AND


   ( FACT_Agent_Activity_Detail.Start_Date_Time

between
'2014-12-03 00:00:00' AND '2014-12-03 13:00:00' AND 
FACT_Agent_Activity_Detail.dir0 = 2014 and FACT_Agent_Activity_Detail.dir1 =

12
   )

  );


Thanks,
Latha


This email (including any attachments) is proprietary to Aspect Software, Inc. 
and may contain information that is confidential. If you have received this 
message in error, please do not read, copy or forward this message. Please 
notify the sender immediately, delete it from your system and destroy any 
copies. You may not further disclose or distribute this email or its 
attachments.

Reply via email to