Can you confirm you are getting broadcast joins for all the dimension
tables? Additionally, can you confirm the planning time? I'm guessing both
might be an issue given the times you are seeing.
On Apr 20, 2015 6:46 PM, "Sivasubramaniam, Latha" <
[email protected]> wrote:

> Hi,
>
> I am running performance tests for our use cases on Impala and Drill to
> make a choice and pick one for our Data warehouse implementation on top of
> Hadoop.  I used the same cluster for both the tests. Select Query that
> takes ~20 seconds in impala, takes around 440 seconds in drill on same
> hardware cluster (nearly 20 times more). I would like to make sure that I
> am not missing anything in tuning or configuration.  Below is our
> configuration, appreciate comments on if these performance numbers are as
> expected or if any tuning can be done to improve drill performance.
>
> Hadoop Cluster configuration and data set:
>
>
> *         5 nodes cluster, 1 name node and 4 data nodes, 32GB memory , 4
> cpus on each node.
>
> *         There are two main fact tables. 98 million records in one and 87
> million records in another. There are around 10 dimension tables. For now,
> we are running the tests with normalized schema from out data warehouse.
>
> *         Running the same query against Impala and Drill except for the
> syntax differences everything is same.
>
> Impala Configuration
>
>
> *         CDH 5.3.2 with Impala 2.1.2 version
>
> *         4 impala daemons running on 4 data nodes.
>
> *          Impala daemons memory set to 25GB.
>
> *         Partitioned parquet tables were created from text format hdfs
> files using impala itself (CTAS command).
>
> *         Parquet block size 256MB
>
> Drill Configuration
>
>
> *         Drill 0.8 version, added drill on the CDH cluster setup
>
> *         4 drillbits running on 4 data nodes
>
> *         Drillbits memory set to 16GB and heap memory set to 8GB.  I ran
> with 8GB memory and 4GB for heap, there was no change in the query timings.
>
> *         Partitioned parquet files.
>
> SQL Query
>
> SELECT
>
>                 DIM_Sites_parq.Site_Desc,
>
>                 DIM_Workgroups_AgentEventDetails.Workgroup_Desc,
>                 DIM_Agents_AgentEventDetails.Last_Name,
>
>                 DIM_Agents_AgentEventDetails.First_Name,
>
>
>                 DIM_Agent_Status_AgentEventDetails.Agent_Status_Key,
>                 DIM_CallTypes_ContactDetails.Call_Type_Desc,
>
>
>
>                 DIM_Call_Actions.Call_Action_Desc,
>
>                 FACT_Contact_Detail.ANI,
>                 FACT_Contact_Detail.DNIS,
>
>                 FACT_Contact_Detail.Begin_Date_Time,
>                 FACT_Contact_Detail.Call_Start_Date_Time,
>
>                 FACT_Contact_Detail.Orig_DNIS,
>                 FACT_Agent_Activity_Detail.Start_Date_Time,
>                 FACT_Agent_Activity_Detail.End_Date_Time,
>                 FACT_Agent_Activity_Detail.Dur_Time,
>
>                 FACT_Agent_Activity_Detail.Sequence_Number,
>
>                 FACT_Contact_Detail.Begin_Date_Time,
>                 FACT_Contact_Detail.Call_Start_Date_Time,
>                 FACT_Agent_Activity_Detail.Start_Date_Time,
>                 FACT_Agent_Activity_Detail.End_Date_Time,
>
>
>                 DIM_Dispositions.Disposition_Desc,
>                 DIM_Agents_AgentEventDetails.Full_Name ,
>
>                 DIM_Resource_Groups_AgentEventDetails.Resource_Group_Desc,
>
>                 FACT_Contact_Detail.Sequence_Number,
>
>                 FACT_Contact_Detail.Service_Key,
>                 FACT_Contact_Detail.Station,
>                 FACT_Contact_Detail.Near_Off_Hook_Date_Time,
>                 FACT_Contact_Detail.Far_Off_Hook_Date_Time,
>                 FACT_Contact_Detail.Near_On_Hook_Date_Time,
>                 FACT_Contact_Detail.Far_On_Hook_Date_Time,
>                 FACT_Contact_Detail.Begin_Greeting_Date_Time,
>
>                 FACT_Contact_Detail.End_Guard_Date_Time,
>                 FACT_Contact_Detail.Resource_Group_Type,
>                 FACT_Contact_Detail.Info_Digits,
>                 FACT_Contact_Detail.Agent_Site_Key,
>                 FACT_Contact_Detail.Agent_Switch_Key,
>
>                 FACT_Contact_Detail.Line_Number,
>                 FACT_Contact_Detail.Queue_Start_Date_Time,
>
>                 FACT_Contact_Detail.CallQ_Start_Date_Time,
>                 FACT_Contact_Detail.Queue_End_Date_Time,
>                 FACT_Contact_Detail.Dur_Queue_Time,
>                 FACT_Contact_Detail.Conn_Clear_Date_Time,
>                 FACT_Contact_Detail.Wrap_End_Date_Time,
>
> FACT_Contact_Detail.Answer_Flag,
>
> FACT_Contact_Detail.Call_End_Date_Time,
>  FACT_Contact_Detail.Script_Id_Source,
>
>   FACT_Contact_Detail.IPNIQ_ASBR_Flag,
>  FACT_Contact_Detail.Reserved_Start_Date_Time,
>
>   FACT_Contact_Detail.Reserved_End_Date_Time,
>  FACT_Contact_Detail.Dur_IPNIQ_Q_Time,
>  FACT_Contact_Detail.SeqLeg_Number,
>
> FACT_Contact_Detail.Num_Accept_Responses,
>  FACT_Contact_Detail.Num_Reject_Responses,
>  FACT_Contact_Detail.Preview_Start_Date_Time,
>
>   FACT_Contact_Detail.Preview_End_Date_Time,
>
> FACT_Contact_Detail.Detection_Date_Time,
>  FACT_Contact_Detail.Answer_Date_Time,
>  FACT_Contact_Detail.Record_Number,
>  FACT_Contact_Detail.Detection_Type,
>  FACT_Contact_Detail.Calling_Party_Name,
>  FACT_Contact_Detail.Num_Phone_Rings,
>
>   FACT_Contact_Detail.End_Greeting_Date_Time,
>  FACT_Contact_Detail.Phone_Start_Ringing_Date_Time,
>
>   FACT_Contact_Detail.Begin_Greeting_Date_Time,
>  FACT_Contact_Detail.Begin_Msg_Playback_Date_Time,
>  FACT_Contact_Detail.Xfer_Command_Date_Time,
>  FACT_Contact_Detail.Connect_Date_Time,
>  FACT_Contact_Detail.Second_Party_Park_Flag,
>  FACT_Contact_Detail.Dur_Short_Call,
>  FACT_Agent_Activity_Detail.Call_Id_Source,
>
>   FACT_Contact_Detail.Call_Category_Key,
>
>   DIM_Agents_AgentEventDetailsSecondParty.User_Id_Source
>
>   FROM
>     hdfs.rootparq.`fact_agent_activity_detail_12m_partparq` AS
> FACT_Agent_Activity_Detail
>
>                 LEFT OUTER JOIN
>
>  (select * from hdfs.rootparq.`fact_contact_detail_12m_partparq` where
> Call_Start_Date_Time
>
> between
>
> '2014-12-03 00:00:00' AND '2014-12-03 13:00:00' AND
> fact_contact_detail_12m_partparq.dir0 = 2014 and
> fact_contact_detail_12M_partParq.dir1 = 12 ) AS FACT_Contact_Detail
>  ON
>        FACT_Agent_Activity_Detail.Data_Source_Key =
> FACT_Contact_Detail.Data_Source_Key
>                 AND
>        FACT_Agent_Activity_Detail.Call_Id_Source =
> FACT_Contact_Detail.Call_Id_Source
>                 AND
>        FACT_Agent_Activity_Detail.Sequence_Number =
> FACT_Contact_Detail.Sequence_Number
>
>
>                 LEFT OUTER JOIN
>                    DIM_Services_Parq AS DIM_Services_Agent_Activity_Detail
> ON DIM_Services_Agent_Activity_Detail.Service_Key =
> FACT_Agent_Activity_Detail.Service_Key
>
>                 LEFT OUTER JOIN
>     DIM_Services_Parq AS DIM_Services_Contact_Detail ON
> FACT_Contact_Detail.Service_Key = DIM_Services_Contact_Detail.Service_Key
>
>                 LEFT OUTER JOIN
>     DIM_Call_Types_Parq AS DIM_CallTypes_ContactDetails ON
>        FACT_Contact_Detail.Call_Type_Key =
> DIM_CallTypes_ContactDetails.Call_Type_Key
>
>                 LEFT OUTER JOIN
>     DIM_Resource_Groups_parq AS DIM_Resource_Groups_AgentEventDetails ON
>        FACT_Contact_Detail.Resource_Group_Key =
> DIM_Resource_Groups_AgentEventDetails.Resource_Group_Key
>
>
>                 LEFT OUTER JOIN
>     DIM_Call_Actions_parq AS DIM_Call_Actions ON
> DIM_Call_Actions.Call_Action_Key = FACT_Contact_Detail.Call_Action_Key
>
>                 INNER JOIN
>         DIM_Agents_Parquet AS DIM_Agents_AgentEventDetails ON
> FACT_Agent_Activity_Detail.Agent_Key =
> DIM_Agents_AgentEventDetails.Agent_Key
>
>                 INNER JOIN
>         DIM_Sites_Parq ON FACT_Agent_Activity_Detail.Site_Key =
> DIM_Sites_Parq.Site_Key
>
>                 INNER JOIN
>         DIM_Workgroups_parq AS DIM_Workgroups_AgentEventDetails ON
>        FACT_Agent_Activity_Detail.Workgroup_Key =
> DIM_Workgroups_AgentEventDetails.Workgroup_Key
>
>                 LEFT OUTER JOIN
>     DIM_Dispositions_parq AS DIM_Dispositions ON
> FACT_Contact_Detail.Agent_Disp_Key = DIM_Dispositions.Disposion_Key
>
>                 LEFT OUTER JOIN
>     DIM_Agents_Parquet AS DIM_Agents_AgentEventDetailsSecondParty ON
>        FACT_Contact_Detail.Second_Party_Agent_Key =
> DIM_Agents_AgentEventDetailsSecondParty.Agent_Key
>
>                 LEFT OUTER JOIN
>     DIM_Call_Action_Reasons_parq AS DIM_Call_Action_Reasons ON
> FACT_Contact_Detail.Call_Action_Reason_Key =
> DIM_Call_Action_Reasons.Call_Action_Reason_Key
>
>                 LEFT OUTER JOIN
>     DIM_Agent_Status_parq AS DIM_Agent_Status_AgentEventDetails ON
>        FACT_Agent_Activity_Detail.Status_Key =
> DIM_Agent_Status_AgentEventDetails.Agent_Status_Key
>
>                 LEFT OUTER JOIN
>     DIM_Agent_Status_Reasons_parq AS DIM_Agent_Reason_AgentEventDetails ON
>        FACT_Agent_Activity_Detail.Reason_Key =
> DIM_Agent_Reason_AgentEventDetails.Agent_Status_Reason_Key
>
>
> WHERE
>  (
>
> DIM_Workgroups_AgentEventDetails.Workgroup_Key  IN  ( 3,1)
>
>   AND
>
>
>    ( FACT_Agent_Activity_Detail.Start_Date_Time
>
> between
> '2014-12-03 00:00:00' AND '2014-12-03 13:00:00' AND
> FACT_Agent_Activity_Detail.dir0 = 2014 and FACT_Agent_Activity_Detail.dir1 =
>
> 12
>    )
>
>   );
>
>
> Thanks,
> Latha
>
>
> This email (including any attachments) is proprietary to Aspect Software,
> Inc. and may contain information that is confidential. If you have received
> this message in error, please do not read, copy or forward this message.
> Please notify the sender immediately, delete it from your system and
> destroy any copies. You may not further disclose or distribute this email
> or its attachments.
>

Reply via email to