Hi Latha,

Please make sure you have enough free RAM for :
DRILL_MAX_DIRECT_MEMORY + DRILL_MAX_HEAP .

Some other services like MFS may take significant part of OS RAM.

Thanks,
Hao

On Thu, Apr 23, 2015 at 8:24 AM, Sivasubramaniam, Latha <
[email protected]> wrote:

> Hao,
>
> The drillbit that crashed has outofmemory - heap space error - heap memory
> is set to 8GB (total RAM in the node is 32GB, document said to set the heap
> at 8GB). Also, I ran some new small queries, those ran fine. But I one of
> the query that was part of the test again, it showed on the web UI as
> pending for sometime and then disappeared. I will increase memory further
> and run the tests again.
>
>
> java.lang.OutOfMemoryError: Java heap space
>         at java.util.Arrays.copyOf(Arrays.java:2367) ~[na:1.7.0_45]
>         at
> java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:130)
> ~[na:1.7.0_45]
>         at
> java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:114)
> ~[na:1.7.0_45]
>         at
> java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:415)
> ~[na:1.7.0_45]
>         at java.lang.StringBuffer.append(StringBuffer.java:237)
> ~[na:1.7.0_45]
>         at java.io.StringWriter.write(StringWriter.java:112) ~[na:1.7.0_45]
>         at java.io.PrintWriter.write(PrintWriter.java:456) ~[na:1.7.0_45]
>         at java.io.PrintWriter.write(PrintWriter.java:473) ~[na:1.7.0_45]
>         at
> org.eigenbase.rel.AbstractRelNode$1.explain_(AbstractRelNode.java:386)
> ~[optiq-core-0.9-drill-r20.jar:na]
>         at org.eigenbase.rel.RelWriterImpl.done(RelWriterImpl.java:166)
> ~[optiq-core-0.9-drill-r20.jar:na]
>         at
> org.eigenbase.rel.AbstractRelNode.explain(AbstractRelNode.java:281)
> ~[optiq-core-0.9-drill-r20.jar:na]
>         at
> org.eigenbase.rel.AbstractRelNode.computeDigest(AbstractRelNode.java:391)
> ~[optiq-core-0.9-drill-r20.jar:na]
>         at
> org.eigenbase.rel.AbstractRelNode.recomputeDigest(AbstractRelNode.java:323)
> ~[optiq-core-0.9-drill-r20.jar:na]
>         at
> org.eigenbase.rel.AbstractRelNode.onRegister(AbstractRelNode.java:317)
> ~[optiq-core-0.9-drill-r20.jar:na]
>         at
> org.eigenbase.relopt.volcano.VolcanoPlanner.registerImpl(VolcanoPlanner.java:1464)
> ~[optiq-core-0.9-drill-r20.jar:na]
>         at
> org.eigenbase.relopt.volcano.VolcanoPlanner.register(VolcanoPlanner.java:837)
> ~[optiq-core-0.9-drill-r20.jar:na]
>         at
> org.eigenbase.relopt.volcano.VolcanoPlanner.ensureRegistered(VolcanoPlanner.java:860)
> ~[optiq-core-0.9-drill-r20.jar:na]
>         at
> org.eigenbase.relopt.volcano.VolcanoPlanner.ensureRegistered(VolcanoPlanner.java:49)
> ~[optiq-core-0.9-drill-r20.jar:na]
>         at
> org.eigenbase.rel.AbstractRelNode.onRegister(AbstractRelNode.java:301)
> ~[optiq-core-0.9-drill-r20.jar:na]
>         at
> org.eigenbase.relopt.volcano.VolcanoPlanner.registerImpl(VolcanoPlanner.java:1464)
> ~[optiq-core-0.9-drill-r20.jar:na]
>         at
> org.eigenbase.relopt.volcano.VolcanoPlanner.register(VolcanoPlanner.java:837)
> ~[optiq-core-0.9-drill-r20.jar:na]
>         at
> org.eigenbase.relopt.volcano.VolcanoPlanner.ensureRegistered(VolcanoPlanner.java:860)
> ~[optiq-core-0.9-drill-r20.jar:na]
>
> -----Original Message-----
> From: Hao Zhu [mailto:[email protected]]
> Sent: Wednesday, April 22, 2015 6:10 PM
> To: [email protected]
> Subject: Re: Query performance Comparison - Drill and Impala
>
> Any errors from each drillbit log?
> Can you run any simple drill query from any drillbit as of now?
>
>
> On Wed, Apr 22, 2015 at 5:09 PM, Sivasubramaniam, Latha <
> [email protected]> wrote:
>
> > Hi Hao,
> >
> > I am yet to run the tests with the queries you had asked for. I have
> > to get the results for 15 concurrent queries before that.  I am using
> > sqlline, so I had setup 15 terminals ( since running as a background
> > process does not seem to work) and submitted 15 queries. Initial
> > behavior was same as when I ran one query. But then, I don't see any
> > running queries on the web gui on any of the drillbit nodes. One of
> > the drill bit crashed and the rest of them seems to be hanging,
> > nothing in the in the drillbit log for the remaining drillbit nodes.
> >
> > It's been more than 2 hours. What can I check to see if these queries
> > are even being attempted or not?
> >
> > Thanks,
> > Latha
> >
> > -----Original Message-----
> > From: Hao Zhu [mailto:[email protected]]
> > Sent: Tuesday, April 21, 2015 3:27 PM
> > To: [email protected]
> > Subject: Re: Query performance Comparison - Drill and Impala
> >
> > Correction for item 1: Change "FAA" => "FCD1".
> >
> >
> > On Tue, Apr 21, 2015 at 3:11 PM, Hao Zhu <[email protected]> wrote:
> >
> > > Hi Team,
> > >
> > > Besides the 300+ seconds planning time, here are the performance
> > > differences in execution phase also.
> > >
> > > In general, this SQL contains 2 fact table joins, and with another
> > > 10+ dimension table joins.
> > > The 2 fact tables are:
> > >
> > >    - fact_agent_activity_detail_12m_partparq AS FAA
> > >    - fact_contact_detail_12m_partparq AS FCD1
> > >
> > > So here are the 2 major time differences:
> > > *1. PARQUET_ROW_GROUP_SCAN for FAA.* Impala spent 3s248ms to read
> > > FAA.
> > > Operator          #Hosts   Avg Time   Max Time    #Rows  Est. #Rows
> > > Peak Mem  Est. Peak Mem  Detail
> > > 15:HASH JOIN           4    2s728ms    2s774ms  145.81K      89.61K
> > >  531.03 MB        2.68 KB  RIGHT OUTER JOIN, PARTITIONED
> > > |--30:EXCHANGE         4    3.135ms   12.277ms  145.81K         192
> > >    0              0  HASH(FAA.Data_Source_Key,FA...
> > > |  00:SCAN HDFS        1    3s248ms    3s248ms  145.81K         192
> > >  107.93 MB      880.00 MB  default.fact_agent_activity...
> > > 29:EXCHANGE            4      9.8ms    9.826ms  399.94K      89.61K
> > >    0              0  HASH(FCD1.data_source_key,F...
> > > 01:SCAN HDFS           4    2s648ms    3s452ms  399.94K      89.61K
> > > 55.96 MB      960.00 MB  default.fact_contact_detail...
> > >
> > > Drill spent 26.934 seconds(Avg Process time) to read FAA.
> > > 02-xx-05 PARQUET_ROW_GROUP_SCAN 26.934
> > > 02-05       Scan(groupscan=[ParquetGroupScan
> [entries=[ReadEntryWithPath
> > > [path=hdfs://
> > > 10.153.25.119:8020/user/root/fact_contact_detail_12m_partparq/2014/1
> > > 2/
> > > 1_0_0.parquet],
> > > ReadEntryWithPath [path=hdfs://
> > > 10.153.25.119:8020/user/root/fact_contact_detail_12m_partparq/2014/1
> > > 2/
> > > 1_10_0.parquet],
> > > ReadEntryWithPath [path=hdfs://
> > > 10.153.25.119:8020/user/root/fact_contact_detail_12m_partparq/2014/1
> > > 2/
> > > 1_11_0.parquet],
> > > ReadEntryWithPath [path=hdfs://
> > > 10.153.25.119:8020/user/root/fact_contact_detail_12m_partparq/2014/1
> > > 2/
> > > 1_1_0.parquet],
> > > ReadEntryWithPath [path=hdfs://
> > > 10.153.25.119:8020/user/root/fact_contact_detail_12m_partparq/2014/1
> > > 2/
> > > 1_2_0.parquet],
> > > ReadEntryWithPath [path=hdfs://
> > > 10.153.25.119:8020/user/root/fact_contact_detail_12m_partparq/2014/1
> > > 2/
> > > 1_3_0.parquet],
> > > ReadEntryWithPath [path=hdfs://
> > > 10.153.25.119:8020/user/root/fact_contact_detail_12m_partparq/2014/1
> > > 2/
> > > 1_4_0.parquet],
> > > ReadEntryWithPath [path=hdfs://
> > > 10.153.25.119:8020/user/root/fact_contact_detail_12m_partparq/2014/1
> > > 2/
> > > 1_5_0.parquet],
> > > ReadEntryWithPath [path=hdfs://
> > > 10.153.25.119:8020/user/root/fact_contact_detail_12m_partparq/2014/1
> > > 2/
> > > 1_6_0.parquet],
> > > ReadEntryWithPath [path=hdfs://
> > > 10.153.25.119:8020/user/root/fact_contact_detail_12m_partparq/2014/1
> > > 2/
> > > 1_7_0.parquet],
> > > ReadEntryWithPath [path=hdfs://
> > > 10.153.25.119:8020/user/root/fact_contact_detail_12m_partparq/2014/1
> > > 2/
> > > 1_8_0.parquet],
> > > ReadEntryWithPath [path=hdfs://
> > > 10.153.25.119:8020/user/root/fact_contact_detail_12m_partparq/2014/1
> > > 2/
> > > 1_9_0.parquet]
> > > <http://10.153.25.119:8020/user/root/fact_contact_detail_12m_partpar
> > > q/
> > > 2014/12/1_9_0.parquet%5D>],
> > > selectionRoot=/user/root/fact_contact_detail_12m_partparq,
> > > numFiles=12,
> > > columns=[`*`]]]) : rowType = (DrillRecordRow[*,
> > > Call_Start_Date_Time, dir0,
> > > dir1]): rowcount = 8960530.0, cumulative cost = {8960530.0 rows,
> > > 3.584212E7 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 606703
> > >
> > > Could you compare the performance of below 2 SQLs:
> > > * Impala:*
> > >  select count(*) from
> > >  (select * from fact_contact_detail_12m_partparq AS FCD1 where
> > > Call_Start_Date_Time between '2014-12-03 00:00:00' AND '2014-12-03
> > > 13:00:00' AND FCD1.Year_key = 20140000 and FCD1.month_key =
> > > 20141200) tmp;
> > > * Drill:*
> > >  select count(*) from
> > >  (select * from fact_contact_detail_12m_partparq AS FCD1 where
> > > Call_Start_Date_Time between '2014-12-03 00:00:00' AND '2014-12-03
> > > 13:00:00' AND FCD1.dir0 = 2014 and FCD1.dir1 = 12) tmp;
> > >
> > > *2. Drill is using broadcast join for the 2 fact table while impala
> > > is using shuffle join.*
> > >
> > > Impala's shuffle join for 2 fact tables took 2s728ms.
> > > 15:HASH JOIN           4    2s728ms    2s774ms  145.81K      89.61K
> > >  531.03 MB        2.68 KB  RIGHT OUTER JOIN, PARTITIONED
> > >
> > > Drill's broadcast join for 2 fact tables caused 16.035 seconds avg
> > > waiting time.
> > > 01-xx-46 UNORDERED_RECEIVER 16.035
> > >
> > > 01-46    BroadcastExchange : rowType = RecordType(ANY $f1, ANY $f2, ANY
> > > $f3, ANY ITEM, ANY ITEM4, ANY ITEM5, ANY ITEM6, ANY ITEM7, ANY
> > > ITEM8, ANY ITEM9, ANY ITEM10, ANY ITEM11, ANY ITEM12, ANY ITEM13,
> > > ANY ITEM14, ANY
> > > ITEM15): rowcount = 50402.98125, cumulative cost =
> > > {2.7032798943749998E7 rows, 1.0056522825625E8 cpu, 0.0 io,
> > > 1.32128391168E10 network, 0.0 memory}, id = 606708
> > >
> > > The reason why Drill chose broadcast is because it estimates row
> > > count is only 50K(Below planner.broadcast_threshold, default 1000K).
> > > However the actual row count to broadcast is much higher than
> estimation.
> > >
> > > Needs to check why the estimation is not that accurate in this case.
> > >
> > > Thanks,
> > > Hao
> > >
> > > On Tue, Apr 21, 2015 at 2:32 PM, Jinfeng Ni <[email protected]> wrote:
> > >
> > >> The query you used is a 15-table join query. We know that Drill's
> > >> cost based optimizer will see performance overhead increase
> > >> significantly with increased # of tables joined, due to the
> > >> increased search space.  I'm not surprised to see that you had
> > >> 306.812 seconds for planning for such 15 table join.
> > >>
> > >> We plan to implement a heuristic planner in Drill, which will
> > >> provide another option, for large # of table join query.  Heuristic
> > >> planner should help in this case.
> > >>
> > >>
> > >> On Tue, Apr 21, 2015 at 1:48 PM, Sivasubramaniam, Latha <
> > >> [email protected]> wrote:
> > >>
> > >> > Hao,
> > >> >
> > >> > I have copied both query profiles to the link
> > >> >
> > >> https://drive.google.com/folderview?id=0ByB1-EsAGxA8fkRCSDhVcGtQS0N
> > >> GT jRuSlpGelVLMkxIVVFxYXMtU2JtQ3FaN2t3UTZpUUE&usp=sharing
> > >> >
> > >> > Please let me know if you cannot access.
> > >> >
> > >> > Appreciate any help.
> > >> >
> > >> > Thanks,
> > >> > Latha
> > >> >
> > >> > -----Original Message-----
> > >> > From: Hao Zhu [mailto:[email protected]]
> > >> > Sent: Tuesday, April 21, 2015 11:31 AM
> > >> > To: [email protected]
> > >> > Subject: Re: Query performance Comparison - Drill and Impala
> > >> >
> > >> > Hi Latha,
> > >> >
> > >> > Do you have the complete SQL profiles for the same query on
> > >> > impala and Drill?
> > >> > For Impala, you can run "profile;" command after the SQL finished.
> > >> > For Drill, you can go to the web GUI, and copy/paste the complete
> > >> > "Full JSON Profile".
> > >> >
> > >> > I want to compare what is the major performance difference
> > >> > between them two.
> > >> >
> > >> > Thanks,
> > >> > Hao
> > >> >
> > >> > On Tue, Apr 21, 2015 at 10:43 AM, Sivasubramaniam, Latha <
> > >> > [email protected]> wrote:
> > >> >
> > >> > > Neeraja, I removed some columns from select list and shortened
> > >> > > aliases to get the query manageable and got the explain plan
> > working finally.
> > >> > > The errors were really random.
> > >> > >
> > >> > > Jacques,
> > >> > >
> > >> > > Most of it seems to be Hashjoin and yes the planning time took
> > >> > > 306 seconds.  Is there a way to improve the query performance.?
> > >> > > Below is the snapshot of what I see in the query plan and I
> > >> > > have copied the query also to the end of the email.
> > >> > >
> > >> > >
> > >> > > Resource_Group_Desc=[$7], Call_Action_Desc=[$20], ITEM7=[$9],
> > >> > > ITEM8=[$10], ITEM9=[$11], ITEM10=[$12], ITEM11=[$13],
> > >> > > ITEM12=[$14], ITEM13=[$15], ITEM0=[$16], ITEM14=[$17],
> > >> > > ITEM15=[$18])
> > >> > > 01-29
> > >> > > HashJoin(condition=[=($19, $8)], joinType=[left])
> > >> > > 01-31
> > >> > > Project(Call_Id_Source=[$0], Agent_Key=[$1], Site_Key=[$2],
> > >> > > Workgroup_Key=[$3], Status_Key=[$4], Reason_Key=[$5],
> > >> > > Call_Type_Desc=[$6], Resource_Group_Desc=[$20], ITEM6=[$8],
> > >> > > ITEM7=[$9], ITEM8=[$10], ITEM9=[$11], ITEM10=[$12],
> > >> > > ITEM11=[$13], ITEM12=[$14], ITEM13=[$15], ITEM0=[$16],
> > >> > > ITEM14=[$17],
> > >> > > ITEM15=[$18])
> > >> > > 01-32
> > >> > > HashJoin(condition=[=($7, $19)], joinType=[left])
> > >> > > 01-34
> > >> > > Project(Call_Id_Source=[$0], Agent_Key=[$1], Site_Key=[$2],
> > >> > > Workgroup_Key=[$3], Status_Key=[$4], Reason_Key=[$5],
> > >> > > Call_Type_Desc=[$20], ITEM5=[$7], ITEM6=[$8], ITEM7=[$9],
> > >> > > ITEM8=[$10], ITEM9=[$11], ITEM10=[$12], ITEM11=[$13],
> > >> > > ITEM12=[$14], ITEM13=[$15], ITEM0=[$16], ITEM14=[$17],
> > >> > > ITEM15=[$18])
> > >> > > 01-35
> > >> > > HashJoin(condition=[=($6, $19)], joinType=[left])
> > >> > > 01-37
> > >> > > Project(Call_Id_Source=[$0], Agent_Key=[$1], Site_Key=[$2],
> > >> > > Workgroup_Key=[$3], Status_Key=[$4], Reason_Key=[$5],
> > >> > > ITEM4=[$7], ITEM5=[$8], ITEM6=[$9], ITEM7=[$10], ITEM8=[$11],
> > >> > > ITEM9=[$12], ITEM10=[$13], ITEM11=[$14], ITEM12=[$15],
> > >> > > ITEM13=[$16], ITEM0=[$17], ITEM14=[$18], ITEM15=[$19])
> > >> > > 01-38
> > >> > > HashJoin(condition=[=($6, $20)], joinType=[left])
> > >> > > 01-40
> > >> > > Project(Call_Id_Source=[$0], Agent_Key=[$2], Site_Key=[$3],
> > >> > > Workgroup_Key=[$4], Status_Key=[$5], Reason_Key=[$6],
> > >> > > ITEM=[$7], ITEM4=[$8], ITEM5=[$9], ITEM6=[$10], ITEM7=[$11],
> > >> > > ITEM8=[$12], ITEM9=[$13], ITEM10=[$14], ITEM11=[$15],
> > >> > > ITEM12=[$16], ITEM13=[$17], ITEM0=[$7], ITEM14=[$18],
> > >> > > ITEM15=[$19])
> > >> > > 01-41
> > >> > > HashJoin(condition=[=($20, $1)], joinType=[left])
> > >> > > 01-43
> > >> > > Project(Call_Id_Source=[$1], Service_Key=[$3], Agent_Key=[$4],
> > >> > > Site_Key=[$5], Workgroup_Key=[$6], Status_Key=[$7],
> > >> > > Reason_Key=[$8], ITEM=[$15], ITEM4=[$16], ITEM5=[$17],
> > >> > > ITEM6=[$18], ITEM7=[$19], ITEM8=[$20], ITEM9=[$21],
> > >> > > ITEM10=[$22], ITEM11=[$23], ITEM12=[$24], ITEM13=[$25],
> > >> > > ITEM14=[$26],
> > >> > > ITEM15=[$27])
> > >> > > 01-45
> > >> > > HashJoin(condition=[AND(=($0, $12), =($1, $13), =($2, $14))],
> > >> > > joinType=[left])
> > >> > > 01-47
> > >> > > SelectionVectorRemover
> > >> > > 01-48
> > >> > > Filter(condition=[AND(>=($9, '2014-12-03 00:00:00'), <=($9,
> > >> > > '2014-12-03 13:00:00'), =($10, 2014), =($11, 12))])
> > >> > > 01-49
> > >> > > Project(Data_Source_Key=[$9], Call_Id_Source=[$4],
> > >> > > Sequence_Number=[$8], Service_Key=[$11], Agent_Key=[$10],
> > >> > > Site_Key=[$2], Workgroup_Key=[$5], Status_Key=[$3],
> > >> > > Reason_Key=[$7], Start_Date_Time=[$6], dir0=[$1],
> > >> > > dir1=[$0])
> > >> > > 01-50
> > >> > >   Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath
> > >> > > [path=hdfs://
> > >> > >
> > >> 10.153.25.119:8020/user/root/fact_agent_activity_detail_12m_partpar
> > >> q/
> > >> 2
> > >> > > 014/12/1_0_0.parquet],
> > >> > > ReadEntryWithPath [path=hdfs://
> > >> > >
> > >> 10.153.25.119:8020/user/root/fact_agent_activity_detail_12m_partpar
> > >> q/
> > >> 2
> > >> > > 014/12/1_1_0.parquet],
> > >> > > ReadEntryWithPath [path=hdfs://
> > >> > >
> > >> 10.153.25.119:8020/user/root/fact_agent_activity_detail_12m_partpar
> > >> q/
> > >> 2
> > >> > > 014/12/1_2_0.parquet],
> > >> > > ReadEntryWithPath [path=hdfs://
> > >> > >
> > >> 10.153.25.119:8020/user/root/fact_agent_activity_detail_12m_partpar
> > >> q/
> > >> 2
> > >> > > 014/12/1_3_0.parquet],
> > >> > > ReadEntryWithPath [path=hdfs://
> > >> > >
> > >> 10.153.25.119:8020/user/root/fact_agent_activity_detail_12m_partpar
> > >> q/
> > >> 2
> > >> > > 014/12/1_4_0.parquet],
> > >> > > ReadEntryWithPath [path=hdfs://
> > >> > >
> > >> 10.153.25.119:8020/user/root/fact_agent_activity_detail_12m_partpar
> > >> q/
> > >> 2
> > >> > > 014/12/1_5_0.parquet],
> > >> > > ReadEntryWithPath [path=hdfs://
> > >> > >
> > >> 10.153.25.119:8020/user/root/fact_agent_activity_detail_12m_partpar
> > >> q/
> > >> 2
> > >> > > 014/12/1_6_0.parquet],
> > >> > > ReadEntryWithPath [path=hdfs://
> > >> > >
> > >> 10.153.25.119:8020/user/root/fact_agent_activity_detail_12m_partpar
> > >> q/
> > >> 2
> > >> > > 014/12/1_7_0.parquet],
> > >> > > ReadEntryWithPath [path=hdfs://
> > >> > >
> > >> 10.153.25.119:8020/user/root/fact_agent_activity_detail_12m_partpar
> > >> q/
> > >> 2
> > >> > > 014/12/1_8_0.parquet],
> > >> > > ReadEntryWithPath [path=hdfs://
> > >> > >
> > >> 10.153.25.119:8020/user/root/fact_agent_activity_detail_12m_partpar
> > >> q/
> > >> 2
> > >> > > 014/12/1_9_0.parquet]],
> > >> > > selectionRoot=/user/root/fact_agent_activity_detail_12m_partpar
> > >> > > q, numFiles=10, columns=[`Data_Source_Key`, `Call_Id_Source`,
> > >> > > `Sequence_Number`, `Service_Key`, `Agent_Key`, `Site_Key`,
> > >> > > `Workgroup_Key`, `Status_Key`, `Reason_Key`, `Start_Date_Time`,
> > >> > > `dir0`, `dir1`]]])
> > >> > > 01-46
> > >> > > BroadcastExchange
> > >> > > 02-01
> > >> > > Project($f1=[ITEM($0, 'Data_Source_Key')], $f2=[ITEM($0,
> > >> > > 'Call_Id_Source')], $f3=[ITEM($0, 'Sequence_Number')],
> > >> > > ITEM=[ITEM($0, 'Service_Key')], ITEM4=[ITEM($0,
> > >> > > 'Call_Type_Key')], ITEM5=[ITEM($0, 'Resource_Group_Key')],
> > >> > > ITEM6=[ITEM($0, 'Call_Action_Key')], ITEM7=[ITEM($0,
> > >> > > 'Agent_Disp_Key')], ITEM8=[ITEM($0, 'Second_Party_Agent_Key')],
> > >> > > ITEM9=[ITEM($0, 'Call_Action_Reason_Key')], ITEM10=[ITEM($0,
> > >> > > 'ANI')], ITEM11=[ITEM($0, 'DNIS')], ITEM12=[ITEM($0,
> > >> > > 'Begin_Date_Time')], ITEM13=[ITEM($0, 'Sequence_Number')],
> > >> > > ITEM14=[ITEM($0, 'Dur_Short_Call')], ITEM15=[ITEM($0,
> > >> > > 'Call_Category_Key')])
> > >> > > 02-02
> > >> > > SelectionVectorRemover
> > >> > > 02-03
> > >> > >   Filter(condition=[AND(>=($1, '2014-12-03 00:00:00'), <=($1,
> > >> > > '2014-12-03 13:00:00'), =($2, 2014), =($3, 12))])
> > >> > > 02-04
> > >> > >     Project(T8¦¦*=[$0], Call_Start_Da |
> > >> > > +------------+------------+
> > >> > > 1 row selected (306.812 seconds)
> > >> > > 0: jdbc:drill:zk=rtr-poc-imp1:2181>
> > >> > >
> > >> > >
> > >> > > ************ Query ******************** SELECT DSQ.Site_Desc,
> > >> > > DWAE.Workgroup_Desc, DASAE.Agent_Status_Key,
> > >> > > DCTCD.Call_Type_Desc, DCA.Call_Action_Desc, FCD.ANI, FCD.DNIS,
> > >> > > FCD.Begin_Date_Time, DD.Disposition_Desc, DAAE.Full_Name ,
> > >> > >
> > >> > > DRGAE.Resource_Group_Desc, FCD.Sequence_Number,
> > >> > > FCD.Service_Key, FCD.Dur_Short_Call, FAA.Call_Id_Source,
> > >> > > FCD.Call_Category_Key FROM
> > >> > > fact_agent_activity_detail_12m_partparq AS FAA LEFT OUTER JOIN
> > >> > > (select
> > >> > > * from fact_contact_detail_12m_partparq AS FCD1 where
> > >> > > Call_Start_Date_Time between '2014-12-03 00:00:00' AND
> > >> > > '2014-12-03 13:00:00' AND FCD1.dir0 = 2014 and FCD1.dir1 = 12
> > >> > >
> > >> > > ) AS FCD
> > >> > > ON FAA.Data_Source_Key = FCD.Data_Source_Key AND
> > >> > > FAA.Call_Id_Source = FCD.Call_Id_Source AND FAA.Sequence_Number
> > >> > > = FCD.Sequence_Number LEFT OUTER JOIN DIM_Services_Parq AS
> > >> > > DSAAD ON DSAAD.Service_Key = FAA.Service_Key LEFT OUTER JOIN
> > >> > > DIM_Services_Parq AS DSCD ON FCD.Service_Key = DSCD.Service_Key
> > >> > > LEFT OUTER JOIN DIM_Call_Types_Parq AS DCTCD ON
> > >> > > FCD.Call_Type_Key = DCTCD.Call_Type_Key LEFT OUTER JOIN
> > >> > > DIM_Resource_Groups_parq AS DRGAE ON FCD.Resource_Group_Key =
> > >> > > DRGAE.Resource_Group_Key LEFT OUTER JOIN DIM_Call_Actions_parq
> > >> > > AS DCA ON DCA.Call_Action_Key = FCD.Call_Action_Key INNER JOIN
> > >> > > DIM_Agents_Parquet AS DAAE ON FAA.Agent_Key = DAAE.Agent_Key
> > >> > > INNER JOIN DIM_Sites_Parq AS DSQ ON FAA.Site_Key = DSQ.Site_Key
> > >> > > INNER JOIN DIM_Workgroups_parq AS
> > DWAE ON FAA.Workgroup_Key = DWAE.Workgroup_Key
> > >> > > LEFT OUTER JOIN    DIM_Dispositions_parq AS DD ON
> > FCD.Agent_Disp_Key =
> > >> > > DD.Disposion_Key
> > >> > > LEFT OUTER JOIN DIM_Agents_Parquet AS DACD ON
> > >> > > FCD.Second_Party_Agent_Key = DACD.Agent_Key LEFT OUTER JOIN
> > >> > > DIM_Call_Action_Reasons_parq AS DAAR ON
> > >> > > FCD.Call_Action_Reason_Key = DAAR.Call_Action_Reason_Key LEFT
> > >> > > OUTER JOIN DIM_Agent_Status_parq AS DASAE ON FAA.Status_Key =
> > >> > > DASAE.Agent_Status_Key LEFT OUTER JOIN
> > >> > > DIM_Agent_Status_Reasons_parq AS DARAE ON FAA.Reason_Key =
> > >> > > DARAE.Agent_Status_Reason_Key WHERE (DWAE.Workgroup_Key IN (
> > >> > > 3,1)
> > >> > >
> > >> > >  AND
> > >> > > (FAA.Start_Date_Time between '2014-12-03 00:00:00' AND
> > >> > > '2014-12-03 13:00:00' AND FAA.dir0 = 2014 and FAA.dir1 = 12));
> > >> > >
> > >> > >
> > >> > > Appreciate your help.
> > >> > >
> > >> > > Thanks,
> > >> > > Latha
> > >> > >
> > >> > >
> > >> > >
> > >> > >
> > >> > >
> > >> > > -----Original Message-----
> > >> > > From: Neeraja Rentachintala
> > >> > > [mailto:[email protected]]
> > >> > > Sent: Monday, April 20, 2015 9:22 PM
> > >> > > To: [email protected]
> > >> > > Subject: Re: Query performance Comparison - Drill and Impala
> > >> > >
> > >> > > Latha
> > >> > > What error are you getting when doing explain plan.
> > >> > >
> > >> > > On Mon, Apr 20, 2015 at 4:40 PM, Sivasubramaniam, Latha <
> > >> > > [email protected]> wrote:
> > >> > >
> > >> > > > I am unable to do an explain plan for this query.  Is there a
> > >> > > > way to pass a sql file for the Explain plan command. Query
> > >> > > > runs fine from the file with the !run command though. Will
> > >> > > > try to fix the errors and see if I can get the query plan.
> > >> > > >
> > >> > > > Thanks,
> > >> > > > latha
> > >> > > >
> > >> > > > -----Original Message-----
> > >> > > > From: Jacques Nadeau [mailto:[email protected]]
> > >> > > > Sent: Monday, April 20, 2015 4:12 PM
> > >> > > > To: [email protected]
> > >> > > > Subject: Re: Query performance Comparison - Drill and Impala
> > >> > > >
> > >> > > > Can you confirm you are getting broadcast joins for all the
> > >> > > > dimension tables? Additionally, can you confirm the planning
> time?
> > >> > > > I'm guessing both might be an issue given the times you are
> > seeing.
> > >> > > > On Apr 20, 2015 6:46 PM, "Sivasubramaniam, Latha" <
> > >> > > > [email protected]> wrote:
> > >> > > >
> > >> > > > > Hi,
> > >> > > > >
> > >> > > > > I am running performance tests for our use cases on Impala
> > >> > > > > and Drill to make a choice and pick one for our Data
> > >> > > > > warehouse implementation on top of Hadoop.  I used the same
> > >> > > > > cluster for both
> > >> > the tests.
> > >> > > > > Select Query that takes ~20 seconds in impala, takes around
> > >> > > > > 440 seconds in drill on same hardware cluster (nearly 20
> > >> > > > > times
> > more).
> > >> > > > > I would like to make sure that I am not missing anything in
> > >> > > > > tuning or
> > >> > > configuration.
> > >> > > > > Below is our configuration, appreciate comments on if these
> > >> > > > > performance numbers are as expected or if any tuning can be
> > >> > > > > done to
> > >> > > > improve drill performance.
> > >> > > > >
> > >> > > > > Hadoop Cluster configuration and data set:
> > >> > > > >
> > >> > > > >
> > >> > > > > *         5 nodes cluster, 1 name node and 4 data nodes, 32GB
> > >> memory
> > >> > ,
> > >> > > 4
> > >> > > > > cpus on each node.
> > >> > > > >
> > >> > > > > *         There are two main fact tables. 98 million records
> in
> > >> one
> > >> > and
> > >> > > > 87
> > >> > > > > million records in another. There are around 10 dimension
> > tables.
> > >> > > > > For now, we are running the tests with normalized schema
> > >> > > > > from out data
> > >> > > > warehouse.
> > >> > > > >
> > >> > > > > *         Running the same query against Impala and Drill
> except
> > >> for
> > >> > > the
> > >> > > > > syntax differences everything is same.
> > >> > > > >
> > >> > > > > Impala Configuration
> > >> > > > >
> > >> > > > >
> > >> > > > > *         CDH 5.3.2 with Impala 2.1.2 version
> > >> > > > >
> > >> > > > > *         4 impala daemons running on 4 data nodes.
> > >> > > > >
> > >> > > > > *          Impala daemons memory set to 25GB.
> > >> > > > >
> > >> > > > > *         Partitioned parquet tables were created from text
> > format
> > >> > hdfs
> > >> > > > > files using impala itself (CTAS command).
> > >> > > > >
> > >> > > > > *         Parquet block size 256MB
> > >> > > > >
> > >> > > > > Drill Configuration
> > >> > > > >
> > >> > > > >
> > >> > > > > *         Drill 0.8 version, added drill on the CDH cluster
> > setup
> > >> > > > >
> > >> > > > > *         4 drillbits running on 4 data nodes
> > >> > > > >
> > >> > > > > *         Drillbits memory set to 16GB and heap memory set to
> > >> 8GB.  I
> > >> > > ran
> > >> > > > > with 8GB memory and 4GB for heap, there was no change in
> > >> > > > > the query
> > >> > > > timings.
> > >> > > > >
> > >> > > > > *         Partitioned parquet files.
> > >> > > > >
> > >> > > > > SQL Query
> > >> > > > >
> > >> > > > > SELECT
> > >> > > > >
> > >> > > > >                 DIM_Sites_parq.Site_Desc,
> > >> > > > >
> > >> > > > >
>  DIM_Workgroups_AgentEventDetails.Workgroup_Desc,
> > >> > > > >                 DIM_Agents_AgentEventDetails.Last_Name,
> > >> > > > >
> > >> > > > >                 DIM_Agents_AgentEventDetails.First_Name,
> > >> > > > >
> > >> > > > >
> > >> > > > >
> > >>  DIM_Agent_Status_AgentEventDetails.Agent_Status_Key,
> > >> > > > >
> > >> > > > > DIM_CallTypes_ContactDetails.Call_Type_Desc,
> > >> > > > >
> > >> > > > >
> > >> > > > >
> > >> > > > >                 DIM_Call_Actions.Call_Action_Desc,
> > >> > > > >
> > >> > > > >                 FACT_Contact_Detail.ANI,
> > >> > > > >                 FACT_Contact_Detail.DNIS,
> > >> > > > >
> > >> > > > >                 FACT_Contact_Detail.Begin_Date_Time,
> > >> > > > >                 FACT_Contact_Detail.Call_Start_Date_Time,
> > >> > > > >
> > >> > > > >                 FACT_Contact_Detail.Orig_DNIS,
> > >> > > > >                 FACT_Agent_Activity_Detail.Start_Date_Time,
> > >> > > > >                 FACT_Agent_Activity_Detail.End_Date_Time,
> > >> > > > >                 FACT_Agent_Activity_Detail.Dur_Time,
> > >> > > > >
> > >> > > > >                 FACT_Agent_Activity_Detail.Sequence_Number,
> > >> > > > >
> > >> > > > >                 FACT_Contact_Detail.Begin_Date_Time,
> > >> > > > >                 FACT_Contact_Detail.Call_Start_Date_Time,
> > >> > > > >                 FACT_Agent_Activity_Detail.Start_Date_Time,
> > >> > > > >                 FACT_Agent_Activity_Detail.End_Date_Time,
> > >> > > > >
> > >> > > > >
> > >> > > > >                 DIM_Dispositions.Disposition_Desc,
> > >> > > > >                 DIM_Agents_AgentEventDetails.Full_Name ,
> > >> > > > >
> > >> > > > >
> > >> > > > > DIM_Resource_Groups_AgentEventDetails.Resource_Group_Desc,
> > >> > > > >
> > >> > > > >                 FACT_Contact_Detail.Sequence_Number,
> > >> > > > >
> > >> > > > >                 FACT_Contact_Detail.Service_Key,
> > >> > > > >                 FACT_Contact_Detail.Station,
> > >> > > > >                 FACT_Contact_Detail.Near_Off_Hook_Date_Time,
> > >> > > > >                 FACT_Contact_Detail.Far_Off_Hook_Date_Time,
> > >> > > > >                 FACT_Contact_Detail.Near_On_Hook_Date_Time,
> > >> > > > >                 FACT_Contact_Detail.Far_On_Hook_Date_Time,
> > >> > > > >
> > >> > > > > FACT_Contact_Detail.Begin_Greeting_Date_Time,
> > >> > > > >
> > >> > > > >                 FACT_Contact_Detail.End_Guard_Date_Time,
> > >> > > > >                 FACT_Contact_Detail.Resource_Group_Type,
> > >> > > > >                 FACT_Contact_Detail.Info_Digits,
> > >> > > > >                 FACT_Contact_Detail.Agent_Site_Key,
> > >> > > > >                 FACT_Contact_Detail.Agent_Switch_Key,
> > >> > > > >
> > >> > > > >                 FACT_Contact_Detail.Line_Number,
> > >> > > > >                 FACT_Contact_Detail.Queue_Start_Date_Time,
> > >> > > > >
> > >> > > > >                 FACT_Contact_Detail.CallQ_Start_Date_Time,
> > >> > > > >                 FACT_Contact_Detail.Queue_End_Date_Time,
> > >> > > > >                 FACT_Contact_Detail.Dur_Queue_Time,
> > >> > > > >                 FACT_Contact_Detail.Conn_Clear_Date_Time,
> > >> > > > >                 FACT_Contact_Detail.Wrap_End_Date_Time,
> > >> > > > >
> > >> > > > > FACT_Contact_Detail.Answer_Flag,
> > >> > > > >
> > >> > > > > FACT_Contact_Detail.Call_End_Date_Time,
> > >> > > > >  FACT_Contact_Detail.Script_Id_Source,
> > >> > > > >
> > >> > > > >   FACT_Contact_Detail.IPNIQ_ASBR_Flag,
> > >> > > > >  FACT_Contact_Detail.Reserved_Start_Date_Time,
> > >> > > > >
> > >> > > > >   FACT_Contact_Detail.Reserved_End_Date_Time,
> > >> > > > >  FACT_Contact_Detail.Dur_IPNIQ_Q_Time,
> > >> > > > >  FACT_Contact_Detail.SeqLeg_Number,
> > >> > > > >
> > >> > > > > FACT_Contact_Detail.Num_Accept_Responses,
> > >> > > > >  FACT_Contact_Detail.Num_Reject_Responses,
> > >> > > > >  FACT_Contact_Detail.Preview_Start_Date_Time,
> > >> > > > >
> > >> > > > >   FACT_Contact_Detail.Preview_End_Date_Time,
> > >> > > > >
> > >> > > > > FACT_Contact_Detail.Detection_Date_Time,
> > >> > > > >  FACT_Contact_Detail.Answer_Date_Time,
> > >> > > > >  FACT_Contact_Detail.Record_Number,
> > >> > > > >  FACT_Contact_Detail.Detection_Type,
> > >> > > > >  FACT_Contact_Detail.Calling_Party_Name,
> > >> > > > >  FACT_Contact_Detail.Num_Phone_Rings,
> > >> > > > >
> > >> > > > >   FACT_Contact_Detail.End_Greeting_Date_Time,
> > >> > > > >  FACT_Contact_Detail.Phone_Start_Ringing_Date_Time,
> > >> > > > >
> > >> > > > >   FACT_Contact_Detail.Begin_Greeting_Date_Time,
> > >> > > > >  FACT_Contact_Detail.Begin_Msg_Playback_Date_Time,
> > >> > > > >  FACT_Contact_Detail.Xfer_Command_Date_Time,
> > >> > > > >  FACT_Contact_Detail.Connect_Date_Time,
> > >> > > > >  FACT_Contact_Detail.Second_Party_Park_Flag,
> > >> > > > >  FACT_Contact_Detail.Dur_Short_Call,
> > >> > > > >  FACT_Agent_Activity_Detail.Call_Id_Source,
> > >> > > > >
> > >> > > > >   FACT_Contact_Detail.Call_Category_Key,
> > >> > > > >
> > >> > > > >   DIM_Agents_AgentEventDetailsSecondParty.User_Id_Source
> > >> > > > >
> > >> > > > >   FROM
> > >> > > > >     hdfs.rootparq.`fact_agent_activity_detail_12m_partparq`
> > >> > > > > AS FACT_Agent_Activity_Detail
> > >> > > > >
> > >> > > > >                 LEFT OUTER JOIN
> > >> > > > >
> > >> > > > >  (select * from
> > >> > > > > hdfs.rootparq.`fact_contact_detail_12m_partparq`
> > >> > > > > where Call_Start_Date_Time
> > >> > > > >
> > >> > > > > between
> > >> > > > >
> > >> > > > > '2014-12-03 00:00:00' AND '2014-12-03 13:00:00' AND
> > >> > > > > fact_contact_detail_12m_partparq.dir0 = 2014 and
> > >> > > > > fact_contact_detail_12M_partParq.dir1 = 12 ) AS
> > >> > > > > FACT_Contact_Detail ON
> > >> > > > >        FACT_Agent_Activity_Detail.Data_Source_Key =
> > >> > > > > FACT_Contact_Detail.Data_Source_Key
> > >> > > > >                 AND
> > >> > > > >        FACT_Agent_Activity_Detail.Call_Id_Source =
> > >> > > > > FACT_Contact_Detail.Call_Id_Source
> > >> > > > >                 AND
> > >> > > > >        FACT_Agent_Activity_Detail.Sequence_Number =
> > >> > > > > FACT_Contact_Detail.Sequence_Number
> > >> > > > >
> > >> > > > >
> > >> > > > >                 LEFT OUTER JOIN
> > >> > > > >                    DIM_Services_Parq AS
> > >> > > > > DIM_Services_Agent_Activity_Detail
> > >> > > > > ON DIM_Services_Agent_Activity_Detail.Service_Key =
> > >> > > > > FACT_Agent_Activity_Detail.Service_Key
> > >> > > > >
> > >> > > > >                 LEFT OUTER JOIN
> > >> > > > >     DIM_Services_Parq AS DIM_Services_Contact_Detail ON
> > >> > > > > FACT_Contact_Detail.Service_Key =
> > >> > > > > DIM_Services_Contact_Detail.Service_Key
> > >> > > > >
> > >> > > > >                 LEFT OUTER JOIN
> > >> > > > >     DIM_Call_Types_Parq AS DIM_CallTypes_ContactDetails ON
> > >> > > > >        FACT_Contact_Detail.Call_Type_Key =
> > >> > > > > DIM_CallTypes_ContactDetails.Call_Type_Key
> > >> > > > >
> > >> > > > >                 LEFT OUTER JOIN
> > >> > > > >     DIM_Resource_Groups_parq AS
> > >> > > > > DIM_Resource_Groups_AgentEventDetails
> > >> > > ON
> > >> > > > >        FACT_Contact_Detail.Resource_Group_Key =
> > >> > > > > DIM_Resource_Groups_AgentEventDetails.Resource_Group_Key
> > >> > > > >
> > >> > > > >
> > >> > > > >                 LEFT OUTER JOIN
> > >> > > > >     DIM_Call_Actions_parq AS DIM_Call_Actions ON
> > >> > > > > DIM_Call_Actions.Call_Action_Key =
> > >> > > > > FACT_Contact_Detail.Call_Action_Key
> > >> > > > >
> > >> > > > >                 INNER JOIN
> > >> > > > >         DIM_Agents_Parquet AS DIM_Agents_AgentEventDetails
> > >> > > > > ON FACT_Agent_Activity_Detail.Agent_Key =
> > >> > > > > DIM_Agents_AgentEventDetails.Agent_Key
> > >> > > > >
> > >> > > > >                 INNER JOIN
> > >> > > > >         DIM_Sites_Parq ON
> > >> > > > > FACT_Agent_Activity_Detail.Site_Key
> > >> > > > > = DIM_Sites_Parq.Site_Key
> > >> > > > >
> > >> > > > >                 INNER JOIN
> > >> > > > >         DIM_Workgroups_parq AS
> > >> > > > > DIM_Workgroups_AgentEventDetails
> > ON
> > >> > > > >        FACT_Agent_Activity_Detail.Workgroup_Key =
> > >> > > > > DIM_Workgroups_AgentEventDetails.Workgroup_Key
> > >> > > > >
> > >> > > > >                 LEFT OUTER JOIN
> > >> > > > >     DIM_Dispositions_parq AS DIM_Dispositions ON
> > >> > > > > FACT_Contact_Detail.Agent_Disp_Key =
> > >> > > > > DIM_Dispositions.Disposion_Key
> > >> > > > >
> > >> > > > >                 LEFT OUTER JOIN
> > >> > > > >     DIM_Agents_Parquet AS
> > >> > > > > DIM_Agents_AgentEventDetailsSecondParty
> > >> ON
> > >> > > > >        FACT_Contact_Detail.Second_Party_Agent_Key =
> > >> > > > > DIM_Agents_AgentEventDetailsSecondParty.Agent_Key
> > >> > > > >
> > >> > > > >                 LEFT OUTER JOIN
> > >> > > > >     DIM_Call_Action_Reasons_parq AS DIM_Call_Action_Reasons
> > >> > > > > ON FACT_Contact_Detail.Call_Action_Reason_Key =
> > >> > > > > DIM_Call_Action_Reasons.Call_Action_Reason_Key
> > >> > > > >
> > >> > > > >                 LEFT OUTER JOIN
> > >> > > > >     DIM_Agent_Status_parq AS
> > >> > > > > DIM_Agent_Status_AgentEventDetails
> > ON
> > >> > > > >        FACT_Agent_Activity_Detail.Status_Key =
> > >> > > > > DIM_Agent_Status_AgentEventDetails.Agent_Status_Key
> > >> > > > >
> > >> > > > >                 LEFT OUTER JOIN
> > >> > > > >     DIM_Agent_Status_Reasons_parq AS
> > >> > > > > DIM_Agent_Reason_AgentEventDetails
> > >> > > > ON
> > >> > > > >        FACT_Agent_Activity_Detail.Reason_Key =
> > >> > > > > DIM_Agent_Reason_AgentEventDetails.Agent_Status_Reason_Key
> > >> > > > >
> > >> > > > >
> > >> > > > > WHERE
> > >> > > > >  (
> > >> > > > >
> > >> > > > > DIM_Workgroups_AgentEventDetails.Workgroup_Key  IN  ( 3,1)
> > >> > > > >
> > >> > > > >   AND
> > >> > > > >
> > >> > > > >
> > >> > > > >    ( FACT_Agent_Activity_Detail.Start_Date_Time
> > >> > > > >
> > >> > > > > between
> > >> > > > > '2014-12-03 00:00:00' AND '2014-12-03 13:00:00' AND
> > >> > > > > FACT_Agent_Activity_Detail.dir0 = 2014 and
> > >> > > > > FACT_Agent_Activity_Detail.dir1 =
> > >> > > > >
> > >> > > > > 12
> > >> > > > >    )
> > >> > > > >
> > >> > > > >   );
> > >> > > > >
> > >> > > > >
> > >> > > > > Thanks,
> > >> > > > > Latha
> > >> > > > >
> > >> > > > >
> > >> > > > > This email (including any attachments) is proprietary to
> > >> > > > > Aspect Software, Inc. and may contain information that is
> > confidential.
> > >> > > > > If you have received this message in error, please do not
> > >> > > > > read, copy or
> > >> > > > forward this message.
> > >> > > > > Please notify the sender immediately, delete it from your
> > >> > > > > system and destroy any copies. You may not further disclose
> > >> > > > > or distribute this email or its attachments.
> > >> > > > >
> > >> > > > This email (including any attachments) is proprietary to
> > >> > > > Aspect Software, Inc. and may contain information that is
> > >> > > > confidential. If you have received this message in error,
> > >> > > > please do not read, copy or
> > >> > > forward this message.
> > >> > > > Please notify the sender immediately, delete it from your
> > >> > > > system and destroy any copies. You may not further disclose
> > >> > > > or distribute this email or its attachments.
> > >> > > >
> > >> > > This email (including any attachments) is proprietary to Aspect
> > >> > > Software, Inc. and may contain information that is confidential.
> > >> > > If you have received this message in error, please do not read,
> > >> > > copy or
> > >> > forward this message.
> > >> > > Please notify the sender immediately, delete it from your
> > >> > > system and destroy any copies. You may not further disclose or
> > >> > > distribute this email or its attachments.
> > >> > >
> > >> >
> > >>
> > >
> > >
> >
>

Reply via email to