Re: Re: Re: Slow Query Performance With 'WHERE' Clause

2018-10-24 Thread Shrikant Bang
Thank you Ma, for response and making me correct about query filters( WHERE
clause) in Kylin.

I got a an idea that Kylin is using HBase FuzzyFilters

while
scanning rows from HBase at coprocessor end.

This also clarified me  importance of order of the dimensions in the
rowkey.

Regards,
Shrikant Bang.


On Wed, Oct 24, 2018 at 7:56 AM Ma Gang  wrote:

> Hi Shrikant,
> Which Kylin version you used? By default, most of the filters will be
> pushed down to coprocessor, you may grep "Total filtered row:" in your
> log to see how many rows are filtered from coprocessor side.
>
> If you want to change code to add some log there, you can add log at
> method: CubeHBaseEndpointRPC.getGTScanner(final GTScanRequest scanRequest)
> the parameter GTScanRequest has a field:filterPushDown, is the filter
> push down to coprocessor, you may log some information there, but the
> filter is not the original filter you input but the converted filter that
> has constant values encoded.
>
>
> At 2018-10-24 01:00:15, "Shrikant Bang"  wrote:
>
> Hi Team,
>
>   I tried debugging code to understand query execution (on Kylin v2.5
> hbase 1.x) . As I understood query is formed and executed using Calcite
> APIs.
>
>   Could you please help me to get answer of below?
>
>   1.  With my understanding default implementation of filter operation
> (WHERE clause) is running on Kylin’s node. Is it correct?
>
>   2.  How to make use of CoprocessorFilter
> 
> for running filters on HBase side (if it is not enabled by default)?
>
>
> Thank You,
> Shrikant Bang
>
> On Tue, Oct 23, 2018 at 4:33 PM Shrikant Bang 
> wrote:
>
>> Thank you Ma for response.
>>
>> I didn't find any logs about applying filters in co-processors. Also I
>> don't see logs for logical/physical query plans in kylin.log. Am I missing
>> anything?
>>
>> I would like to put logger statements in code and build specific
>> module(s) as required for tracing latencies -- would you please suggest any
>> classes which can provide me these details?
>>
>> Regards,
>> Shrikant Bang.
>>
>> On Tue, Oct 23, 2018 at 3:09 PM Ma Gang  wrote:
>>
>>> You may post your query related log here, there should be some query log
>>> that indicated whether the filter is push down or not, at least in the
>>> returned response stats, there's some log show how many rows are filtered
>>> in the coprocessor side.
>>>
>>> At 2018-10-23 16:58:36, "Sachin Aggarwal"  
>>> wrote:
>>> >Hi Team,
>>> >
>>> > Could you please confirm if filtering of rows (WHERE clause) is done
>>> >in CoProcessor side?
>>> >
>>> > Is there any APIs/logging to get physical plan of query? It will help
>>> >us in optimising the cube.
>>> >
>>> >
>>> >
>>> >On Mon, Oct 22, 2018 at 8:58 PM Shrikant Bang 
>>> >wrote:
>>> >
>>> >> Thanks ShaoFeng for response. I will try this and will update the results
>>> >> of my queries.
>>> >>
>>> >> I would like to learn, how to identify the bottleneck in query 
>>> >> executions.
>>> >> Can we trace the query execution in each stage with timestamp?
>>> >>
>>> >> Also is there way we can get physical plan of query? This could help me 
>>> >> to
>>> >> design/tune my cube/queries for better response time.
>>> >>
>>> >> Regards,
>>> >> Shrikant Bang
>>> >>
>>> >> On Mon, Oct 22, 2018 at 8:01 PM ShaoFeng Shi 
>>> >> wrote:
>>> >>
>>> >>> Hi Shrikant,
>>> >>>
>>> >>> What's the order of the dimensions in the rowkey? In this case, you need
>>> >>> to put "d3" at the heading position of the rowkey.
>>> >>>
>>> >>> Here is a good reference on how to design a cube, maybe we need add that
>>> >>> into the FAQ or as a part of document:
>>> >>> https://www.slideshare.net/YangLi43/design-cube-in-apache-kylin
>>> >>>
>>> >>> Shrikant Bang  于2018年10月22日周一 下午3:51写道:
>>> >>>
>>>  Hi Team,
>>> 
>>>  We are working on benchmark test for Kylin v2.5-Hbase-1.x as part of 
>>>  PoC.
>>> 
>>>  Here is my cube (pseudo) :
>>> 
>>>  *Dimension Table* : D1
>>>  *Fact Table* : F1, F2
>>> 
>>>  *Metrics* : SUM(D1.m1), SUM(D2.m2)
>>>  *Dimension Columns* -- Normal (D1.d1, D1.d2, D1.d3, F1.a1, F2.b1 )
>>> 
>>>  JOIN (D1.d1 = F1.a1 AND D2.d2 = F2.b1)
>>> 
>>>  When I run a query matching to the cuboids it runs very fast :
>>>  pseudo example query:
>>> 
>>>  SELECT SUM(D1.m1), SUM(D2.m2), d1, d2, d3
>>>  FROM D1
>>>  JOIN F1
>>>  ON D1.d1 = F1.a1
>>>  JOIN F2
>>>  ON D1.d2 = F2.b1
>>>  GROUP BY d1, d2, d3
>>> 
>>> 
>>>  But when I add where clause to query it become very slow in response
>>>  pseudo example query:
>>> 
>>>  SELECT SUM(D1.m1), SUM(D2.m2), d1, d2, d3
>>>  FROM D1
>>>  JOIN F1
>>>  ON D1.d1 = F1.a1

Re: Re: Slow Query Performance With 'WHERE' Clause

2018-10-23 Thread Shrikant Bang
Hi Team,

  I tried debugging code to understand query execution (on Kylin v2.5 hbase
1.x) . As I understood query is formed and executed using Calcite APIs.

  Could you please help me to get answer of below?

  1.  With my understanding default implementation of filter operation
(WHERE clause) is running on Kylin’s node. Is it correct?

  2.  How to make use of CoprocessorFilter

for running filters on HBase side (if it is not enabled by default)?


Thank You,
Shrikant Bang

On Tue, Oct 23, 2018 at 4:33 PM Shrikant Bang 
wrote:

> Thank you Ma for response.
>
> I didn't find any logs about applying filters in co-processors. Also I
> don't see logs for logical/physical query plans in kylin.log. Am I missing
> anything?
>
> I would like to put logger statements in code and build specific module(s)
> as required for tracing latencies -- would you please suggest any classes
> which can provide me these details?
>
> Regards,
> Shrikant Bang.
>
> On Tue, Oct 23, 2018 at 3:09 PM Ma Gang  wrote:
>
>> You may post your query related log here, there should be some query log
>> that indicated whether the filter is push down or not, at least in the
>> returned response stats, there's some log show how many rows are filtered
>> in the coprocessor side.
>>
>> At 2018-10-23 16:58:36, "Sachin Aggarwal"  wrote:
>> >Hi Team,
>> >
>> > Could you please confirm if filtering of rows (WHERE clause) is done
>> >in CoProcessor side?
>> >
>> > Is there any APIs/logging to get physical plan of query? It will help
>> >us in optimising the cube.
>> >
>> >
>> >
>> >On Mon, Oct 22, 2018 at 8:58 PM Shrikant Bang 
>> >wrote:
>> >
>> >> Thanks ShaoFeng for response. I will try this and will update the results
>> >> of my queries.
>> >>
>> >> I would like to learn, how to identify the bottleneck in query executions.
>> >> Can we trace the query execution in each stage with timestamp?
>> >>
>> >> Also is there way we can get physical plan of query? This could help me to
>> >> design/tune my cube/queries for better response time.
>> >>
>> >> Regards,
>> >> Shrikant Bang
>> >>
>> >> On Mon, Oct 22, 2018 at 8:01 PM ShaoFeng Shi 
>> >> wrote:
>> >>
>> >>> Hi Shrikant,
>> >>>
>> >>> What's the order of the dimensions in the rowkey? In this case, you need
>> >>> to put "d3" at the heading position of the rowkey.
>> >>>
>> >>> Here is a good reference on how to design a cube, maybe we need add that
>> >>> into the FAQ or as a part of document:
>> >>> https://www.slideshare.net/YangLi43/design-cube-in-apache-kylin
>> >>>
>> >>> Shrikant Bang  于2018年10月22日周一 下午3:51写道:
>> >>>
>>  Hi Team,
>> 
>>  We are working on benchmark test for Kylin v2.5-Hbase-1.x as part of 
>>  PoC.
>> 
>>  Here is my cube (pseudo) :
>> 
>>  *Dimension Table* : D1
>>  *Fact Table* : F1, F2
>> 
>>  *Metrics* : SUM(D1.m1), SUM(D2.m2)
>>  *Dimension Columns* -- Normal (D1.d1, D1.d2, D1.d3, F1.a1, F2.b1 )
>> 
>>  JOIN (D1.d1 = F1.a1 AND D2.d2 = F2.b1)
>> 
>>  When I run a query matching to the cuboids it runs very fast :
>>  pseudo example query:
>> 
>>  SELECT SUM(D1.m1), SUM(D2.m2), d1, d2, d3
>>  FROM D1
>>  JOIN F1
>>  ON D1.d1 = F1.a1
>>  JOIN F2
>>  ON D1.d2 = F2.b1
>>  GROUP BY d1, d2, d3
>> 
>> 
>>  But when I add where clause to query it become very slow in response
>>  pseudo example query:
>> 
>>  SELECT SUM(D1.m1), SUM(D2.m2), d1, d2, d3
>>  FROM D1
>>  JOIN F1
>>  ON D1.d1 = F1.a1
>>  JOIN F2
>>  ON D1.d2 = F2.b1
>>  *WHERE d3 > 100 AND d3 < 1000*
>>  GROUP BY d1, d2, d3
>> 
>>  *In my case d3 is High Cardinality dimension which is part of row key (
>>  Normal Dimension ).*
>> 
>>  Here are question:
>> 
>>  1. I have installed Kylin Co-Processor
>>   
>>  before
>>  running queries. Do Kylin query results gets filtered Co-Processor end?
>> 
>>  2. How to find query traces to identify the bottleneck in response time?
>> 
>>  3. Even though I have enabled Query Cache, it seems its not getting used
>>  when query runs ( in case of multiple times also) .
>> 
>>  4. Any best practises to tune the queries with WHERE clause?
>> 
>> 
>>  Thank You,
>>  Shrikant Bang.
>> 
>> 
>> >>>
>> >>>
>> >>> --
>> >>> Best regards,
>> >>>
>> >>> Shaofeng Shi 史少锋
>> >>>
>> >>>
>> >
>> >--
>> >
>> >Thanks & Regards
>> >
>> >Sachin Aggarwal
>>
>>
>>
>>
>>
>


Re: Re: Slow Query Performance With 'WHERE' Clause

2018-10-23 Thread Shrikant Bang
Thank you Ma for response.

I didn't find any logs about applying filters in co-processors. Also I
don't see logs for logical/physical query plans in kylin.log. Am I missing
anything?

I would like to put logger statements in code and build specific module(s)
as required for tracing latencies -- would you please suggest any classes
which can provide me these details?

Regards,
Shrikant Bang.

On Tue, Oct 23, 2018 at 3:09 PM Ma Gang  wrote:

> You may post your query related log here, there should be some query log
> that indicated whether the filter is push down or not, at least in the
> returned response stats, there's some log show how many rows are filtered
> in the coprocessor side.
>
> At 2018-10-23 16:58:36, "Sachin Aggarwal"  wrote:
> >Hi Team,
> >
> > Could you please confirm if filtering of rows (WHERE clause) is done
> >in CoProcessor side?
> >
> > Is there any APIs/logging to get physical plan of query? It will help
> >us in optimising the cube.
> >
> >
> >
> >On Mon, Oct 22, 2018 at 8:58 PM Shrikant Bang 
> >wrote:
> >
> >> Thanks ShaoFeng for response. I will try this and will update the results
> >> of my queries.
> >>
> >> I would like to learn, how to identify the bottleneck in query executions.
> >> Can we trace the query execution in each stage with timestamp?
> >>
> >> Also is there way we can get physical plan of query? This could help me to
> >> design/tune my cube/queries for better response time.
> >>
> >> Regards,
> >> Shrikant Bang
> >>
> >> On Mon, Oct 22, 2018 at 8:01 PM ShaoFeng Shi 
> >> wrote:
> >>
> >>> Hi Shrikant,
> >>>
> >>> What's the order of the dimensions in the rowkey? In this case, you need
> >>> to put "d3" at the heading position of the rowkey.
> >>>
> >>> Here is a good reference on how to design a cube, maybe we need add that
> >>> into the FAQ or as a part of document:
> >>> https://www.slideshare.net/YangLi43/design-cube-in-apache-kylin
> >>>
> >>> Shrikant Bang  于2018年10月22日周一 下午3:51写道:
> >>>
>  Hi Team,
> 
>  We are working on benchmark test for Kylin v2.5-Hbase-1.x as part of PoC.
> 
>  Here is my cube (pseudo) :
> 
>  *Dimension Table* : D1
>  *Fact Table* : F1, F2
> 
>  *Metrics* : SUM(D1.m1), SUM(D2.m2)
>  *Dimension Columns* -- Normal (D1.d1, D1.d2, D1.d3, F1.a1, F2.b1 )
> 
>  JOIN (D1.d1 = F1.a1 AND D2.d2 = F2.b1)
> 
>  When I run a query matching to the cuboids it runs very fast :
>  pseudo example query:
> 
>  SELECT SUM(D1.m1), SUM(D2.m2), d1, d2, d3
>  FROM D1
>  JOIN F1
>  ON D1.d1 = F1.a1
>  JOIN F2
>  ON D1.d2 = F2.b1
>  GROUP BY d1, d2, d3
> 
> 
>  But when I add where clause to query it become very slow in response
>  pseudo example query:
> 
>  SELECT SUM(D1.m1), SUM(D2.m2), d1, d2, d3
>  FROM D1
>  JOIN F1
>  ON D1.d1 = F1.a1
>  JOIN F2
>  ON D1.d2 = F2.b1
>  *WHERE d3 > 100 AND d3 < 1000*
>  GROUP BY d1, d2, d3
> 
>  *In my case d3 is High Cardinality dimension which is part of row key (
>  Normal Dimension ).*
> 
>  Here are question:
> 
>  1. I have installed Kylin Co-Processor
>   
>  before
>  running queries. Do Kylin query results gets filtered Co-Processor end?
> 
>  2. How to find query traces to identify the bottleneck in response time?
> 
>  3. Even though I have enabled Query Cache, it seems its not getting used
>  when query runs ( in case of multiple times also) .
> 
>  4. Any best practises to tune the queries with WHERE clause?
> 
> 
>  Thank You,
>  Shrikant Bang.
> 
> 
> >>>
> >>>
> >>> --
> >>> Best regards,
> >>>
> >>> Shaofeng Shi 史少锋
> >>>
> >>>
> >
> >--
> >
> >Thanks & Regards
> >
> >Sachin Aggarwal
>
>
>
>
>