Hi Team,

Could you please confirm whether row filtering (the WHERE clause) is done
on the coprocessor side?

Are there any APIs or logs that expose the physical plan of a query? That
would help us optimise the cube.
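
For anyone following the rowkey advice quoted below, here is a minimal sketch of why the position of d3 in the rowkey can matter for a range filter. It is a toy model in Python, not Kylin's actual rowkey encoding or HBase scan logic: the helper names, the value ranges, and the assumption that a filter on a non-leading dimension touches every key are all illustrative.

```python
from bisect import bisect_left, bisect_right
from itertools import product


def build_rowkeys(order, values):
    """Return composite keys, sorted, with dimensions in the given order."""
    return sorted(product(*(values[dim] for dim in order)))


def rows_scanned(keys, order, dim, low, high):
    """Count keys a scan touches for the filter low < dim < high.

    Toy assumption: if dim leads the key, matching rows form one contiguous
    slice of the sorted keys; otherwise every key is read and filtered.
    """
    if order[0] == dim:
        leading = [key[0] for key in keys]
        start = bisect_right(leading, low)   # first key with dim > low
        stop = bisect_left(leading, high)    # first key with dim >= high
        return stop - start
    return len(keys)


# Hypothetical cardinalities: d3 is the high-cardinality dimension.
values = {"d1": range(10), "d2": range(10), "d3": range(2000)}

d3_first = ("d3", "d1", "d2")
d3_last = ("d1", "d2", "d3")

scanned_first = rows_scanned(build_rowkeys(d3_first, values), d3_first, "d3", 100, 1000)
scanned_last = rows_scanned(build_rowkeys(d3_last, values), d3_last, "d3", 100, 1000)

print("d3 leading :", scanned_first)
print("d3 trailing:", scanned_last)
```

In this model the filter d3 > 100 AND d3 < 1000 reads only the matching slice when d3 leads the key, and the whole key space when it trails. Real storage engines have more tricks than this, so treat the numbers as an illustration of the ordering effect only.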



On Mon, Oct 22, 2018 at 8:58 PM Shrikant Bang <b.shrikan...@gmail.com>
wrote:

> Thanks, ShaoFeng, for the response. I will try this and report back with
> the results of my queries.
>
> I would also like to learn how to identify the bottleneck in query
> execution. Can we trace each stage of the query execution with timestamps?
>
> Also, is there a way to get the physical plan of a query? This could help
> me design and tune my cube and queries for better response time.
>
> Regards,
> Shrikant Bang
>
> On Mon, Oct 22, 2018 at 8:01 PM ShaoFeng Shi <shaofeng...@apache.org>
> wrote:
>
>> Hi Shrikant,
>>
>> What is the order of the dimensions in the rowkey? In this case, you need
>> to put "d3" at the head of the rowkey.
>>
>> Here is a good reference on how to design a cube; maybe we need to add it
>> to the FAQ or make it part of the documentation:
>> https://www.slideshare.net/YangLi43/design-cube-in-apache-kylin
>>
>> Shrikant Bang <b.shrikan...@gmail.com> wrote on Mon, Oct 22, 2018 at 3:51 PM:
>>
>>> Hi Team,
>>>
>>> We are running a benchmark test on Kylin v2.5-Hbase-1.x as part of a PoC.
>>>
>>> Here is my cube (pseudo):
>>>
>>> *Dimension Table* : D1
>>> *Fact Table* : F1, F2
>>>
>>> *Metrics* : SUM(D1.m1), SUM(D2.m2)
>>> *Dimension Columns* -- Normal (D1.d1, D1.d2, D1.d3, F1.a1, F2.b1 )
>>>
>>> JOIN (D1.d1 = F1.a1 AND D1.d2 = F2.b1)
>>>
>>> When I run a query that matches a cuboid, it runs very fast.
>>> Pseudo example query:
>>>
>>> SELECT SUM(D1.m1), SUM(D2.m2), d1, d2, d3
>>> FROM D1
>>> JOIN F1
>>> ON D1.d1 = F1.a1
>>> JOIN F2
>>> ON D1.d2 = F2.b1
>>> GROUP BY d1, d2, d3
>>>
>>>
>>> But when I add a WHERE clause to the query, the response becomes very slow.
>>> Pseudo example query:
>>>
>>> SELECT SUM(D1.m1), SUM(D2.m2), d1, d2, d3
>>> FROM D1
>>> JOIN F1
>>> ON D1.d1 = F1.a1
>>> JOIN F2
>>> ON D1.d2 = F2.b1
>>> *WHERE d3 > 100 AND d3 < 1000*
>>> GROUP BY d1, d2, d3
>>>
>>> *In my case, d3 is a high-cardinality dimension that is part of the rowkey
>>> (a Normal dimension).*
>>>
>>> Here are my questions:
>>>
>>> 1. I installed the Kylin coprocessor
>>> <http://kylin.apache.org/docs20/howto/howto_update_coprocessor.html> before
>>> running the queries. Are Kylin query results filtered on the coprocessor side?
>>>
>>> 2. How can I find query traces to identify the bottleneck in response time?
>>>
>>> 3. Even though I have enabled the query cache, it does not seem to be used
>>> when the query runs (even when the same query runs multiple times).
>>>
>>> 4. Are there any best practices for tuning queries with a WHERE clause?
>>>
>>>
>>> Thank You,
>>> Shrikant Bang.
>>>
>>>
>>
>>
>> --
>> Best regards,
>>
>> Shaofeng Shi 史少锋
>>
>>

-- 

Thanks & Regards

Sachin Aggarwal
