Hi Andras,
I would like to share what I found.
There are several steps in query execution, and you can find log entries that 
mark the start and end of each step; these will help you find which step 
costs the most time.

Step 1. Check SQL ACL and grammar; then use Calcite to parse the SQL into an 
AST and finally into an execution plan.

   Start of step 1:
       service.QueryService:414 : The original query:
   End of step 1:
       enumerator.OLAPEnumerator:105 : query storage...
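If you need to measure this step across many queries, a small script can pair the two markers and report the elapsed time. This is only an illustrative sketch, assuming the default kylin.log timestamp format shown in the excerpts in this mail; the marker strings are copied from the log lines above.

```python
import re
from datetime import datetime

# Marker strings taken from the log lines above.
START_MARKER = "The original query"
END_MARKER = "query storage"
# Timestamp prefix format, e.g. "2019-11-28 20:01:53,273".
TS_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})")

def parse_ts(line):
    """Extract the leading log timestamp, or None if the line has none."""
    m = TS_RE.match(line)
    return datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S,%f") if m else None

def step1_millis(log_lines):
    """Return elapsed ms between the first start/end markers, or None."""
    start = end = None
    for line in log_lines:
        if START_MARKER in line and start is None:
            start = parse_ts(line)
        elif END_MARKER in line and start is not None:
            end = parse_ts(line)
            break
    if start and end:
        return round((end - start).total_seconds() * 1000)
    return None
```

The same pairing approach works for the markers of the other steps below.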
Step 2. Send RPC requests to the region servers or streaming receivers.

   Start of sending the RPC request to the Region Server:
          2019-11-28 20:01:53,273 INFO  [Query 
77b0da35-a6e8-d6f9-cd50-152b4d7b0c5a-62] v2.CubeHBaseEndpointRPC:165 : The scan 
30572c87 for segment UAC[20191127151000_20191128012000] is as below with 1 
separate raw scans, shard part of start/end key is set to 0
2019-11-28 20:01:53,275 INFO  [Query 77b0da35-a6e8-d6f9-cd50-152b4d7b0c5a-62] 
v2.CubeHBaseRPC:288 : Visiting hbase table lacus:LACUS_AGEAMBMTM1: cuboid 
require post aggregation, from 89 to 127 Start: 
\x00\x00\x00\x00\x00\x00\x00\x00\x00\x7F\x00\x00\x00\x00\x6E\x07\x0A\xFA\xAA\x00\x00\x00\x00\x00\x00\x00\x00\x00
 
(\x00\x00\x00\x00\x00\x00\x00\x00\x00\x7F\x00\x00\x00\x00n\x07\x0A\xFA\xAA\x00\x00\x00\x00\x00\x00\x00\x00\x00)
 Stop:  
\x00\x00\x00\x00\x00\x00\x00\x00\x00\x7F\xFF\xFF\xFF\xFF\x6E\x07\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\x00
 
(\x00\x00\x00\x00\x00\x00\x00\x00\x00\x7F\xFF\xFF\xFF\xFFn\x07\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\x00)
 Fuzzy key counts: 1. Fuzzy keys : 
\x00\x00\x00\x00\x00\x00\x00\x00\x00\x7F\x00\x00\x00\x00\x6E\x07\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
 
\x01\x01\x00\x00\x00\x00\x00\x00\x00\x00\x01\x01\x01\x01\x00\x00\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01;

       End of sending the RPC request to the Region Server:
2019-11-28 20:02:10,751 INFO  [kylin-coproc--pool2-t4] 
v2.CubeHBaseEndpointRPC:343 : <sub-thread for Query 
77b0da35-a6e8-d6f9-cd50-152b4d7b0c5a GTScanRequest 30572c87>Endpoint RPC 
returned from HTable lacus:LACUS_AGEAMBMTM1 Shard 
\x6C\x61\x63\x75\x73\x3A\x4C\x41\x43\x55\x53\x5F\x41\x47\x45\x41\x4D\x42\x4D\x54\x4D\x31\x2C\x2C\x31\x35\x37\x34\x39\x30\x36\x31\x35\x35\x35\x39\x33\x2E\x39\x64\x39\x30\x39\x36\x31\x64\x66\x61\x38\x37\x31\x37\x35\x38\x36\x37\x33\x37\x65\x33\x36\x30\x36\x31\x35\x34\x34\x36\x36\x33\x2E
 on host: cdh-worker-2.Total scanned row: 770235. Total scanned bytes: 
47587866. Total filtered row: 0. Total aggred row: 764215. Time elapsed in EP: 
17113(ms). Server CPU usage: 0.24357142857142858, server physical mem left: 
4.23337984E9, server swap mem left:0.0.Etc message: start latency: 397@2,agg 
done@17112,compress done@17113,server stats done@17113,
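When a query fans out to many region servers, it helps to parse these per-RPC statistics lines programmatically and compare them. A rough sketch follows; the field names and regular expression are modeled on the exact wording of the log excerpt above, so treat it as illustrative rather than an official format.

```python
import re

# Pattern modeled on the "Endpoint RPC returned" statistics line above.
STATS_RE = re.compile(
    r"Total scanned row: (\d+)\. Total scanned bytes: (\d+)\. "
    r"Total filtered row: (\d+)\. Total aggred row: (\d+)\. "
    r"Time elapsed in EP: (\d+)\(ms\)"
)

def parse_rpc_stats(line):
    """Parse one endpoint-RPC statistics line into a dict, or None."""
    m = STATS_RE.search(line)
    if not m:
        return None
    scanned, scanned_bytes, filtered, aggred, elapsed = map(int, m.groups())
    return {
        "scanned_rows": scanned,
        "scanned_bytes": scanned_bytes,
        "filtered_rows": filtered,
        "aggred_rows": aggred,
        "elapsed_ms": elapsed,
        # Rough estimate of rows actually returned: a large gap between
        # scanned and returned rows suggests heavy post-aggregation.
        "returned_rows": scanned - filtered - aggred,
    }
```

For the log line above this yields 770235 scanned rows but only about 6020 returned, with 17113 ms spent in the endpoint.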

      You will find statistics for each HBase RPC. Please check which 
requests take a long time or scan too many rows. You may check the 
following aspects:

1.     Did your SQL query hit the right/best matching cuboid?

2.     Could you try to change the order of the rowkey columns according to 
your SQL query to reduce the number of scanned rows?

3.     Was some region too large? If so, you may try to cut segments into 
smaller ones to improve the degree of parallelism.
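Regarding point 1: if I read the scan log above correctly, "from 89 to 127" refers to cuboid IDs, and in Kylin a cuboid ID is a bitmask over the rowkey dimensions (127 = 1111111 is the base cuboid with all seven dimensions). A hypothetical helper to decode which dimensions a cuboid contains; the dimension names here are placeholders, not from the actual cube:

```python
def cuboid_dimensions(cuboid_id, dim_names):
    """Return the dimensions included in a cuboid ID.

    dim_names must be ordered as in the cube's rowkey definition,
    most significant bit first.
    """
    bits = format(cuboid_id, "0{}b".format(len(dim_names)))
    return [name for name, b in zip(dim_names, bits) if b == "1"]
```

For example, with seven placeholder dimensions, cuboid 89 (binary 1011001) contains the 1st, 3rd, 4th, and 7th rowkey columns; if your query's GROUP BY columns sit low in the rowkey order, the matched cuboid will be much larger than needed.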

Step 3. Perform aggregation and decoding on the query server side.

      This step runs from the last RPC statistics log entry to the final 
query statistics entry.
       2019-11-28 20:02:10,966 INFO  [Query 
77b0da35-a6e8-d6f9-cd50-152b4d7b0c5a-62] service.QueryService:491 : Stats of 
SQL response: isException: false, duration: 21170, total scan count 1565099
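Using the timestamps quoted in this mail, you can break the total duration down per step. A small sketch (the timestamps are copied from the log excerpts above):

```python
from datetime import datetime

def ts(s):
    # Timestamp format used by the kylin.log excerpts above.
    return datetime.strptime(s, "%Y-%m-%d %H:%M:%S,%f")

# Timestamps copied from the log lines quoted in this mail.
rpc_sent   = ts("2019-11-28 20:01:53,273")  # scan request logged (step 2 starts)
rpc_done   = ts("2019-11-28 20:02:10,751")  # "Endpoint RPC returned"
query_done = ts("2019-11-28 20:02:10,966")  # final "Stats of SQL response"

step2_ms = round((rpc_done - rpc_sent).total_seconds() * 1000)
step3_ms = round((query_done - rpc_done).total_seconds() * 1000)
# With a reported total duration of 21170 ms, the remaining ~3.5 s was
# spent before the RPC was sent (step 1 plus planning overhead).
```

Here step 2 dominates at roughly 17.5 s, while step 3 takes only about 0.2 s, so this particular query should be tuned at the HBase scan level.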

----------------
Best wishes,
Xiaoxiang Yu


From: Andras Nagy <andras.istvan.n...@gmail.com>
Reply-To: "user@kylin.apache.org" <user@kylin.apache.org>
Date: Thursday, 28 November 2019, 21:06
To: "user@kylin.apache.org" <user@kylin.apache.org>
Subject: Need advice and best practices for Kylin query tuning (not cube tuning)

Dear All,

We are troubleshooting slow queries in our Kylin deployment, and we suspect 
that the issue is not with the cube definitions, but with our queries. At least 
we have some quite complex queries with a lot of range checks on dimension 
values, and we have observed different response times by changing the queries 
to alternative, but functionally equivalent ones.

It's hard to come to conclusions, though, because we see a large variance in 
query response times (for the same query in the same environment, at roughly 
the same time).
We have disabled query caching in kylin.properties 
(kylin.query.cache-enabled=false) to be able to have more conclusive results on 
what effect certain changes have on query execution time, but we still observe 
variance in query results on an environment that otherwise has no load. Perhaps 
this is due to caching within HBase or within the streaming receiver.

Do you have any guidelines, best practices, documentation on how to tune 
queries for Kylin? (I'm aware of some cube tuning guidelines from the Kylin 
documentation, but now I'm looking for advice specifically about query 
optimization.)

Many thanks,
Andras
