This is what we call a "read-write separated" deployment. Many Kylin users run in this mode: deploy a dedicated HBase cluster only for queries, with another Hadoop cluster for cube building. You can refer to this blog:
https://kylin.apache.org/blog/2016/06/10/standalone-hbase-cluster/
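At a high level, that setup comes down to a couple of entries in kylin.properties. A minimal sketch, assuming Kylin 2.x property names (the NameNode address is a placeholder; the HBase connection itself is taken from the hbase-site.xml on the Kylin node, which should point at the dedicated query cluster):

    # Working directory on the Hadoop (build) cluster's HDFS
    kylin.env.hdfs-working-dir=/kylin

    # HDFS of the dedicated HBase (query) cluster; cube HFiles are written
    # here so the bulk load does not have to copy data across clusters
    kylin.storage.hbase.cluster-fs=hdfs://query-hbase-nn:8020

The build cluster then does all the MapReduce work under the working directory, while the generated HFiles land on the query cluster's HDFS for bulk load. The blog post above describes the same setup in more detail.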
Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC
Email: [email protected]
Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
Join Kylin user mail group: [email protected]
Join Kylin dev mail group: [email protected]

Andras Nagy <[email protected]> wrote on Saturday, December 7, 2019 at 12:42 AM:

Hi All,

Thanks a lot, guys! With your generous help, Xiaoxiang, we have been able to track down the problem: it was mostly caused by segments that were too large and had too few regions, which meant a low level of parallelism on the HBase side. The p95 query latency is now acceptable for the time being, although we will still look into improving it.

However, a small number of queries still take excessively long to execute. Upon analyzing these, it turned out there is a regular pattern to when they occur: they coincide with the cubing job executions on the same cluster (hourly stream cube building, plus a daily cube refresh to handle very late events).

As a consequence, I'd like to move all of these YARN-based workloads (Hive, MapReduce) to a separate EMR cluster (our environment is in AWS) and keep the other EMR cluster only for HBase. This would also improve resource usage on both clusters, as we wouldn't need a static split of node memory between HBase and YARN.

Does anyone have experience with this setup? Do you see any potential issues with it? Any feedback on this is welcome.

Thanks a lot,
Andras

On Mon, Dec 2, 2019 at 3:56 AM Xiaoxiang Yu <[email protected]> wrote:

Hi Andras,

I would like to share what I have found. There are several steps in query execution, and you can find log entries that mark the start and end of each step; that will help you find which step costs the most time.

*Step 1. Check SQL ACL and grammar; then use Calcite to parse the SQL into an AST and finally into an execution plan.*

Start of step 1:
service.QueryService:414 : The original query:

End of step 1:
enumerator.OLAPEnumerator:105 : query storage...

*Step 2. Send RPC requests to the region servers or streaming receivers.*

Start of the RPC request to the region server:

2019-11-28 20:01:53,273 INFO [Query 77b0da35-a6e8-d6f9-cd50-152b4d7b0c5a-62] v2.CubeHBaseEndpointRPC:165 : The scan *30572c87* for segment UAC[20191127151000_20191128012000] is as below with 1 separate raw scans, shard part of start/end key is set to 0

2019-11-28 20:01:53,275 INFO [Query 77b0da35-a6e8-d6f9-cd50-152b4d7b0c5a-62] v2.CubeHBaseRPC:288 : Visiting hbase table *lacus:LACUS_AGEAMBMTM1*: cuboid require post aggregation, from 89 to 127 Start: \x00\x00\x00\x00\x00\x00\x00\x00\x00\x7F\x00\x00\x00\x00\x6E\x07\x0A\xFA\xAA\x00\x00\x00\x00\x00\x00\x00\x00\x00 (\x00\x00\x00\x00\x00\x00\x00\x00\x00\x7F\x00\x00\x00\x00n\x07\x0A\xFA\xAA\x00\x00\x00\x00\x00\x00\x00\x00\x00) Stop: \x00\x00\x00\x00\x00\x00\x00\x00\x00\x7F\xFF\xFF\xFF\xFF\x6E\x07\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\x00 (\x00\x00\x00\x00\x00\x00\x00\x00\x00\x7F\xFF\xFF\xFF\xFFn\x07\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\x00) Fuzzy key counts: 1. Fuzzy keys : \x00\x00\x00\x00\x00\x00\x00\x00\x00\x7F\x00\x00\x00\x00\x6E\x07\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00 \x01\x01\x00\x00\x00\x00\x00\x00\x00\x00\x01\x01\x01\x01\x00\x00\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01;

End of the RPC request to the region server:

2019-11-28 20:02:10,751 INFO [kylin-coproc--pool2-t4] v2.CubeHBaseEndpointRPC:343 : <sub-thread for Query 77b0da35-a6e8-d6f9-cd50-152b4d7b0c5a GTScanRequest *30572c87*>Endpoint RPC returned from HTable lacus:LACUS_AGEAMBMTM1 Shard \x6C\x61\x63\x75\x73\x3A\x4C\x41\x43\x55\x53\x5F\x41\x47\x45\x41\x4D\x42\x4D\x54\x4D\x31\x2C\x2C\x31\x35\x37\x34\x39\x30\x36\x31\x35\x35\x35\x39\x33\x2E\x39\x64\x39\x30\x39\x36\x31\x64\x66\x61\x38\x37\x31\x37\x35\x38\x36\x37\x33\x37\x65\x33\x36\x30\x36\x31\x35\x34\x34\x36\x36\x33\x2E on host: cdh-worker-2.*Total scanned row: 770235*. Total scanned bytes: 47587866. Total filtered row: 0. Total aggred row: 764215. *Time elapsed in EP: 17113(ms)*. Server CPU usage: 0.24357142857142858, server physical mem left: 4.23337984E9, server swap mem left:0.0.Etc message: start latency: 397@2,agg done@17112,compress done@17113,server stats done@17113,

You will find statistics for every HBase RPC in these log lines; please check which request takes a long time or scans too many rows. You may check the following aspects:

1. Did your SQL query hit the right/best-matching cuboid?

2. Could you change the rowkey column order according to your SQL query to reduce the number of scanned rows?

3. Was some region too large? You might try to cut segments into smaller ones, or into more regions, to improve the degree of parallelism (see the sketch after this list).
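On point 3, a sketch of the knobs that control how many regions each segment gets, for kylin.properties or a cube-level configuration override (Kylin 2.x property names; the values are only illustrative, not recommendations):

    # Cut a new region roughly every N GB of cube data (a smaller value means
    # more regions per segment, and therefore more parallel coprocessor scans)
    kylin.storage.hbase.region-cut-gb=2

    # Lower and upper bounds on the region count per segment
    kylin.storage.hbase.min-region-count=4
    kylin.storage.hbase.max-region-count=500

More regions only help while the region servers still have spare CPU and handlers, so it is worth watching the "Server CPU usage" figure in the endpoint RPC log lines above while tuning.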
*Step 3. Query-server-side aggregation and decoding.*

This step spans from the last RPC statistics log line to the final query statistics line:

2019-11-28 20:02:10,966 INFO [Query 77b0da35-a6e8-d6f9-cd50-152b4d7b0c5a-62] service.QueryService:491 : Stats of SQL response: isException: false, duration: 21170, total scan count 1565099

----------------

Best wishes,
Xiaoxiang Yu

From: Andras Nagy <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Thursday, November 28, 2019, 21:06
To: "[email protected]" <[email protected]>
Subject: Need advice and best practices for Kylin query tuning (not cube tuning)

Dear All,

We are troubleshooting slow queries in our Kylin deployment, and we suspect that the issue is not with the cube definitions but with our queries. We have some quite complex queries with many range checks on dimension values, and we have observed different response times after changing the queries to alternative but functionally equivalent forms.

It is hard to draw conclusions, though, because we see a large variance in query response times for the same query, in the same environment, at roughly the same time. We have disabled query caching in kylin.properties (kylin.query.cache-enabled=false) to get more conclusive results on what effect certain changes have on query execution time, but we still observe variance in response times on an environment that otherwise has no load. Perhaps this is due to caching within HBase or within the streaming receiver.
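For reference, this is how that looks in kylin.properties; the threshold property below is quoted from memory of Kylin 2.x and should be verified against your version:

    # Kylin's query result cache: off while benchmarking, normally on in production
    kylin.query.cache-enabled=false

    # When the cache is on, only queries slower than this many milliseconds are
    # considered worth caching (Kylin 2.x name, assumed; check your version)
    kylin.query.cache-threshold-duration=2000

Even with this switched off, HBase's block cache and the OS page cache can still make a repeated run of the same query faster than the first one.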
Do you have any guidelines, best practices, or documentation on how to tune queries for Kylin? (I'm aware of some cube tuning guidelines in the Kylin documentation, but now I'm looking for advice specifically about query optimization.)

Many thanks,
Andras
