This is what we call a "read-write separated" deployment. Many Kylin users run in this mode: deploy a dedicated HBase cluster only for queries, with another Hadoop cluster for cube building. You can refer to this blog:
https://kylin.apache.org/blog/2016/06/10/standalone-hbase-cluster/
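At a high level, that setup comes down to a couple of entries in kylin.properties. A minimal sketch, assuming Kylin 2.x property names (the NameNode address is a placeholder; the HBase connection itself is taken from the hbase-site.xml on the Kylin node, which should point at the dedicated query cluster):

    # Working directory on the Hadoop (build) cluster's HDFS
    kylin.env.hdfs-working-dir=/kylin

    # HDFS of the dedicated HBase (query) cluster; cube HFiles are written
    # here so the bulk load does not have to copy data across clusters
    kylin.storage.hbase.cluster-fs=hdfs://query-hbase-nn:8020

The build cluster then does all the MapReduce work under the working directory, while the generated HFiles land on the query cluster's HDFS for bulk load. The blog post above describes the same setup in more detail.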
Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC
Email: [email protected]
Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
Join Kylin user mail group: [email protected]
Join Kylin dev mail group: [email protected]

Andras Nagy <[email protected]> wrote on Saturday, December 7, 2019 at 12:42 AM:

Hi All,

Thanks a lot, guys! With your generous help, Xiaoxiang, we have been able to track down the problem: it was mostly caused by segments that were too large and had too few regions, which meant a low level of parallelism on the HBase side. The p95 query latency is now acceptable for the time being, although we will still look into improving it.

However, a small number of queries still take excessively long to execute. Upon analyzing these, it turned out there is a regular pattern to when they occur: they coincide with the cubing job executions on the same cluster (hourly stream cube building, plus a daily cube refresh to handle very late events).

As a consequence, I'd like to move all of these YARN-based workloads (Hive, MapReduce) to a separate EMR cluster (our environment is in AWS) and keep the other EMR cluster only for HBase. This would also improve resource usage on both clusters, as we wouldn't need a static split of node memory between HBase and YARN.

Does anyone have experience with this setup? Do you see any potential issues with it? Any feedback on this is welcome.

Thanks a lot,
Andras

On Mon, Dec 2, 2019 at 3:56 AM Xiaoxiang Yu <[email protected]> wrote:

Hi Andras,

I would like to share what I have found. There are several steps in query execution, and you can find log entries that mark the start and end of each step; that will help you find which step costs the most time.

*Step 1. Check SQL ACL and grammar; then use Calcite to parse the SQL into an AST and finally into an execution plan.*

Start of step 1:
service.QueryService:414 : The original query:

End of step 1:
enumerator.OLAPEnumerator:105 : query storage...

*Step 2. Send RPC requests to the region servers or streaming receivers.*

Start of the RPC request to the region server:

2019-11-28 20:01:53,273 INFO [Query 77b0da35-a6e8-d6f9-cd50-152b4d7b0c5a-62] v2.CubeHBaseEndpointRPC:165 : The scan *30572c87* for segment UAC[20191127151000_20191128012000] is as below with 1 separate raw scans, shard part of start/end key is set to 0

2019-11-28 20:01:53,275 INFO [Query 77b0da35-a6e8-d6f9-cd50-152b4d7b0c5a-62] v2.CubeHBaseRPC:288 : Visiting hbase table *lacus:LACUS_AGEAMBMTM1*: cuboid require post aggregation, from 89 to 127 Start: \x00\x00\x00\x00\x00\x00\x00\x00\x00\x7F\x00\x00\x00\x00\x6E\x07\x0A\xFA\xAA\x00\x00\x00\x00\x00\x00\x00\x00\x00 (\x00\x00\x00\x00\x00\x00\x00\x00\x00\x7F\x00\x00\x00\x00n\x07\x0A\xFA\xAA\x00\x00\x00\x00\x00\x00\x00\x00\x00) Stop: \x00\x00\x00\x00\x00\x00\x00\x00\x00\x7F\xFF\xFF\xFF\xFF\x6E\x07\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\x00 (\x00\x00\x00\x00\x00\x00\x00\x00\x00\x7F\xFF\xFF\xFF\xFFn\x07\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\x00) Fuzzy key counts: 1. Fuzzy keys : \x00\x00\x00\x00\x00\x00\x00\x00\x00\x7F\x00\x00\x00\x00\x6E\x07\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00 \x01\x01\x00\x00\x00\x00\x00\x00\x00\x00\x01\x01\x01\x01\x00\x00\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01;

End of the RPC request to the region server:

2019-11-28 20:02:10,751 INFO [kylin-coproc--pool2-t4] v2.CubeHBaseEndpointRPC:343 : <sub-thread for Query 77b0da35-a6e8-d6f9-cd50-152b4d7b0c5a GTScanRequest *30572c87*>Endpoint RPC returned from HTable lacus:LACUS_AGEAMBMTM1 Shard \x6C\x61\x63\x75\x73\x3A\x4C\x41\x43\x55\x53\x5F\x41\x47\x45\x41\x4D\x42\x4D\x54\x4D\x31\x2C\x2C\x31\x35\x37\x34\x39\x30\x36\x31\x35\x35\x35\x39\x33\x2E\x39\x64\x39\x30\x39\x36\x31\x64\x66\x61\x38\x37\x31\x37\x35\x38\x36\x37\x33\x37\x65\x33\x36\x30\x36\x31\x35\x34\x34\x36\x36\x33\x2E on host: cdh-worker-2.*Total scanned row: 770235*. Total scanned bytes: 47587866. Total filtered row: 0. Total aggred row: 764215. *Time elapsed in EP: 17113(ms)*. Server CPU usage: 0.24357142857142858, server physical mem left: 4.23337984E9, server swap mem left:0.0.Etc message: start latency: 397@2,agg done@17112,compress done@17113,server stats done@17113,

You will find statistics for every HBase RPC in these log lines; please check which request takes a long time or scans too many rows. You may check the following aspects:

1. Did your SQL query hit the right/best-matching cuboid?

2. Could you change the rowkey column order according to your SQL query to reduce the number of scanned rows?

3. Was some region too large? You might try to cut segments into smaller ones, or into more regions, to improve the degree of parallelism (see the sketch after this list).
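On point 3, a sketch of the knobs that control how many regions each segment gets, for kylin.properties or a cube-level configuration override (Kylin 2.x property names; the values are only illustrative, not recommendations):

    # Cut a new region roughly every N GB of cube data (a smaller value means
    # more regions per segment, and therefore more parallel coprocessor scans)
    kylin.storage.hbase.region-cut-gb=2

    # Lower and upper bounds on the region count per segment
    kylin.storage.hbase.min-region-count=4
    kylin.storage.hbase.max-region-count=500

More regions only help while the region servers still have spare CPU and handlers, so it is worth watching the "Server CPU usage" figure in the endpoint RPC log lines above while tuning.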
*Step 3. Query-server-side aggregation and decoding.*

This step spans from the last RPC statistics log line to the final query statistics line:

2019-11-28 20:02:10,966 INFO [Query 77b0da35-a6e8-d6f9-cd50-152b4d7b0c5a-62] service.QueryService:491 : Stats of SQL response: isException: false, duration: 21170, total scan count 1565099

----------------

Best wishes,
Xiaoxiang Yu

From: Andras Nagy <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Thursday, November 28, 2019, 21:06
To: "[email protected]" <[email protected]>
Subject: Need advice and best practices for Kylin query tuning (not cube tuning)

Dear All,

We are troubleshooting slow queries in our Kylin deployment, and we suspect that the issue is not with the cube definitions but with our queries. We have some quite complex queries with many range checks on dimension values, and we have observed different response times after changing the queries to alternative but functionally equivalent forms.

It is hard to draw conclusions, though, because we see a large variance in query response times for the same query, in the same environment, at roughly the same time. We have disabled query caching in kylin.properties (kylin.query.cache-enabled=false) to get more conclusive results on what effect certain changes have on query execution time, but we still observe variance in response times on an environment that otherwise has no load. Perhaps this is due to caching within HBase or within the streaming receiver.
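For reference, this is how that looks in kylin.properties; the threshold property below is quoted from memory of Kylin 2.x and should be verified against your version:

    # Kylin's query result cache: off while benchmarking, normally on in production
    kylin.query.cache-enabled=false

    # When the cache is on, only queries slower than this many milliseconds are
    # considered worth caching (Kylin 2.x name, assumed; check your version)
    kylin.query.cache-threshold-duration=2000

Even with this switched off, HBase's block cache and the OS page cache can still make a repeated run of the same query faster than the first one.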
Do you have any guidelines, best practices, or documentation on how to tune queries for Kylin? (I'm aware of some cube tuning guidelines in the Kylin documentation, but now I'm looking for advice specifically about query optimization.)

Many thanks,
Andras
