Hi ShaoFeng,

Great, thank you for the link!

Best regards,
Andras
> On 2019. Dec 7., at 11:52, ShaoFeng Shi <shaofeng...@apache.org> wrote:
>
> This is what we call a "read-write separated" deployment. Many Kylin users actually run in this mode: deploy a dedicated HBase cluster only for queries, with another Hadoop cluster for cube building. You can refer to this blog:
>
> https://kylin.apache.org/blog/2016/06/10/standalone-hbase-cluster/
>
> Best regards,
>
> Shaofeng Shi 史少锋
> Apache Kylin PMC
> Email: shaofeng...@apache.org
>
> Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
> Join Kylin user mail group: user-subscr...@kylin.apache.org
> Join Kylin dev mail group: dev-subscr...@kylin.apache.org
>
> Andras Nagy <andras.istvan.n...@gmail.com> wrote on Saturday, December 7, 2019 at 12:42 AM:
>> Hi All,
>>
>> Thanks a lot, guys! With your generous help, Xiaoxiang, we were able to track down the problem: it was mostly caused by segments that were too large and had too few regions, hence a low level of parallelism on the HBase side. The p95 query latency is now acceptable for the time being, although we will keep working on improving it.
>>
>> However, a small number of queries still take excessively long to execute. Upon analyzing these, it turned out that there is a regular pattern to when they occur: they coincide with the cubing jobs running on the same cluster (hourly stream cube building, plus a daily cube refresh to handle very late events).
>>
>> As a consequence, I'd like to move all of these YARN-based workloads (Hive, MapReduce) to a separate EMR cluster (our environment is in AWS) and keep another EMR cluster only for HBase. This would also allow better resource usage on these clusters, as we wouldn't need a static split of node memory between HBase and YARN.
>>
>> Does anyone have experience with this setup? Do you see any potential issues with it? Any feedback on this is welcome.
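[Editor's note: for readers following this thread, the blog linked above achieves the read-write separation mostly through kylin.properties. A rough sketch of the kind of setting involved follows; property names differ between Kylin versions, and the host/port below is a placeholder, not taken from this thread.]

```properties
# Sketch only: point Kylin's cube storage at the dedicated (query) HBase
# cluster's HDFS, while jobs keep running on the build cluster.
# 1.x-era property name (the era of the linked blog):
kylin.hbase.cluster.fs=hdfs://hbase-cluster-namenode:8020
# Renamed in Kylin 2.x+:
# kylin.storage.hbase.cluster-fs=hdfs://hbase-cluster-namenode:8020
```

The HBase client configuration on the Kylin node (hbase-site.xml on the classpath) presumably also has to point at the query cluster's ZooKeeper quorum, as described in the blog.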
>> Thanks a lot,
>> Andras
>>
>>> On Mon, Dec 2, 2019 at 3:56 AM Xiaoxiang Yu <xiaoxiang...@kyligence.io> wrote:
>>> Hi Andras,
>>>
>>> I would like to share what I have found.
>>>
>>> There are several steps in query execution. You can find log lines that mark the start and end of each step; these will help you find which step costs the most time.
>>>
>>> Step 1. Check the SQL ACL and grammar; then use Calcite to parse the SQL into an AST and finally into an execution plan.
>>>
>>> Start of step 1:
>>> service.QueryService:414 : The original query:
>>>
>>> End of step 1:
>>> enumerator.OLAPEnumerator:105 : query storage...
>>>
>>> Step 2. Send the RPC request to the region server or streaming receiver.
>>>
>>> Start of the RPC request to the region server:
>>>
>>> 2019-11-28 20:01:53,273 INFO [Query 77b0da35-a6e8-d6f9-cd50-152b4d7b0c5a-62] v2.CubeHBaseEndpointRPC:165 : The scan 30572c87 for segment UAC[20191127151000_20191128012000] is as below with 1 separate raw scans, shard part of start/end key is set to 0
>>>
>>> 2019-11-28 20:01:53,275 INFO [Query 77b0da35-a6e8-d6f9-cd50-152b4d7b0c5a-62] v2.CubeHBaseRPC:288 : Visiting hbase table lacus:LACUS_AGEAMBMTM1: cuboid require post aggregation, from 89 to 127
>>> Start: \x00\x00\x00\x00\x00\x00\x00\x00\x00\x7F\x00\x00\x00\x00\x6E\x07\x0A\xFA\xAA\x00\x00\x00\x00\x00\x00\x00\x00\x00
>>> (\x00\x00\x00\x00\x00\x00\x00\x00\x00\x7F\x00\x00\x00\x00n\x07\x0A\xFA\xAA\x00\x00\x00\x00\x00\x00\x00\x00\x00)
>>> Stop: \x00\x00\x00\x00\x00\x00\x00\x00\x00\x7F\xFF\xFF\xFF\xFF\x6E\x07\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\x00
>>> (\x00\x00\x00\x00\x00\x00\x00\x00\x00\x7F\xFF\xFF\xFF\xFFn\x07\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\x00)
>>> Fuzzy key counts: 1.
>>> Fuzzy keys: \x00\x00\x00\x00\x00\x00\x00\x00\x00\x7F\x00\x00\x00\x00\x6E\x07\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
>>> \x01\x01\x00\x00\x00\x00\x00\x00\x00\x00\x01\x01\x01\x01\x00\x00\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01;
>>>
>>> End of the RPC request to the region server:
>>>
>>> 2019-11-28 20:02:10,751 INFO [kylin-coproc--pool2-t4] v2.CubeHBaseEndpointRPC:343 : <sub-thread for Query 77b0da35-a6e8-d6f9-cd50-152b4d7b0c5a GTScanRequest 30572c87>Endpoint RPC returned from HTable lacus:LACUS_AGEAMBMTM1 Shard \x6C\x61\x63\x75\x73\x3A\x4C\x41\x43\x55\x53\x5F\x41\x47\x45\x41\x4D\x42\x4D\x54\x4D\x31\x2C\x2C\x31\x35\x37\x34\x39\x30\x36\x31\x35\x35\x35\x39\x33\x2E\x39\x64\x39\x30\x39\x36\x31\x64\x66\x61\x38\x37\x31\x37\x35\x38\x36\x37\x33\x37\x65\x33\x36\x30\x36\x31\x35\x34\x34\x36\x36\x33\x2E on host: cdh-worker-2.Total scanned row: 770235. Total scanned bytes: 47587866. Total filtered row: 0. Total aggred row: 764215. Time elapsed in EP: 17113(ms). Server CPU usage: 0.24357142857142858, server physical mem left: 4.23337984E9, server swap mem left:0.0.Etc message: start latency: 397@2,agg done@17112,compress done@17113,server stats done@17113,
>>>
>>> These lines give you statistics for each HBase RPC; please check which request takes a long time or scans too many rows. You may look at the following aspects:
>>>
>>> 1. Did your SQL query hit the right/best-matching cuboid?
>>> 2. Could you change the rowkey order to match your SQL query, to reduce the number of scanned rows?
>>> 3. Is some region too large? If so, you could cut the segments into smaller ones to improve the degree of parallelism.
>>>
>>> Step 3. Do the query-server-side aggregation and decoding.
>>>
>>> This step spans from the last RPC statistics log line to the final query statistics line.
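[Editor's note: the per-shard coprocessor times in these RPC statistics lines can be pulled out with a small helper so the slowest shard scans stand out. A sketch, assuming the log format quoted above; the log path in the usage example is a placeholder.]

```shell
# Sketch: list the per-shard coprocessor times ("Time elapsed in EP: NNN(ms)")
# from a Kylin server log, so slow RPCs can be spotted and compared with the
# query's total duration. Assumes the log format shown in the excerpts above.
ep_times() {
  # First grep isolates the statistic, second strips it down to the number.
  grep -o 'Time elapsed in EP: [0-9]*' "$1" | grep -o '[0-9]*$'
}
```

Usage (placeholder path): `ep_times kylin.log | sort -n | tail -5` lists the five slowest shard scans.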
>>> 2019-11-28 20:02:10,966 INFO [Query 77b0da35-a6e8-d6f9-cd50-152b4d7b0c5a-62] service.QueryService:491 : Stats of SQL response: isException: false, duration: 21170, total scan count 1565099
>>>
>>> ----------------
>>> Best wishes,
>>> Xiaoxiang Yu
>>>
>>> From: Andras Nagy <andras.istvan.n...@gmail.com>
>>> Reply-To: "user@kylin.apache.org" <user@kylin.apache.org>
>>> Date: Thursday, November 28, 2019, 21:06
>>> To: "user@kylin.apache.org" <user@kylin.apache.org>
>>> Subject: Need advice and best practices for Kylin query tuning (not cube tuning)
>>>
>>> Dear All,
>>>
>>> We are troubleshooting slow queries in our Kylin deployment, and we suspect that the issue is not with the cube definitions but with our queries. At least, we have some quite complex queries with many range checks on dimension values, and we have observed different response times after changing the queries to alternative but functionally equivalent ones.
>>>
>>> However, it is hard to draw conclusions because we see a large variance in query response times (for the same query, in the same environment, at roughly the same time). We have disabled query caching in kylin.properties (kylin.query.cache-enabled=false) to get more conclusive results on the effect certain changes have on query execution time, but we still observe variance on an environment that otherwise has no load. Perhaps this is due to caching within HBase or within the streaming receiver.
>>>
>>> Do you have any guidelines, best practices, or documentation on how to tune queries for Kylin? (I'm aware of the cube tuning guidelines in the Kylin documentation, but now I'm looking specifically for advice about query optimization.)
>>>
>>> Many thanks,
>>> Andras
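[Editor's note: the three-step breakdown Xiaoxiang describes can be scripted. A sketch using the log markers quoted in this thread; the query id and log path in the usage example are placeholders.]

```shell
# Sketch: pull the step-boundary log lines for a single query id out of the
# Kylin server log. The four markers ("The original query", "query storage",
# "Endpoint RPC returned", "Stats of SQL response") bracket the three steps;
# the timestamp gaps between them show which step dominates.
trace_query() {
  local log_file="$1" query_id="$2"
  grep -E 'The original query|query storage|Endpoint RPC returned|Stats of SQL response' "$log_file" \
    | grep -F "$query_id"
}
```

Usage (placeholder values): `trace_query kylin.log 77b0da35-a6e8-d6f9-cd50-152b4d7b0c5a` prints the boundary lines for that query.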