Hi JiaTao, thank you very much! The statistics are correct when I set "kylin.query.stream-aggregate-enabled=false". You are right: records are pre-aggregated by GTStreamAggregateScanner.
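To make the effect concrete, here is a minimal sketch of the general idea behind stream aggregation: key-sorted partition results are merged, and adjacent rows with the same key are combined into one row, so fewer rows come out than went in. This is not Kylin's code (GTStreamAggregateScanner is a Java class); the function name, the sample keys, and the SUM measure are all assumptions made up for illustration.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def stream_aggregate(partitions):
    """Merge key-sorted partitions and sum the measures of rows sharing a key.

    Hypothetical helper: each row is a (key, measure) tuple and every
    partition is assumed to be sorted by key already.
    """
    # Sort-merge the partitions into one key-ordered stream.
    merged = heapq.merge(*partitions, key=itemgetter(0))
    # Equal keys are now adjacent, so one pass is enough to combine them.
    for key, rows in groupby(merged, key=itemgetter(0)):
        yield key, sum(measure for _, measure in rows)


# Made-up partition results, standing in for what the storage layer returns.
partitions = [
    [("2018-11-01", 5), ("2018-11-02", 3)],
    [("2018-11-01", 2), ("2018-11-03", 7)],
    [("2018-11-02", 1), ("2018-11-03", 4)],
]

rows_returned_by_storage = sum(len(p) for p in partitions)
aggregated = list(stream_aggregate(partitions))
rows_seen_downstream = len(aggregated)

print(rows_returned_by_storage, rows_seen_downstream)
```

In this toy data, six rows arrive from storage but only three remain after aggregation. That is the same kind of gap as between valueB and valueA in the thread below, if the row count is taken after the aggregation step.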
------------------ Original ------------------
From: "JiaTao Tao"<[email protected]>;
Date: Tue, Nov 6, 2018 10:50
To: "user"<[email protected]>;
Subject: Re: doubt about measure of processedRowCount

One possible place I can find in the code is GTStreamAggregateScanner (used in "SegmentCubeTupleIterator.java#111"). It does aggregate in "GTStreamAggregateScanner.AbstractStreamMergeIterator#next", so it will reduce the number of input rows. But, as you can see, there is no logging in this class, so it is pretty hard to confirm. Try "kylin.query.stream-aggregate-enabled=false" and run the scenario again to see whether anything changes.

On Mon, Nov 5, 2018 at 6:55, cheney <[email protected]> wrote:

Yes. The log is as follows.

2018-11-02 22:25:34,980 DEBUG [Query 03ea4f21-29ed-4b74-8faa-c57ecd44f412-198914] gtrecord.StorageResponseGTScatter:88 : Using SortMergedPartitionResultIterator to merge 103 partition results
2018-11-02 22:25:34,982 INFO [Query 03ea4f21-29ed-4b74-8faa-c57ecd44f412-198914] gtrecord.SequentialCubeTupleIterator:73 : Using Iterators.concat to merge segment results
2018-11-02 22:25:34,982 DEBUG [Query 03ea4f21-29ed-4b74-8faa-c57ecd44f412-198914] enumerator.OLAPEnumerator:122 : return TupleIterator...
2018-11-02 22:25:34,991 INFO [Query 03ea4f21-29ed-4b74-8faa-c57ecd44f412-198914] service.QueryService:897 : Processed rows for each storageContext: 366
2018-11-02 22:25:34,991 INFO [Query 03ea4f21-29ed-4b74-8faa-c57ecd44f412-198914] service.QueryService:422 : Stats of SQL response: isException: false, duration: 20, total scan count 1552

According to the log, valueA = 366.
valueB = (total scan count) 1552 - (total aggregated/filtered in HBase) 270 = 1282
valueB is much larger than valueA.

------------------ Original ------------------
From: "JiaTao Tao"<[email protected]>;
Date: Mon, Nov 5, 2018 2:41
To: "user"<[email protected]>;
Subject: Re: doubt about measure of processedRowCount

Can you grep for logs like "to merge segment results" in that scenario?

On Sat, Nov 3, 2018 at 4:15, cheney <[email protected]> wrote:

Thank you for replying, but I am sure there is only one OlapContext in the query in my scenario.

---Original---
From: "JiaTao Tao"<[email protected]>
Date: Sat, Nov 3, 2018 10:42 AM
To: "user"<[email protected]>;
Subject: Re: doubt about measure of processedRowCount

Maybe counting all the valueA values would be more appropriate, because there may be more than one OlapContext in the query (one OlapContext corresponds to one storageContext). There are two good blogs about Kylin's query engine; you may want to take a look :).
https://blog.csdn.net/yu616568/article/details/50838504
https://zhuanlan.zhihu.com/p/30613434

On Fri, Nov 2, 2018 at 11:10, cheney <[email protected]> wrote:

Hi, guys

When I execute a SQL query in Kylin, the Kylin server logs some query statistics. For example: "Processed rows for each storageContext: valueA". valueA is processedRowCount. My understanding is that processedRowCount is the number of records returned by HBase.

The HBase coprocessor logs region stats, including "Total scanned row" and "Total filtered/aggred row". For one region:

final records returned by HBase = Total scanned row - Total filtered/aggred row

Suppose the query needs to scan 10 regions in HBase. We can get the stats for every region, and we can get the total number of records returned by HBase, valueB, by summing the final records of the 10 regions.

In general valueA equals valueB, but sometimes valueB is much larger than valueA. Why?

--
Regards!
Aron Tao

--
Regards!
Aron Tao

--
Regards!
Aron Tao
