The one place I can find in the code is *GTStreamAggregateScanner* (used at "*SegmentCubeTupleIterator.java#111*"). It does aggregate, in "*GTStreamAggregateScanner.AbstractStreamMergeIterator#next*", so it reduces the number of input rows. But there is no logging in this class, as you can see, so it's pretty hard to confirm. Try setting "kylin.query.stream-aggregate-enabled=false" and run the scenario again to see whether anything changes.
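
To illustrate why a stream merge on the query server could make *valueA* smaller than the sum of the coprocessor stats, here is a minimal sketch. It is not Kylin's code: the `Row` class and `mergeSorted` method are hypothetical, and it assumes the partition results arrive sorted by dimension key so that rows with the same key can be combined as they stream through.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class StreamMergeSketch {

    /** Hypothetical row shape: one dimension key and one SUM measure. */
    static final class Row {
        final String key;
        final long sum;

        Row(String key, long sum) {
            this.key = key;
            this.sum = sum;
        }
    }

    /**
     * Collapses each run of adjacent rows that share a key into one row.
     * Assumes the input is already sorted by key.
     */
    static List<Row> mergeSorted(List<Row> sorted) {
        List<Row> out = new ArrayList<>();
        Row current = null;
        for (Row r : sorted) {
            if (current != null && current.key.equals(r.key)) {
                // Same key as the previous row: aggregate instead of emitting.
                current = new Row(current.key, current.sum + r.sum);
            } else {
                if (current != null) {
                    out.add(current);
                }
                current = r;
            }
        }
        if (current != null) {
            out.add(current);
        }
        return out;
    }

    public static void main(String[] args) {
        // Rows as they might come back from several partitions, sorted by key.
        List<Row> fromStorage = Arrays.asList(
                new Row("a", 1), new Row("a", 2),
                new Row("b", 5),
                new Row("c", 1), new Row("c", 4));
        List<Row> merged = mergeSorted(fromStorage);
        System.out.println("rows returned by storage: " + fromStorage.size());
        System.out.println("rows after stream merge:  " + merged.size());
    }
}
```

If something like this happens before processedRowCount is counted, the count would reflect the merged rows rather than the rows HBase returned, which would match what you are seeing. That is only a guess until the flag above is tested.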

cheney <[email protected]> wrote on Mon, Nov 5, 2018 at 6:55 PM:

> Yes, the log is as follows.
>
> 2018-11-02 22:25:34,980 DEBUG [Query 03ea4f21-29ed-4b74-8faa-c57ecd44f412-198914] gtrecord.StorageResponseGTScatter:88 : Using SortMergedPartitionResultIterator to merge 103 partition results
> 2018-11-02 22:25:34,982 INFO [Query 03ea4f21-29ed-4b74-8faa-c57ecd44f412-198914] gtrecord.SequentialCubeTupleIterator:73 : Using Iterators.concat *to merge segment results*
> 2018-11-02 22:25:34,982 DEBUG [Query 03ea4f21-29ed-4b74-8faa-c57ecd44f412-198914] enumerator.OLAPEnumerator:122 : return TupleIterator...
> 2018-11-02 22:25:34,991 INFO [Query 03ea4f21-29ed-4b74-8faa-c57ecd44f412-198914] service.QueryService:897 : *Processed rows for each storageContext*: 366
> 2018-11-02 22:25:34,991 INFO [Query 03ea4f21-29ed-4b74-8faa-c57ecd44f412-198914] service.QueryService:422 : Stats of SQL response: isException: false, duration: 20, *total scan count 1552*
>
> According to the log, *valueA* = 366. *valueB* = (total scan count) 1552 - (total aggregated/filtered in HBase) 270 = 1282.
> *valueB* is much larger than *valueA*.
>
> ------------------ Original Message ------------------
> *From:* "JiaTao Tao"<[email protected]>;
> *Sent:* Mon, Nov 5, 2018 2:41 PM
> *To:* "user"<[email protected]>;
> *Subject:* Re: doubt about measure of processedRowCount
>
> Can you grep logs like "to merge segment results" in that scenario?
>
> cheney <[email protected]> wrote on Sat, Nov 3, 2018 at 4:15 PM:
>
>> Thanks for your reply, but I am sure there's only one OlapContext in the query in my scenario.
>> ---Original---
>> *From:* "JiaTao Tao"<[email protected]>
>> *Date:* Sat, Nov 3, 2018 10:42 AM
>> *To:* "user"<[email protected]>;
>> *Subject:* Re: doubt about measure of processedRowCount
>>
>> Maybe counting all the *valueA* values would be more appropriate, because there may be more than one OlapContext in the query (one OlapContext corresponds to one storageContext).
>>
>> There are two good blogs about Kylin's query engine; you may want to take a look :).
>>
>> https://blog.csdn.net/yu616568/article/details/50838504
>>
>> https://zhuanlan.zhihu.com/p/30613434
>>
>> cheney <[email protected]> wrote on Fri, Nov 2, 2018 at 11:10 PM:
>>
>>> Hi, guys
>>>
>>> When I execute a SQL query in Kylin, the Kylin server logs some query statistics. For example:
>>>
>>> "Processed rows for each storageContext: *valueA*". *valueA* is processedRowCount.
>>>
>>> My understanding is that processedRowCount is the number of rows returned by HBase.
>>>
>>> The HBase coprocessor logs region stats, including "*Total scanned row*" and "Total filtered/aggred row".
>>>
>>> For one region, final records returned by HBase = *Total scanned row* - Total filtered/aggred row.
>>> Suppose this query needs to scan 10 regions in HBase. We can get the stats for every region, so we can get the total number of records returned by HBase, *valueB*, by summing the final records of the 10 regions.
>>>
>>> In general, *valueA* is equal to *valueB*, but sometimes *valueB* is much larger than *valueA*. Why?
>>>
>>
>> --
>> Regards!
>> Aron Tao
>
> --
> Regards!
> Aron Tao

--
Regards!
Aron Tao
