Thanks, Shaofeng, for your affirmation :).

ShaoFeng Shi <shaofeng...@apache.org> wrote on Wed, Nov 7, 2018 at 9:29 AM:
> Good job Jiatao! I appreciate your support to the community!
>
> JiaTao Tao <taojia...@gmail.com> wrote on Wed, Nov 7, 2018 at 9:17 AM:
>
>> Very glad that my reply was helpful. I have already opened a JIRA to add logs
>> for "*GTStreamAggregateScanner*", so next time it should be much easier
>> to navigate this :).
>>
>> cheney <531014...@qq.com> wrote on Tue, Nov 6, 2018 at 11:57 PM:
>>
>>> Hi JiaTao, thank you very much! The statistics are right when I configure
>>> "kylin.query.stream-aggregate-enabled=false".
>>> You are right. Records are pre-aggregated by GTStreamAggregateScanner.
>>>
>>> ------------------ Original Message ------------------
>>> *From:* "JiaTao Tao" <taojia...@gmail.com>
>>> *Sent:* Tuesday, Nov 6, 2018, 10:50 PM
>>> *To:* "user" <user@kylin.apache.org>
>>> *Subject:* Re: doubt about measure of processedRowCount
>>>
>>> One possible place I can find in the code is the use of
>>> *GTStreamAggregateScanner* (in "*SegmentCubeTupleIterator.java#111*").
>>> You can see that it does aggregate in
>>> "*GTStreamAggregateScanner.AbstractStreamMergeIterator#next*", so it
>>> reduces the inputs. But there is no log printing in this class, as you can
>>> see, so it's pretty hard to confirm. Try
>>> "kylin.query.stream-aggregate-enabled=false" and run the scenario again to
>>> see any differences.
>>>
>>> cheney <531014...@qq.com> wrote on Mon, Nov 5, 2018 at 6:55 PM:
>>>
>>>> Yes, the log is as follows.
>>>>
>>>> 2018-11-02 22:25:34,980 DEBUG [Query 03ea4f21-29ed-4b74-8faa-c57ecd44f412-198914] gtrecord.StorageResponseGTScatter:88 : Using SortMergedPartitionResultIterator to merge 103 partition results
>>>> 2018-11-02 22:25:34,982 INFO [Query 03ea4f21-29ed-4b74-8faa-c57ecd44f412-198914] gtrecord.SequentialCubeTupleIterator:73 : Using Iterators.concat *to merge segment results*
>>>> 2018-11-02 22:25:34,982 DEBUG [Query 03ea4f21-29ed-4b74-8faa-c57ecd44f412-198914] enumerator.OLAPEnumerator:122 : return TupleIterator...
>>>> 2018-11-02 22:25:34,991 INFO [Query 03ea4f21-29ed-4b74-8faa-c57ecd44f412-198914] service.QueryService:897 : *Processed rows for each storageContext*: 366
>>>> 2018-11-02 22:25:34,991 INFO [Query 03ea4f21-29ed-4b74-8faa-c57ecd44f412-198914] service.QueryService:422 : Stats of SQL response: isException: false, duration: 20, *total scan count 1552*
>>>>
>>>> According to the log, *valueA* = 366. *valueB* = (total scan count) 1552
>>>> - (total aggregated/filtered in HBase) 270 = 1282.
>>>> *valueB* is much larger than *valueA*.
>>>>
>>>> ------------------ Original Message ------------------
>>>> *From:* "JiaTao Tao" <taojia...@gmail.com>
>>>> *Sent:* Monday, Nov 5, 2018, 2:41 PM
>>>> *To:* "user" <user@kylin.apache.org>
>>>> *Subject:* Re: doubt about measure of processedRowCount
>>>>
>>>> Can you grep logs like "to merge segment results" in that scenario?
>>>>
>>>> cheney <531014...@qq.com> wrote on Sat, Nov 3, 2018 at 4:15 PM:
>>>>
>>>>> Thank you for replying, but I am sure there is only one OlapContext in
>>>>> the query in my scenario.
>>>>> ---Original---
>>>>> *From:* "JiaTao Tao" <taojia...@gmail.com>
>>>>> *Date:* Sat, Nov 3, 2018 10:42 AM
>>>>> *To:* "user" <user@kylin.apache.org>
>>>>> *Subject:* Re: doubt about measure of processedRowCount
>>>>>
>>>>> Maybe summing all the *valueA* values would be more appropriate, because
>>>>> there may be more than one OlapContext in the query (one OlapContext
>>>>> corresponds to one storageContext).
>>>>>
>>>>> There are two good blogs about Kylin's query engine; you may take a
>>>>> look :).
>>>>>
>>>>> https://blog.csdn.net/yu616568/article/details/50838504
>>>>>
>>>>> https://zhuanlan.zhihu.com/p/30613434
>>>>>
>>>>> cheney <531014...@qq.com> wrote on Fri, Nov 2, 2018 at 11:10 PM:
>>>>>
>>>>>> Hi, guys
>>>>>>
>>>>>> When I execute a SQL query in Kylin, the Kylin server logs some
>>>>>> query statistics. For example:
>>>>>>
>>>>>> "Processed rows for each storageContext: *valueA*".
>>>>>> *valueA* is processedRowCount.
>>>>>>
>>>>>> My understanding is that processedRowCount is the number of rows
>>>>>> returned by HBase.
>>>>>>
>>>>>> The HBase coprocessor logs region stats, including "*Total
>>>>>> scanned row*" and "Total filtered/aggred row".
>>>>>>
>>>>>> For one region, final records returned by HBase = *Total scanned
>>>>>> row* - Total filtered/aggred row.
>>>>>> Suppose this query needs to scan 10 regions in HBase. We can get
>>>>>> the stats of every region, and we can get the total records *valueB*
>>>>>> returned by HBase by summing the final records of the 10 regions.
>>>>>>
>>>>>> In general, *valueA* is equal to *valueB*, but sometimes *valueB* is
>>>>>> much larger than *valueA*. Why?
>>>>>>
>>>>>
>>>>> --
>>>>> Regards!
>>>>> Aron Tao
>>>>
>>>> --
>>>> Regards!
>>>> Aron Tao
>>>
>>> --
>>> Regards!
>>> Aron Tao
>>
>> --
>> Regards!
>> Aron Tao
>
> --
> Best regards,
>
> Shaofeng Shi 史少锋

--
Regards!
Aron Tao
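
The explanation reached in this thread is that records returned by HBase are pre-aggregated on the query server before processedRowCount is counted, so *valueA* (366) can be smaller than *valueB* (1552 - 270 = 1282). The Python sketch below only illustrates that general idea: merging key-sorted partition results and combining rows that share a key yields fewer rows than went in. It is not Kylin's GTStreamAggregateScanner code; the function name `stream_aggregate`, the sample data, and the use of a sum as the measure are all assumptions made for illustration.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def stream_aggregate(partitions):
    """Merge key-sorted partitions and sum the measures of rows sharing a key.

    Hypothetical helper for illustration only; not Kylin's implementation.
    """
    # heapq.merge keeps the combined stream sorted by key without loading
    # everything into memory, so equal keys arrive next to each other.
    merged = heapq.merge(*partitions, key=itemgetter(0))
    for key, rows in groupby(merged, key=itemgetter(0)):
        yield key, sum(value for _, value in rows)


# Made-up data: three storage partitions, each already sorted by dimension key.
partitions = [
    [("a", 1), ("b", 2), ("d", 4)],
    [("a", 10), ("c", 3)],
    [("b", 20), ("d", 40), ("e", 5)],
]

rows_returned_by_storage = sum(len(p) for p in partitions)
aggregated = list(stream_aggregate(partitions))
rows_after_aggregation = len(aggregated)

print(rows_returned_by_storage, rows_after_aggregation)
```

In this toy input the storage side returns more rows than survive aggregation, which mirrors the gap between *valueB* and *valueA* seen in the logs above. With "kylin.query.stream-aggregate-enabled=false", as reported in the thread, the two counts matched.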