Re: doubt about measure of processedRowCount

JiaTao Tao Tue, 06 Nov 2018 06:51:12 -0800

One possible place I can find in the code is where *GTStreamAggregateScanner*
is used (in *SegmentCubeTupleIterator.java#111*). As you can see, it does
aggregate in *GTStreamAggregateScanner.AbstractStreamMergeIterator#next*,
so it will reduce the number of input rows. However, this class prints no
logs, so it is hard to confirm. Try setting
"kylin.query.stream-aggregate-enabled=false" and run the scenario again to
see whether anything changes.
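The row-reducing effect described above can be illustrated with a minimal Java sketch of streaming merge aggregation. It assumes the incoming rows are already sorted by their group-by key. `Row` and `StreamMergeIterator` are hypothetical names, and this is not Kylin's `GTStreamAggregateScanner` code. It only shows the general technique of folding adjacent rows that share a key, which is why a row counter taken after such a step can be smaller than the number of rows HBase returned.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical row type: a group-by key plus one SUM measure.
// It stands in for a real Kylin record only for illustration.
final class Row {
    final String key;
    final long sum;

    Row(String key, long sum) {
        this.key = key;
        this.sum = sum;
    }
}

// Folds adjacent rows that share a key into one row. The input must already
// be sorted by key; that is what makes a single streaming pass sufficient.
final class StreamMergeIterator implements Iterator<Row> {
    private final Iterator<Row> input;
    private Row pending; // first row of the next group, read ahead

    StreamMergeIterator(Iterator<Row> input) {
        this.input = input;
        this.pending = input.hasNext() ? input.next() : null;
    }

    @Override
    public boolean hasNext() {
        return pending != null;
    }

    @Override
    public Row next() {
        if (pending == null) {
            throw new NoSuchElementException();
        }
        String key = pending.key;
        long sum = pending.sum;
        pending = null;
        while (input.hasNext()) {
            Row r = input.next();
            if (r.key.equals(key)) {
                sum += r.sum; // same group: add to the running aggregate
            } else {
                pending = r; // a new group starts: keep it for the next call
                break;
            }
        }
        return new Row(key, sum);
    }

    // Drains the iterator so input and output row counts can be compared.
    static List<Row> mergeAll(List<Row> sortedRows) {
        List<Row> out = new ArrayList<>();
        new StreamMergeIterator(sortedRows.iterator()).forEachRemaining(out::add);
        return out;
    }

    public static void main(String[] args) {
        List<Row> in = List.of(
                new Row("a", 1), new Row("a", 2), new Row("b", 5),
                new Row("c", 1), new Row("c", 1));
        List<Row> out = mergeAll(in);
        System.out.println(in.size() + " rows in, " + out.size() + " rows out"); // 5 rows in, 3 rows out
    }
}
```

Five sorted input rows collapse into three output rows here. If the processed-row counter is incremented after a step like this, it would report fewer rows than the HBase side returned.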


On Mon, Nov 5, 2018 at 6:55 PM, cheney <[email protected]> wrote:

> Yes. The log is as follows:
>
> 2018-11-02 22:25:34,980 DEBUG [Query 03ea4f21-29ed-4b74-8faa-c57ecd44f412-198914] gtrecord.StorageResponseGTScatter:88 : Using SortMergedPartitionResultIterator to merge 103 partition results
> 2018-11-02 22:25:34,982 INFO  [Query 03ea4f21-29ed-4b74-8faa-c57ecd44f412-198914] gtrecord.SequentialCubeTupleIterator:73 : Using Iterators.concat *to merge segment results*
> 2018-11-02 22:25:34,982 DEBUG [Query 03ea4f21-29ed-4b74-8faa-c57ecd44f412-198914] enumerator.OLAPEnumerator:122 : return TupleIterator...
> 2018-11-02 22:25:34,991 INFO  [Query 03ea4f21-29ed-4b74-8faa-c57ecd44f412-198914] service.QueryService:897 : *Processed rows for each storageContext*: 366
> 2018-11-02 22:25:34,991 INFO  [Query 03ea4f21-29ed-4b74-8faa-c57ecd44f412-198914] service.QueryService:422 : Stats of SQL response: isException: false, duration: 20, *total scan count 1552*
>
> According to the log, *valueA* = 366, and *valueB* = (total scan count) 1552 -
> (total aggregated/filtered in HBase) 270 = 1282.
> *valueB* is much larger than *valueA*.
>
>
>
> ------------------ Original Message ------------------
> *From:* "JiaTao Tao"<[email protected]>;
> *Sent:* Monday, Nov 5, 2018, 2:41 PM
> *To:* "user"<[email protected]>;
> *Subject:* Re: doubt about measure of processedRowCount
>
> Can you grep logs like "to merge segment results" in that scenario?
>
> On Sat, Nov 3, 2018 at 4:15 PM, cheney <[email protected]> wrote:
>
>> Thanks for your reply, but I am sure there is only one OlapContext in the
>> query in my scenario.
>> ---Original---
>> *From:* "JiaTao Tao"<[email protected]>
>> *Date:* Sat, Nov 3, 2018 10:42 AM
>> *To:* "user"<[email protected]>;
>> *Subject:* Re: doubt about measure of processedRowCount
>>
>> Maybe summing all the *valueA* values would be more appropriate, because
>> there may be more than one OlapContext in the query (each OlapContext
>> corresponds to one storageContext).
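Summing over storage contexts amounts to totalling the per-context processed-row counts. A minimal Java sketch, where `ProcessedRowTotals` and its list-of-counts input are hypothetical and not part of Kylin:

```java
import java.util.List;

// Hypothetical helper: each element is the processedRowCount reported for one
// storageContext (one storageContext per OlapContext).
final class ProcessedRowTotals {
    static long total(List<Long> perStorageContext) {
        long total = 0;
        for (long count : perStorageContext) {
            total += count; // unboxes each per-context count
        }
        return total;
    }

    public static void main(String[] args) {
        // With a single storageContext the total is just that context's count.
        System.out.println(total(List.of(366L))); // 366
    }
}
```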
>>
>> There are two good blog posts about Kylin's query engine; you may want to
>> take a look :).
>>
>> https://blog.csdn.net/yu616568/article/details/50838504
>>
>> https://zhuanlan.zhihu.com/p/30613434
>>
>> On Fri, Nov 2, 2018 at 11:10 PM, cheney <[email protected]> wrote:
>>
>>> Hi, guys
>>>
>>>         When I execute a SQL query in Kylin, the Kylin server logs some
>>> query statistics. For example, the log looks like this:
>>>
>>>        "Processed rows for each storageContext: *valueA*", where *valueA* is
>>> processedRowCount.
>>>
>>>        My understanding is that processedRowCount is the number of rows
>>> returned by HBase.
>>>
>>>        The HBase coprocessor logs per-region stats, including "*Total
>>> scanned row*" and "Total filtered/aggred row".
>>>
>>>         For one region, final records returned by HBase = *Total scanned
>>> row* - *Total filtered/aggred row*.
>>>        Suppose the query needs to scan 10 regions in HBase. We can get the
>>> stats of every region, and then get the total number of records *valueB*
>>> returned by HBase by summing the final records of all 10 regions.
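The per-region arithmetic above can be written as a small Java sketch. `RegionStat` and `ValueB` are hypothetical names; the two fields stand for the numbers in the coprocessor's "Total scanned row" and "Total filtered/aggred row" log entries. Because the sum of per-region differences equals total scanned minus total filtered/aggregated, the thread's totals of 1552 and 270 give 1282.

```java
import java.util.List;

// Hypothetical per-region numbers, as read from the coprocessor's
// "Total scanned row" and "Total filtered/aggred row" log entries.
final class RegionStat {
    final long scannedRows;
    final long filteredOrAggregatedRows;

    RegionStat(long scannedRows, long filteredOrAggregatedRows) {
        this.scannedRows = scannedRows;
        this.filteredOrAggregatedRows = filteredOrAggregatedRows;
    }

    // Rows this region sends back after coprocessor filtering/aggregation.
    long returnedRows() {
        return scannedRows - filteredOrAggregatedRows;
    }
}

final class ValueB {
    // valueB: sum of (scanned - filtered/aggregated) over every region scanned.
    static long sum(List<RegionStat> regions) {
        long total = 0;
        for (RegionStat r : regions) {
            total += r.returnedRows();
        }
        return total;
    }

    public static void main(String[] args) {
        // Totals quoted in this thread: 1552 scanned, 270 filtered/aggregated.
        System.out.println(sum(List.of(new RegionStat(1552, 270)))); // 1282
    }
}
```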
>>>
>>>       In general, *valueA* is equal to *valueB*, but sometimes *valueB* is
>>> much larger than *valueA*. Why?
>>>
>>>
>>>
>>
>>
>> --
>>
>>
>> Regards!
>>
>> Aron Tao
>>
>
>
> --
>
>
> Regards!
>
> Aron Tao
>


-- 


Regards!

Aron Tao
