Thank you, Xiaomao, for the answer; it is good!

Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC
Email: [email protected]

Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
Join Kylin user mail group: [email protected]
Join Kylin dev mail group: [email protected]




lk_hadoop <[email protected]> 于2019年8月30日周五 上午10:50写道:

> Thank you very much.
>
> 2019-08-30
> ------------------------------
> lk_hadoop
> ------------------------------
>
> *From:* maoxiaomao <[email protected]>
> *Sent:* 2019-08-29 15:49
> *Subject:* Re: What does Data Size and Source Table Size mean
> *To:* "[email protected]" <[email protected]>
> *Cc:*
>
> Hi lk,
>    This is my understanding; I'm not quite sure about it (my Kylin
> version is v2.6.1).
>       1. Data Size : for each segment and each MR step, it is the output
> data size of that step. It is one of the MapReduce counters and can be seen
> in the log as "HDFS Write" (for Step #1) or "HDFS: Number of bytes written"
> (for the other MR steps). A rough sketch of reading that counter follows
> after this list.
>
>       2. Source Table Size : the size of the source data, measured as
> strings at the point it is read; on the web UI it is the sum over all
> segments. It comes from the counter of Step #2 (Extract Fact Table Distinct
> Columns) named "BYTES" of class
> "org.apache.kylin.engine.mr.steps.FactDistinctColumnsMapper$RawDataCounter",
> which can be seen at the bottom of the Step #2 log. It is accumulated in
> FactDistinctColumnsMapper.doMap over the rows that parseMapperInput returns
> (for Hive input), with countSizeInBytes adding up the bytes of each row;
> see the sketch after this list.
>
>      3. Cube Size : the final HFile size, i.e. the "Data Size" of the step
> "Convert Cuboid Data to HFile"; on the web UI it is the sum over all
> segments.
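>
>       For #1, here is an illustration only (not Kylin's own code): the
> "HDFS: Number of bytes written" figure is the standard Hadoop file-system
> counter, which a client could read from a finished MapReduce job roughly
> like this (the class and method names below are made up for the example):
>
>     import org.apache.hadoop.mapreduce.Counter;
>     import org.apache.hadoop.mapreduce.Counters;
>     import org.apache.hadoop.mapreduce.Job;
>
>     // Illustration: fetch the counter that the job history UI displays
>     // as "HDFS: Number of bytes written" (Hadoop 2.x group/name pair).
>     public class HdfsBytesWritten {
>         public static long bytesWritten(Job job) throws Exception {
>             Counters counters = job.getCounters();
>             Counter c = counters.findCounter(
>                     "org.apache.hadoop.mapreduce.FileSystemCounter",
>                     "HDFS_BYTES_WRITTEN");
>             return c.getValue();
>         }
>     }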
>
>
>
>
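>       For #2, a rough sketch of what I think countSizeInBytes does,
> assuming it simply adds up the UTF-8 length of every column value plus one
> byte per field delimiter (my approximation, not the exact Kylin source; the
> class name is invented):
>
>     import java.nio.charset.StandardCharsets;
>
>     // Sketch: approximate the "BYTES" (RawDataCounter) value by summing
>     // the UTF-8 size of each column in a row plus one byte per delimiter.
>     public class RawDataSizeSketch {
>         static long countSizeInBytes(String[] row) {
>             long size = 0;
>             for (String s : row) {
>                 // a null column still contributes one byte in this sketch
>                 size += (s == null) ? 1
>                         : s.getBytes(StandardCharsets.UTF_8).length;
>                 size += 1; // field delimiter
>             }
>             return size;
>         }
>
>         public static void main(String[] args) {
>             String[] row = { "2019-08-28", "CN", null, "42" };
>             System.out.println(countSizeInBytes(row)); // prints 19
>         }
>     }
>
>       If that is right, the Source Table Size of a segment is just this
> value summed over every row the Step #2 mapper reads.
>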
> --
> At 2019-08-28 14:13:00, "lk_hadoop" <[email protected]> wrote:
>
> hi,all:
> I don't quite understand some of the metrics I see on Kylin's web UI:
>
> #1 Step Name: Create Intermediate Flat Hive Table
> Data Size: 72.06 GB
> Duration: 6.07 mins Waiting: 0 seconds
>
> what is the "Data Size" mean ?  all the records data size *2 ?
>
>
> what is the "Source Table Size" mean?
>
>
> thanks for your attention.
>
> 2019-08-28
> ------------------------------
> lk_hadoop
>
>
>
>
>
>
