Thank you Xiaomao for the answer, it is good! Best regards,
Shaofeng Shi 史少锋
Apache Kylin PMC
Email: [email protected]
Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
Join Kylin user mail group: [email protected]
Join Kylin dev mail group: [email protected]

lk_hadoop <[email protected]> wrote on Fri, Aug 30, 2019 at 10:50 AM:

> Thank you very much.
>
> 2019-08-30
> lk_hadoop
>
> *From:* maoxiaomao <[email protected]>
> *Sent:* 2019-08-29 15:49
> *Subject:* Re: What does Data Size and Source Table Size mean
> *To:* "[email protected]" <[email protected]>
> *Cc:*
>
> Hi lk,
> This is my understanding; I'm not quite sure about it (my Kylin version is v2.6.1).
>
> 1. Data Size: for each segment and each MR step, it is the output data size. It is also one of the MapReduce counters, visible in the log as "HDFS Write" (for Step #1) or "HDFS: Number of bytes written" (for the other MR steps).
>
> 2. Source Table Size: the size of the source data read as strings at some point; on the web UI it is the sum over all segments. It comes from the counter of Step #2 (Extract Fact Table Distinct Columns), named "BYTES" in the class "org.apache.kylin.engine.mr.steps.FactDistinctColumnsMapper$RawDataCounter", and it can be seen at the bottom of the Step #2 log. It is calculated in FactDistinctColumnsMapper.doMap: for Hive input, parseMapperInput reads each row, and countSizeInBytes computes the row's size in bytes.
>
> 3. Cube Size: the final HFile size, i.e. the "Data Size" of the step "Convert Cuboid Data to HFile"; on the web UI it is the sum over all segments.
>
> -- At 2019-08-28 14:13:00, "lk_hadoop" <[email protected]> wrote:
>
> hi, all:
> I don't quite understand some of the metrics I see on Kylin's web UI:
>
> #1 Step Name: Create Intermediate Flat Hive Table
> Data Size: 72.06 GB
> Duration: 6.07 mins  Waiting: 0 seconds
>
> What does the "Data Size" mean? All the records' data size * 2?
>
> What does the "Source Table Size" mean?
>
> Thanks for your attention.
>
> 2019-08-28
> lk_hadoop
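Based on Xiaomao's description, the "Source Table Size" accounting can be sketched as follows. This is a hypothetical simplification, not Kylin's actual source: the class name, the per-column "+1 delimiter byte" convention, and the method bodies are assumptions; only the idea (a mapper summing the UTF-8 byte length of every row into a counter named BYTES) comes from the thread.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of a byte-counting mapper in the spirit of
// FactDistinctColumnsMapper$RawDataCounter.BYTES (not Kylin's real code).
public class RawDataCounterSketch {

    private long bytes = 0; // stands in for the MapReduce counter "BYTES"

    // Assumed analogue of countSizeInBytes: sum the UTF-8 byte length of
    // each column, plus one byte per column as a delimiter/newline estimate.
    long countSizeInBytes(String[] row) {
        long size = 0;
        for (String col : row) {
            size += (col == null ? 0 : col.getBytes(StandardCharsets.UTF_8).length) + 1;
        }
        return size;
    }

    // Assumed analogue of doMap: every mapped row increments the counter.
    void doMap(String[] row) {
        bytes += countSizeInBytes(row); // context.getCounter(...).increment(...) in a real MR job
    }

    long getBytes() {
        return bytes;
    }

    public static void main(String[] args) {
        RawDataCounterSketch c = new RawDataCounterSketch();
        c.doMap(new String[]{"2019-08-28", "72.06", "GB"});
        c.doMap(new String[]{"kylin", "v2.6.1"});
        System.out.println(c.getBytes());
    }
}
```

Under these assumptions, the web UI's "Source Table Size" would be this per-segment total summed over all segments, which also explains why it measures the rows as serialized strings rather than the on-disk size of the Hive table.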
