Thank you very much.

2019-08-30
lk_hadoop

From: maoxiaomao <[email protected]>
Sent: 2019-08-29 15:49
Subject: Re: What does Data Size and Source Table Size mean
To: "[email protected]" <[email protected]>
Cc:

Hi lk,

This is my understanding; I'm not entirely sure about it. (My Kylin version is v2.6.1.)

1. Data Size: for each segment and each MR step, this is the step's output data size. It is also one of the MapReduce counters, and can be seen in the log as "HDFS Write" (for Step #1) or "HDFS: Number of bytes written" (for the other MR steps).

2. Source Table Size: the size of the source data, read as Strings at some point; on the web page it is the sum over all segments. It is the counter named "BYTES" of class "org.apache.kylin.engine.mr.steps.FactDistinctColumnsMapper$RawDataCounter" in Step #2, "Extract Fact Table Distinct Columns", and can be seen at the bottom of the Step #2 log as follows.

   It is calculated as follows. In FactDistinctColumnsMapper.doMap:

   and for Hive, parseMapperInput works as:

   and countSizeInBytes calculates as:

3. Cube Size: the final HFile size, i.e. the "Data Size" of the step "Convert Cuboid Data to HFile"; on the web page it is the sum over all segments.

--

At 2019-08-28 14:13:00, "lk_hadoop" <[email protected]> wrote:

hi, all:

I don't quite understand some of the metrics I see on Kylin's web page:

#1 Step Name: Create Intermediate Flat Hive Table
Data Size: 72.06 GB
Duration: 6.07 mins
Waiting: 0 seconds

What does "Data Size" mean? The data size of all the records * 2?
What does "Source Table Size" mean?

Thanks for your attention.

2019-08-28
lk_hadoop
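
To illustrate the "Source Table Size" explanation in the reply above, here is a minimal Java sketch of a byte counter over rows read as Strings. It is only an assumption about the general idea, not Kylin's actual code: the class name SourceSizeSketch, the method totalBytes, and the rule "UTF-8 bytes of each field plus one byte per field for a delimiter, null counted as zero bytes" are all hypothetical. Only the name countSizeInBytes is borrowed from the thread.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration; not Kylin's actual implementation.
class SourceSizeSketch {

    // Assumed rule: UTF-8 bytes of each field plus one byte for a field delimiter.
    static long countSizeInBytes(String[] row) {
        long size = 0;
        for (String field : row) {
            if (field != null) {
                size += field.getBytes(StandardCharsets.UTF_8).length;
            }
            size += 1; // assumed delimiter byte
        }
        return size;
    }

    // Sums the per-row sizes, the way a counter accumulates over all input rows.
    static long totalBytes(List<String[]> rows) {
        long total = 0;
        for (String[] row : rows) {
            total += countSizeInBytes(row);
        }
        return total;
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
            new String[] {"2019-08-28", "lk", null},
            new String[] {"\u6bdb"});
        System.out.println(totalBytes(rows)); // 15 + 4 = 19
    }
}
```

Under these assumptions the total depends on the String form of each value, so it need not match the on-disk size of the source table, which may be stored compressed or in a columnar format.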
