Thank you very much.

2019-08-30 

lk_hadoop 



From: maoxiaomao <[email protected]>
Sent: 2019-08-29 15:49
Subject: Re:What dose Data Size and Source Table Size mean
To: "[email protected]"<[email protected]>
Cc:

Hi lk,
   This is my understanding, but I'm not quite sure about it (my Kylin version
is v2.6.1).
      1. Data Size: for each segment and each MR step, it is the size of the
step's output data. It is also one of the MapReduce counters, and can be seen
in the log as "HDFS Write" (for Step #1) or "HDFS: Number of bytes written"
(for the other MR steps).
 
      2. Source Table Size: the size of the source data as read into Strings
at some point; on the website it is the sum over all segments. It comes from
a counter of Step #2 (Extract Fact Table Distinct Columns), named "BYTES", of
class
"org.apache.kylin.engine.mr.steps.FactDistinctColumnsMapper$RawDataCounter".
It can be seen at the bottom of Step#2.log. It is calculated as follows.

      In FactDistinctColumnsMapper.doMap:

      and for Hive, parseMapperInput works as:

      and countSizeInBytes is calculated as:
 
     3. Cube Size: the final HFile size, i.e. the "Data Size" of the step
"Convert Cuboid Data to HFile"; on the website it is the sum over all
segments.
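
To illustrate item 2, here is a minimal sketch of how a per-row byte counter of this kind could work. It is not the Kylin source: the class SourceSizeSketch and both method names are hypothetical, and the counting rule (UTF-8 length of each column value, 1 byte for a null value, plus 1 delimiter byte per column) is an assumption made only for illustration.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, not Kylin code: estimates the size of rows that
// were read as String arrays, similar in spirit to a "BYTES" counter.
class SourceSizeSketch {

    // Assumption: each column counts as its UTF-8 byte length
    // (1 byte if null) plus 1 byte for a column delimiter.
    public static long approxRowSizeInBytes(String[] row) {
        long size = 0;
        for (String value : row) {
            size += (value == null)
                    ? 1
                    : value.getBytes(StandardCharsets.UTF_8).length;
            size++; // delimiter byte (assumed)
        }
        return size;
    }

    // Sums the per-row estimate, the way a mapper would increment a
    // counter once for every input row it sees.
    public static long approxTotalSizeInBytes(String[][] rows) {
        long total = 0;
        for (String[] row : rows) {
            total += approxRowSizeInBytes(row);
        }
        return total;
    }

    public static void main(String[] args) {
        String[][] rows = {
            {"2019-08-28", "abc", null},
            {"x", "\u00e9"} // 2 bytes in UTF-8
        };
        System.out.println(approxTotalSizeInBytes(rows));
    }
}
```

Under these assumptions the number depends on the text form of each value, so it does not have to match the size of the files on HDFS (for example when the source table is stored compressed or in a columnar format).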




--
At 2019-08-28 14:13:00, "lk_hadoop" <[email protected]> wrote:

Hi all,
I don't quite understand some of the metrics I saw on Kylin's web UI:

#1 Step Name: Create Intermediate Flat Hive Table

Data Size: 72.06 GB 

Duration: 6.07 mins Waiting: 0 seconds

What does "Data Size" mean? Is it the size of all the records' data * 2?



What does "Source Table Size" mean?


Thanks for your attention.

2019-08-28


lk_hadoop 
