Hi lk,
   This is my understanding, though I'm not completely sure about it (my Kylin 
version is v2.6.1).
      1. Data Size: for each segment and each MR step, this is the size of the 
step's output data. It is also one of the MapReduce counters, and can be seen in 
the log as "HDFS Write" (for Step #1) or "HDFS: Number of bytes written" (for the 
other MR steps).
 
      2. Source Table Size: the size of the source data as read in String form; 
on the web UI it is the sum over all segments. It comes from a counter of 
Step #2 (Extract Fact Table Distinct Columns) named "BYTES", defined in 
"org.apache.kylin.engine.mr.steps.FactDistinctColumnsMapper$RawDataCounter", 
and it can be seen at the bottom of the Step #2 log. It is calculated as follows:
      In FactDistinctColumnsMapper.doMap:
      and for Hive, parseMapperInput works as:
      and countSizeInBytes is calculated as:
 
      3. Cube Size: the final HFile size, i.e. the "Data Size" of the step 
"Convert Cuboid Data to HFile"; on the web UI it is the sum over all segments.
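
To illustrate item 2, here is a minimal sketch of how such a "BYTES" counter could be accumulated. It is not copied from the Kylin source. The class name RowSizeSketch and the method sourceSize are hypothetical, and the assumptions that each column contributes its UTF-8 length plus one delimiter byte, and that a null column counts as one byte, are mine and should be checked against FactDistinctColumnsMapper in your Kylin version.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not the actual Kylin implementation.
final class RowSizeSketch {

    // Size of one flat-table row that was read as an array of Strings.
    // Assumption: UTF-8 length of each value plus one delimiter byte per column;
    // a null value is counted as one byte.
    static long countSizeInBytes(String[] row) {
        long size = 0;
        for (String s : row) {
            size += (s == null) ? 1 : s.getBytes(StandardCharsets.UTF_8).length;
            size++; // assumed delimiter byte
        }
        return size;
    }

    // Sum over all rows seen by the mapper, which is what a counter would hold.
    static long sourceSize(List<String[]> rows) {
        long total = 0;
        for (String[] row : rows) {
            total += countSizeInBytes(row);
        }
        return total;
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
                new String[] {"2019-08-28", "lk", null},
                new String[] {"a", "\u00e9"});
        System.out.println(sourceSize(rows));
    }
}
```

If the real counter works along these lines, Source Table Size measures the text form of the rows the mapper reads, so it need not match the on-disk size of the Hive table.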





--

At 2019-08-28 14:13:00, "lk_hadoop" <[email protected]> wrote:

Hi all,
I don't quite understand some of the metrics I see on the Kylin web UI:
 
#1 Step Name: Create Intermediate Flat Hive Table

Data Size: 72.06 GB 

Duration: 6.07 mins Waiting: 0 seconds
 
What does "Data Size" mean? Is it the data size of all the records * 2?
 
 
What does "Source Table Size" mean?
 
 
Thanks for your attention.
 
2019-08-28
lk_hadoop
