Hi lk,
This is my understanding; I'm not completely sure about it. (My Kylin
version is v2.6.1.)
1. Data Size: for each segment and each MR step, it is the step's output
data size, which is also one of the MapReduce counters. It can be seen in
the log as "HDFS Write" (for Step #1) or "HDFS: Number of bytes written"
(for the other MR steps).
2. Source Table Size: the size of the source data when it is read as
Strings at some point; on the website it is the sum over all segments. It
is a counter of Step #2 (Extract Fact Table Distinct Columns), named
"BYTES", of class
"org.apache.kylin.engine.mr.steps.FactDistinctColumnsMapper$RawDataCounter".
It can be seen at the bottom of the Step #2 log. It is calculated in
FactDistinctColumnsMapper.doMap, using parseMapperInput (for Hive input)
and countSizeInBytes.
3. Cube Size: the final HFile size, i.e. the "Data Size" of the step
"Convert Cuboid Data to HFile"; on the website it is the sum over all
segments.
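
To illustrate point 2, here is a minimal Python sketch of how a
"size of the row read as Strings" counter could be accumulated. It is not
Kylin's code: the function names are hypothetical, and the rules used here
(UTF-8 length per column, one byte for a null value, one extra byte per
column for a delimiter) are my assumptions about what such a count might
look like.

```python
def count_size_in_bytes(row):
    """Estimate the size of one row whose column values are strings (or None)."""
    size = 0
    for value in row:
        # Assumption: a null column is counted as 1 byte.
        size += 1 if value is None else len(value.encode("utf-8"))
        # Assumption: one extra byte per column for the delimiter.
        size += 1
    return size


def source_table_size(rows):
    """Sum the per-row estimates, the way a "BYTES"-style counter would grow."""
    return sum(count_size_in_bytes(row) for row in rows)


# Two hypothetical source rows; the second contains multi-byte characters.
rows = [["2019-08-28", "kylin", None], ["2019-08-29", "麒麟", "42"]]
total = source_table_size(rows)
print(total)
```

The point of the sketch is only that such a number depends on the string
form of the rows, so it will not necessarily match the size of the source
files on HDFS.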
--
At 2019-08-28 14:13:00, "lk_hadoop" <[email protected]> wrote:
Hi all,
I don't quite understand some of the metrics I saw on Kylin's web UI:
#1 Step Name: Create Intermediate Flat Hive Table
Data Size: 72.06 GB
Duration: 6.07 mins Waiting: 0 seconds
What does "Data Size" mean? Is it the size of all the records' data * 2?
What does "Source Table Size" mean?
Thanks for your attention.
2019-08-28
lk_hadoop