Hi Qilong,

When segments are merged, the dimension-measure key-value pairs are
reorganized and rows with the same key are merged into one, so the merged
size is not simply the sum of the segment sizes; usually it is smaller.
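To illustrate why the sizes do not simply add up, here is a minimal sketch, assuming each segment's cuboid rows can be modeled as a dict from a dimension-key tuple to a single SUM measure (real cubes have other measure types, and `merge_segments` is a hypothetical helper, not a Kylin API):

```python
from collections import defaultdict


def merge_segments(segments):
    """Merge cuboid rows from several segments (hypothetical helper).

    Each segment maps a dimension-key tuple to a measure value. Rows that
    share the same key are combined (here by summing the measure), so the
    result never has more rows than all segments together.
    """
    merged = defaultdict(int)
    for segment in segments:
        for key, measure in segment.items():
            merged[key] += measure
    return dict(merged)


seg_jan = {("CN", "phone"): 10, ("US", "phone"): 5}
seg_feb = {("CN", "phone"): 7, ("US", "laptop"): 3}

merged = merge_segments([seg_jan, seg_feb])
print(len(seg_jan) + len(seg_feb), len(merged))  # 4 3
```

The key ("CN", "phone") appears in both segments but only once after merging, which is why summing the per-segment sizes would overestimate.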

We always use the statistics to estimate the size for the sake of
consistency. Of course, there is room to improve the accuracy of the
estimation.
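A rough sketch of estimating the merged size from statistics rather than by summing segment sizes, assuming each segment's statistics record the distinct dimension keys of a cuboid and that every row costs a fixed number of bytes. Exact Python sets stand in for a mergeable cardinality sketch such as HyperLogLog, and both function names are hypothetical:

```python
def estimate_merged_size(segment_key_sets, bytes_per_row):
    """Estimate merged cuboid size from per-segment statistics (hypothetical).

    Each element stands in for one segment's statistics for a cuboid. An
    exact set is used for clarity; a HyperLogLog sketch supports the same
    union-then-count operation approximately.
    """
    merged_keys = set().union(*segment_key_sets)
    return len(merged_keys) * bytes_per_row


def naive_sum_size(segment_key_sets, bytes_per_row):
    # Summing per-segment sizes counts a key shared by N segments N times.
    return sum(len(keys) for keys in segment_key_sets) * bytes_per_row


stats = [{("CN", "phone"), ("US", "phone")}, {("CN", "phone"), ("US", "laptop")}]
print(estimate_merged_size(stats, 100), naive_sum_size(stats, 100))  # 300 400
```

Because the statistics are merged before counting, the same estimation path applies whether a segment is newly built or produced by a merge.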



2018-01-22 16:54 GMT+08:00 苏启龙 wrote:

>
> Hi,
>
> We have some questions about how the segment size is estimated when
> merging multiple segments.
>
> We found that the segment merge job still uses
> CubeStatsReader::getCuboidSizeMap to estimate the total size of the merged
> segment. As we understand it, estimating the total size this way is fine
> when building a new segment, since Kylin has no other information to rely
> on. But when merging, we could instead sum the table sizes of the segments
> being merged, which should be more accurate.
>
> What is the reason for this design choice?
>
>
>
> Su Qilong
>



-- 
Best regards,

Shaofeng Shi 史少锋
