Hi Qilong,

When segments are merged, the dimension-measure key-value pairs are reorganized and rows that share the same key are combined into one, so the size of the merged segment is not simply the sum of the sizes of its source segments; usually it is smaller than that sum.
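To illustrate the point, here is a minimal sketch in Python. It is not Kylin code: the dict-based segment layout, the `merge_segments` helper, the sample keys and the use of a SUM measure are all assumptions made for the example.

```python
from collections import Counter


def merge_segments(segments):
    """Merge segments, combining rows that share a dimension key.

    Each segment is modelled (hypothetically) as a dict that maps a
    dimension key tuple to a single SUM measure.
    """
    merged = Counter()
    for segment in segments:
        for key, measure in segment.items():
            merged[key] += measure  # same key: the measures are aggregated
    return dict(merged)


# Two hypothetical segments for a cuboid without the partition date column,
# so the same key can appear in both of them.
seg_a = {("CN", "mobile"): 10, ("US", "web"): 4}
seg_b = {("CN", "mobile"): 7, ("CN", "web"): 3}

merged = merge_segments([seg_a, seg_b])

rows_before = len(seg_a) + len(seg_b)  # naive sum of the source row counts
rows_after = len(merged)               # row count after keys are combined

print(rows_before, rows_after)
```

Because the key ("CN", "mobile") occurs in both source segments, the merged result holds fewer rows than the two segments added together, which is why summing the source sizes would overestimate the merged size.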
Always using the statistics to estimate the size is for consistency. Of course, there is room to improve the estimation accuracy.

2018-01-22 16:54 GMT+08:00 苏启龙 <[email protected]>:

> Hi,
>
> We have some unclear points about the segment size estimate when merging
> multi-segments.
>
> We find that the segment merge job still uses
> CubeStatsReader::getCuboidSizeMap
> to estimate the total size of the merged segment. From our understanding,
> when building a new segment, Kylin uses this way to estimate the total size
> is OK since no other info we can turn to. But in merging we may sum the
> table size of the segments to be merged, which should be more accurate.
>
> So why for this consideration?
>
> Su Qilong

--
Best regards,

Shaofeng Shi 史少锋
