Hi Qilong,

If segment A's estimated size is 10 GB but its real size is 5 GB, then when
merging or building another segment we can adjust the estimated size by
dividing it by 2. The result should then be closer to the real size.
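
To make the idea concrete, here is a minimal Python sketch of that
adjustment. It is not Kylin code: the function names and the plain-number
inputs are hypothetical, and it only assumes that both the estimated and the
actual size of each existing segment are known.

```python
def correction_ratio(segments):
    """Return actual/estimated size over existing segments.

    `segments` is a list of (estimated_gb, actual_gb) pairs; this input
    shape is an assumption for illustration, not a Kylin data structure.
    """
    total_estimated = sum(est for est, _ in segments)
    total_actual = sum(act for _, act in segments)
    if total_estimated <= 0:
        return 1.0  # nothing to learn from; leave the estimate unchanged
    return total_actual / total_estimated


def adjusted_estimate(raw_estimate_gb, segments):
    """Scale a new statistics-based estimate by the observed ratio."""
    return raw_estimate_gb * correction_ratio(segments)


# Segment A: estimated 10 GB, actually 5 GB -> ratio 0.5.
history = [(10.0, 5.0)]
print(adjusted_estimate(24.0, history))  # 12.0
```

With several existing segments, the ratio is taken over the summed sizes, so
larger segments weigh more than small ones.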

2018-01-24 9:49 GMT+08:00 苏启龙 <[email protected]>:

> Many thanks, Shaofeng! We’ll look into these parameters further to see how
> to improve the estimate.
>
> BTW, what do you mean by the last line? That is, how can I supply the
> actual size so that Kylin adjusts the estimate? Currently I can only set
> the max-regions parameter manually, which is not convenient for
> auto-merging.
>
> Qilong
>
> From: ShaoFeng Shi <[email protected]>
> Reply-To: "[email protected]" <[email protected]>
> Date: Tuesday, January 23, 2018, 21:49
> To: user <[email protected]>
> Cc: 林豪 (linhao), Technology Product Center <[email protected]>
> Subject: Re: segment size estimate when merging
>
> Hi Qilong,
>
> Does your cube have count-distinct or Top-N measures?
>
> If you observe too many HBase regions, or regions that are too small, you
> can adjust these parameters:
>
> kylin.cube.size-estimate-ratio=0.25
> kylin.cube.size-estimate-countdistinct-ratio=0.05
>
> The default ratio for the common case is 0.25; you can set it lower if
> the estimated size is bigger than the actual size. Both parameters can be
> set at the cube level.
>
> A better way would be, when merging, to use the actual size of the existing
> segments to adjust the estimated size and so get a closer result.
>
> 2018-01-23 14:47 GMT+08:00 苏启龙 <[email protected]>:
>
>> Hi shaofeng,
>>
>> Yes, it’s usually smaller than the sum of the segments, but the difference
>> is usually small compared with the total size.
>>
>> The statistics-based estimate, however, is usually N times larger than
>> the actual size, which wastes a huge number of HBase regions.
>>
>>
>>    1. Do you have any data on how far each of the two approaches deviates
>>    from the actual size? I mean, which one is generally closer?
>>    2. Is there any plan on the roadmap to improve this, or any thought of
>>    letting users choose their own estimation algorithm?
>>
>>
>> Thanks
>>
>> Qilong
>>
>> From: ShaoFeng Shi <[email protected]>
>> Reply-To: "[email protected]" <[email protected]>
>> Date: Tuesday, January 23, 2018, 09:43
>> To: user <[email protected]>
>> Cc: 林豪 (linhao), Technology Product Center <[email protected]>
>> Subject: Re: segment size estimate when merging
>>
>> Hi Qilong,
>>
>> When merging segments, the dimension-measure values (k-v) are
>> reorganized and entries with the same key are merged, so the merged size
>> is not simply the sum of the segments; usually it is smaller.
>>
>> The statistics are always used to estimate the size for consistency. Of
>> course, there is room to improve the estimation accuracy.
>>
>>
>>
>> 2018-01-22 16:54 GMT+08:00 苏启龙 <[email protected]>:
>>
>>>
>>> Hi,
>>>
>>> We have some questions about how the segment size is estimated when
>>> merging multiple segments.
>>>
>>> We find that the segment merge job still uses
>>> CubeStatsReader::getCuboidSizeMap to estimate the total size of the
>>> merged segment. As we understand it, estimating this way is fine when
>>> building a new segment, since there is no other information to turn to.
>>> But when merging, we could sum the table sizes of the segments being
>>> merged, which should be more accurate.
>>>
>>> So what is the reason for this design?
>>>
>>>
>>>
>>> Su Qilong
>>>
>>
>>
>>
>> --
>> Best regards,
>>
>> Shaofeng Shi 史少锋
>>
>>
>
>
> --
> Best regards,
>
> Shaofeng Shi 史少锋
>
>


-- 
Best regards,

Shaofeng Shi 史少锋
