All parameters were at their defaults. I've found out that the problem
really is related to the size estimation of the count distinct measure:
the F2 family was underestimated by about 4 times.

After I set kylin.cube.size-estimate-countdistinct-ratio=0.2 the
estimations are good and it works much better.

It looks like the default value of 0.05 is too low for bitmap and global
dictionary.
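
As a rough sanity check, the numbers in this thread are consistent with
simple arithmetic. The sketch below is not taken from Kylin's source; the
helper names `plan_regions` and `ratio_scale` are hypothetical, and it
assumes (1) the region count is the estimated size divided by the region
cut, rounded to the nearest integer, and (2) the count distinct part of
the estimate scales linearly with the ratio. Under those assumptions it
reproduces the values in the CreateHTableJob log quoted below and the
factor of about 4 between the default and the tuned ratio.

```python
import math


def plan_regions(total_mb, cut_gb):
    """Return (region_count, mb_per_region) for an estimated size.

    Assumption: the count is total size / region cut, rounded half up,
    and never less than one region.
    """
    regions = max(1, math.floor(total_mb / (cut_gb * 1024) + 0.5))
    return regions, int(total_mb / regions)


def ratio_scale(old_ratio, new_ratio):
    """Factor by which a linearly scaled estimate grows (assumption)."""
    return new_ratio / old_ratio


# Estimated size from the log, with kylin.storage.hbase.region-cut-gb=5
print(plan_regions(21334.075368547456, 5))  # (4, 5333)

# Default ratio 0.05 versus the tuned value 0.2
print(ratio_scale(0.05, 0.2))  # 4.0
```

If the second assumption holds, raising the ratio from 0.05 to 0.2
multiplies the count distinct part of the estimate by 4, which matches
the underestimation seen for the F2 family.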

The cube description is attached.

On Mon, Aug 7, 2017 at 6:21 AM, ShaoFeng Shi <shaofeng...@apache.org> wrote:

> Hi Alexander,
>
> Sometimes the size is over-estimated if the Cube has some complex
> measures like count distinct and topn, but I have seldom heard of
> under-estimation. Did you change any other parameters in
> kylin.properties that may affect the estimation? Besides, if you can
> share the Cube definition, that would help (information like
> dimensions/measures and rowkey encoding also affects the region split).
>
> 2017-08-07 3:03 GMT+08:00 Alexander Sterligov <sterligo...@joom.it>:
>
>> I've found out that sharding is done manually, so running a split in
>> the hbase shell breaks the data.
>>
>> So the main problem is that region-cut doesn't work on hbase with s3. I
>> see in the log that it creates the shards properly:
>>
>> 2017-08-05 20:54:48,709 INFO  [Job 1175d3ed-504f-4eb0-a973-d57338fdff2c-892]
>> steps.CreateHTableJob:192 : Total size 21334.075368547456M (estimated)
>> 2017-08-05 20:54:48,709 INFO  [Job 1175d3ed-504f-4eb0-a973-d57338fdff2c-892]
>> steps.CreateHTableJob:193 : Expecting 4 regions.
>> 2017-08-05 20:54:48,709 INFO  [Job 1175d3ed-504f-4eb0-a973-d57338fdff2c-892]
>> steps.CreateHTableJob:194 : Expecting 5333 MB per region.
>>
>> But then I get a single 20GB region.
>>
>> Has anyone seen the same behaviour?
>>
>> On Sun, Aug 6, 2017 at 8:15 PM, Alexander Sterligov <sterligo...@joom.it>
>> wrote:
>>
>>> hi,
>>>
>>> I noticed a very large hbase region for one segment (more than 20GB,
>>> with kylin.storage.hbase.region-cut-gb=5). I don't know why it is so
>>> large, but it degraded performance a lot, so I decided to split it in
>>> hbase.
>>>
>>> As soon as the split started, kylin began returning empty results
>>> for queries to this segment.
>>>
>>> Why can that happen?
>>>
>>> PS
>>> It seems to me that kylin.storage.hbase.region-cut-gb doesn't work
>>> when an external hbase cluster is used.
>>>
>>
>>
>
>
> --
> Best regards,
>
> Shaofeng Shi 史少锋
>
>

Attachment: cube.json
Description: application/json
