Okay, so the default estimation ratio is too small for bitmap-type measures.
Could you please open a JIRA with your findings? We can improve that in a
future release. Thanks!
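
For readers following the numbers in the quoted log further down, here is a
minimal sketch of arithmetic that the logged values are consistent with: the
estimated total size divided by the region cut, rounded to the nearest whole
region. This is an illustration, not Kylin's actual code. The function name
plan_regions and the min_regions/max_regions bounds are hypothetical and
introduced only for this example.

```python
from typing import Tuple


def plan_regions(total_size_mb: float, region_cut_gb: float,
                 min_regions: int = 1, max_regions: int = 500) -> Tuple[int, int]:
    """Return (region_count, mb_per_region) for an estimated segment size.

    Assumption: the region count is the estimated size divided by the cut,
    rounded half up, then clamped to [min_regions, max_regions].
    The bounds are illustrative defaults, not values taken from the thread.
    """
    raw = total_size_mb / (region_cut_gb * 1024)
    # int(x + 0.5) rounds half up for non-negative x; Python's built-in
    # round() would round halves to the nearest even number instead.
    count = int(raw + 0.5)
    count = max(min_regions, min(max_regions, count))
    return count, int(total_size_mb / count)


# Values from the quoted CreateHTableJob log, with region-cut-gb=5:
print(plan_regions(21334.075368547456, 5))  # (4, 5333)
```

Since the planned region count follows directly from the estimate, an
estimate that is too low plans too few regions, which is why the
kylin.cube.size-estimate-countdistinct-ratio setting discussed in this
thread matters.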

2017-08-08 12:56 GMT+08:00 Alexander Sterligov <sterligo...@joom.it>:

> Yes, I'm using lz4.
>
> On Tue, Aug 8, 2017 at 4:15 AM, ShaoFeng Shi <shaofeng...@apache.org>
> wrote:
>
>> Thanks for the input. Did you enable any compression (e.g., LZO,
>> Snappy) for HBase?
>>
>> 2017-08-08 0:49 GMT+08:00 Alexander Sterligov <sterligo...@joom.it>:
>>
>>> All parameters were default. I've found out that it is really related to
>>> the size estimation of the count distinct measure. The F2 family was
>>> underestimated by a factor of about 4.
>>>
>>> After I set kylin.cube.size-estimate-countdistinct-ratio=0.2, the
>>> estimations are good and it works much better.
>>>
>>> It looks like the default value of 0.05 is too low for bitmap and global
>>> dictionary.
>>>
>>> Cube description is attached.
>>>
>>> On Mon, Aug 7, 2017 at 6:21 AM, ShaoFeng Shi <shaofeng...@apache.org>
>>> wrote:
>>>
>>>> Hi Alexander,
>>>>
>>>> Sometimes the size is over-estimated if the Cube has some complex
>>>> measures like count distinct and topn, but I have seldom heard of
>>>> under-estimation. Did you change other parameters in kylin.properties
>>>> that may affect the estimation? Besides, it would help if you could
>>>> share the Cube definition (information like dimensions/measures and
>>>> rowkey encoding also affects the region split).
>>>>
>>>> 2017-08-07 3:03 GMT+08:00 Alexander Sterligov <sterligo...@joom.it>:
>>>>
>>>>> I've found out that sharding is done manually, so running a split in
>>>>> the hbase shell breaks the data.
>>>>>
>>>>> So the main problem is that region-cut doesn't work on hbase with s3.
>>>>> I see that in the log it creates shards properly:
>>>>>
>>>>> 2017-08-05 20:54:48,709 INFO  [Job 1175d3ed-504f-4eb0-a973-d57338fdff2c-892] steps.CreateHTableJob:192 : Total size 21334.075368547456M (estimated)
>>>>> 2017-08-05 20:54:48,709 INFO  [Job 1175d3ed-504f-4eb0-a973-d57338fdff2c-892] steps.CreateHTableJob:193 : Expecting 4 regions.
>>>>> 2017-08-05 20:54:48,709 INFO  [Job 1175d3ed-504f-4eb0-a973-d57338fdff2c-892] steps.CreateHTableJob:194 : Expecting 5333 MB per region.
>>>>>
>>>>> But then I get a single 20 GB region.
>>>>>
>>>>> Has anyone seen the same behaviour?
>>>>>
>>>>> On Sun, Aug 6, 2017 at 8:15 PM, Alexander Sterligov <
>>>>> sterligo...@joom.it> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I noticed a very large HBase region for one segment (more than 20 GB,
>>>>>> with kylin.storage.hbase.region-cut-gb=5). I don't know why it is so
>>>>>> large, but it degraded performance a lot, so I decided to split it
>>>>>> in HBase.
>>>>>>
>>>>>> As soon as the split started, Kylin began returning empty results
>>>>>> for queries to this segment.
>>>>>>
>>>>>> Why can that happen?
>>>>>>
>>>>>> PS
>>>>>> It seems to me that kylin.storage.hbase.region-cut-gb doesn't work
>>>>>> when an external HBase cluster is used.
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Best regards,
>>>>
>>>> Shaofeng Shi 史少锋
>>>>
>>>>
>>>
>>
>>
>> --
>> Best regards,
>>
>> Shaofeng Shi 史少锋
>>
>>
>


-- 
Best regards,

Shaofeng Shi 史少锋
