Done!
https://issues.apache.org/jira/browse/KYLIN-2779

On Tue, Aug 8, 2017 at 10:02 AM, ShaoFeng Shi <shaofeng...@apache.org>
wrote:

> Okay, the estimation ratio is too small for the bitmap-type measure. Could
> you please open a JIRA with your findings? We can enhance that in a future
> release. Thanks!
>
> 2017-08-08 12:56 GMT+08:00 Alexander Sterligov <sterligo...@joom.it>:
>
>> Yes, I'm using lz4.
>>
>> On Tue, Aug 8, 2017 at 4:15 AM, ShaoFeng Shi <shaofeng...@apache.org>
>> wrote:
>>
>>> Thanks for the input. Did you enable any compression (e.g., LZO,
>>> Snappy) for HBase?
>>>
>>> 2017-08-08 0:49 GMT+08:00 Alexander Sterligov <sterligo...@joom.it>:
>>>
>>>> All parameters were at their defaults. I've found that it is indeed
>>>> related to the size estimation of the count distinct measure. The F2
>>>> family was underestimated by about 4 times.
>>>>
>>>> After I set kylin.cube.size-estimate-countdistinct-ratio=0.2, the
>>>> estimates are good and it works much better.
>>>>
>>>> It looks like the default value of 0.05 is too low for bitmap measures
>>>> and the global dictionary.
>>>>
>>>> Cube description is attached.
>>>>
>>>> On Mon, Aug 7, 2017 at 6:21 AM, ShaoFeng Shi <shaofeng...@apache.org>
>>>> wrote:
>>>>
>>>>> Hi Alexander,
>>>>>
>>>>> Sometimes the size is over-estimated if the Cube has complex measures
>>>>> like count distinct and TopN, but I have seldom heard of
>>>>> under-estimation. Did you change any other parameters in
>>>>> kylin.properties that may impact the estimation? Besides, if you can
>>>>> share the Cube definition, that would help (information like
>>>>> dimensions/measures and rowkey encoding also impacts the region split).
>>>>>
>>>>> 2017-08-07 3:03 GMT+08:00 Alexander Sterligov <sterligo...@joom.it>:
>>>>>
>>>>>> I've found that sharding is done manually by Kylin, so running a split
>>>>>> in the HBase shell breaks the data.
>>>>>>
>>>>>> So the main problem is that region-cut doesn't work on HBase with S3.
>>>>>> In the log I can see that it creates the shards properly:
>>>>>>
>>>>>> 2017-08-05 20:54:48,709 INFO  [Job 1175d3ed-504f-4eb0-a973-d57338fdff2c-892] steps.CreateHTableJob:192 : Total size 21334.075368547456M (estimated)
>>>>>> 2017-08-05 20:54:48,709 INFO  [Job 1175d3ed-504f-4eb0-a973-d57338fdff2c-892] steps.CreateHTableJob:193 : Expecting 4 regions.
>>>>>> 2017-08-05 20:54:48,709 INFO  [Job 1175d3ed-504f-4eb0-a973-d57338fdff2c-892] steps.CreateHTableJob:194 : Expecting 5333 MB per region.
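The logged plan is consistent with simple arithmetic on the estimate and the configured kylin.storage.hbase.region-cut-gb=5: 21334 MB / (5 * 1024 MB) is about 4.17, which gives 4 regions, and 21334 / 4 is about 5333 MB per region. The sketch below reproduces those two numbers. The rounding rule (round to nearest, at least one region) is an assumption inferred from the logged values, not taken from Kylin's source, and `expected_regions` is a hypothetical helper.

```python
def expected_regions(total_size_mb, region_cut_gb):
    """Hypothetical sketch of the region plan printed by CreateHTableJob.

    Assumption: the estimated total size is divided by the region cut
    (converted from GB to MB) and rounded to the nearest integer, with a
    minimum of one region. The per-region size is truncated to whole MB.
    """
    cut_mb = region_cut_gb * 1024
    n_regions = max(1, round(total_size_mb / cut_mb))
    mb_per_region = int(total_size_mb / n_regions)
    return n_regions, mb_per_region


# Inputs taken from the log lines and the region-cut setting in this thread.
n, per_region = expected_regions(21334.075368547456, 5)
print(n, per_region)  # 4 5333
```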
>>>>>>
>>>>>> But then I get a single 20GB region.
>>>>>>
>>>>>> Has anyone seen the same behaviour?
>>>>>>
>>>>>> On Sun, Aug 6, 2017 at 8:15 PM, Alexander Sterligov <
>>>>>> sterligo...@joom.it> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I noticed a very large HBase region for one segment (more than 20GB,
>>>>>>> with kylin.storage.hbase.region-cut-gb=5). I don't know why it is so
>>>>>>> large, but it degraded performance a lot, so I decided to split it in
>>>>>>> HBase.
>>>>>>>
>>>>>>> As soon as the split started, Kylin began returning empty results for
>>>>>>> queries to this segment.
>>>>>>>
>>>>>>> Why could that happen?
>>>>>>>
>>>>>>> PS
>>>>>>> It seems to me that kylin.storage.hbase.region-cut-gb doesn't work
>>>>>>> when an external HBase cluster is used.
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Best regards,
>>>>>
>>>>> Shaofeng Shi 史少锋
>>>>>
>>>>>
>>>>
>>>
>>
>
