Hi Alexander,

The size is sometimes over-estimated when a Cube has complex measures such
as COUNT DISTINCT or TopN, but I have seldom heard of under-estimation. Did
you change any other parameters in kylin.properties that may affect the
estimation? Also, sharing the Cube definition would help, since the
dimensions/measures and the rowkey encoding also affect the region split.
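
For reference, the figures in the log quoted below are consistent with a
simple rule: divide the estimated total size by region-cut-gb (converted to
MB), round to the nearest integer, then split the size evenly across the
regions. The Python sketch assumes that rule; expected_regions is a
hypothetical helper, not Kylin's API, and the "at least one region" clamp is
also an assumption.

```python
def expected_regions(total_size_mb, region_cut_gb):
    """Hypothetical helper: estimate region count and MB per region.

    Assumes count = round(total / cut), with at least one region."""
    cut_mb = region_cut_gb * 1024  # region-cut-gb expressed in MB
    n_regions = max(1, round(total_size_mb / cut_mb))
    mb_per_region = int(total_size_mb / n_regions)
    return n_regions, mb_per_region


# Numbers taken from the CreateHTableJob log lines below.
print(expected_regions(21334.075368547456, 5))  # (4, 5333)
```

Here the estimate (about 20.8 GB) is close to the observed 20 GB region, so
the estimation itself seems fine; the problem looks like the 4 expected
regions were never created.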

2017-08-07 3:03 GMT+08:00 Alexander Sterligov <sterligo...@joom.it>:

> I've found out that sharding is done by Kylin itself rather than by HBase,
> so running a split in the HBase shell breaks the data.
>
> So the main problem is that region-cut doesn't work on HBase with S3. The
> log shows that the shards are created properly:
>
> 2017-08-05 20:54:48,709 INFO  [Job 1175d3ed-504f-4eb0-a973-d57338fdff2c-892] steps.CreateHTableJob:192 : Total size 21334.075368547456M (estimated)
> 2017-08-05 20:54:48,709 INFO  [Job 1175d3ed-504f-4eb0-a973-d57338fdff2c-892] steps.CreateHTableJob:193 : Expecting 4 regions.
> 2017-08-05 20:54:48,709 INFO  [Job 1175d3ed-504f-4eb0-a973-d57338fdff2c-892] steps.CreateHTableJob:194 : Expecting 5333 MB per region.
>
> But then I get a single 20 GB region.
>
> Has anyone seen the same behaviour?
>
> On Sun, Aug 6, 2017 at 8:15 PM, Alexander Sterligov <sterligo...@joom.it>
> wrote:
>
>> Hi,
>>
>> I noticed a very large HBase region for one segment (more than 20 GB, with
>> kylin.storage.hbase.region-cut-gb=5). I don't know why it is so large,
>> but it degraded performance a lot, so I decided to split it in HBase.
>>
>> As soon as the split started, Kylin began returning empty results for
>> queries on this segment.
>>
>> Why can that happen?
>>
>> PS
>> It seems to me that kylin.storage.hbase.region-cut-gb doesn't work when
>> an external HBase cluster is used.
>>
>
>


-- 
Best regards,

Shaofeng Shi 史少锋
