Done! https://issues.apache.org/jira/browse/KYLIN-2779
On Tue, Aug 8, 2017 at 10:02 AM, ShaoFeng Shi <shaofeng...@apache.org> wrote:

> Okay, the estimation ratio is too small for the bitmap type measure. Could
> you please open a JIRA with your findings? We can enhance that in a future
> release. Thanks!
>
> 2017-08-08 12:56 GMT+08:00 Alexander Sterligov <sterligo...@joom.it>:
>
>> Yes, I'm using lz4.
>>
>> On Tue, Aug 8, 2017 at 4:15 AM, ShaoFeng Shi <shaofeng...@apache.org>
>> wrote:
>>
>>> Thanks for the input. Did you enable any compression (e.g., LZO,
>>> Snappy) for HBase?
>>>
>>> 2017-08-08 0:49 GMT+08:00 Alexander Sterligov <sterligo...@joom.it>:
>>>
>>>> All parameters were default. I've found out that it is really related
>>>> to the size estimation of the count distinct measure. The F2 family
>>>> was underestimated by a factor of about 4.
>>>>
>>>> After I set kylin.cube.size-estimate-countdistinct-ratio=0.2 the
>>>> estimations are good and it works much better.
>>>>
>>>> It looks like the default value of 0.05 is too low for bitmap and
>>>> global dictionary.
>>>>
>>>> Cube description is attached.
>>>>
>>>> On Mon, Aug 7, 2017 at 6:21 AM, ShaoFeng Shi <shaofeng...@apache.org>
>>>> wrote:
>>>>
>>>>> Hi Alexander,
>>>>>
>>>>> Sometimes the size is over-estimated if the Cube has some complex
>>>>> measures like count distinct and topn, but I have seldom heard of
>>>>> under-estimation. Did you change other parameters in
>>>>> kylin.properties which may affect the estimation? Besides, if you
>>>>> can share the Cube definition, that would help (information like
>>>>> dimension/measure and rowkey encoding will also affect the region
>>>>> split).
>>>>>
>>>>> 2017-08-07 3:03 GMT+08:00 Alexander Sterligov <sterligo...@joom.it>:
>>>>>
>>>>>> I've found out that sharding is done manually, so running split in
>>>>>> the hbase shell breaks the data.
>>>>>>
>>>>>> So the main problem is that region-cut doesn't work on hbase with s3.
>>>>>> I see in the log that it creates shards properly:
>>>>>>
>>>>>> 2017-08-05 20:54:48,709 INFO [Job 1175d3ed-504f-4eb0-a973-d57338fdff2c-892] steps.CreateHTableJob:192 : Total size 21334.075368547456M (estimated)
>>>>>> 2017-08-05 20:54:48,709 INFO [Job 1175d3ed-504f-4eb0-a973-d57338fdff2c-892] steps.CreateHTableJob:193 : Expecting 4 regions.
>>>>>> 2017-08-05 20:54:48,709 INFO [Job 1175d3ed-504f-4eb0-a973-d57338fdff2c-892] steps.CreateHTableJob:194 : Expecting 5333 MB per region.
>>>>>>
>>>>>> But then I get a single 20 GB region.
>>>>>>
>>>>>> Has anyone seen the same behaviour?
>>>>>>
>>>>>> On Sun, Aug 6, 2017 at 8:15 PM, Alexander Sterligov
>>>>>> <sterligo...@joom.it> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I noticed a very large hbase region for one segment (more than
>>>>>>> 20 GB, with kylin.storage.hbase.region-cut-gb=5). I don't know why
>>>>>>> it is so large, but it degraded performance a lot, so I decided to
>>>>>>> split it in hbase.
>>>>>>>
>>>>>>> As soon as the split started, kylin began returning empty results
>>>>>>> for queries to this segment.
>>>>>>>
>>>>>>> Why can that happen?
>>>>>>>
>>>>>>> PS
>>>>>>> It seems to me that kylin.storage.hbase.region-cut-gb doesn't work
>>>>>>> when an external hbase cluster is used.
>>>>>
>>>>> --
>>>>> Best regards,
>>>>>
>>>>> Shaofeng Shi 史少锋
>>>
>>> --
>>> Best regards,
>>>
>>> Shaofeng Shi 史少锋
>
> --
> Best regards,
>
> Shaofeng Shi 史少锋
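
For reference, the three CreateHTableJob log lines quoted above are consistent with dividing the estimated total size by kylin.storage.hbase.region-cut-gb and rounding. Below is a minimal Python sketch of that arithmetic. It assumes simple rounding and a floor of one region; it is an illustration only, not Kylin's actual code, and the function name expected_regions is made up.

```python
def expected_regions(total_size_mb, region_cut_gb):
    """Return (region count, MB per region) for an estimated cube size.

    Assumption: the count is the estimated size divided by the cut size,
    rounded to the nearest integer, with at least one region.
    """
    cut_mb = region_cut_gb * 1024
    n_regions = max(1, round(total_size_mb / cut_mb))
    return n_regions, int(total_size_mb / n_regions)


# Values taken from the log lines in this thread.
n, per_region = expected_regions(21334.075368547456, 5)
print(f"Expecting {n} regions.")
print(f"Expecting {per_region} MB per region.")
```

Because the region count is derived from the estimated size, an estimate that is too low would produce fewer and larger regions than region-cut-gb suggests, which is why the count-distinct ratio matters here.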