Re: Data disappears if hbase splits region
Done! https://issues.apache.org/jira/browse/KYLIN-2779

On Tue, Aug 8, 2017 at 10:02 AM, ShaoFeng Shi wrote:

> Okay, the estimation ratio is too small for bitmap type measures. Could you
> please open a JIRA with your findings? We can enhance that in a future
> release. Thanks!
Re: Data disappears if hbase splits region
Okay, the estimation ratio is too small for bitmap type measures. Could you please open a JIRA with your findings? We can enhance that in a future release. Thanks!

2017-08-08 12:56 GMT+08:00 Alexander Sterligov:

> Yes, I'm using lz4.

--
Best regards,

Shaofeng Shi 史少锋
Re: Data disappears if hbase splits region
Yes, I'm using lz4.

On Tue, Aug 8, 2017 at 4:15 AM, ShaoFeng Shi wrote:

> Thanks for the input. Did you enable any compression (e.g., LZO, Snappy)
> for HBase?
Re: Data disappears if hbase splits region
Thanks for the input. Did you enable any compression (e.g., LZO, Snappy) for HBase?

2017-08-08 0:49 GMT+08:00 Alexander Sterligov:

> All parameters were default. I've found out that it is really related to
> the size estimation of the count distinct measure. The F2 family was
> underestimated by about 4 times.
>
> After I set kylin.cube.size-estimate-countdistinct-ratio=0.2 the
> estimations are good and it works much better.
>
> It looks like the default value of 0.05 is too low for bitmap and global
> dictionary.
>
> Cube description is attached.

--
Best regards,

Shaofeng Shi 史少锋
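As a rough check on the numbers quoted in this message: if the estimated size of a count distinct column family scales linearly with kylin.cube.size-estimate-countdistinct-ratio (an assumption, not something confirmed in this thread), then moving from the default 0.05 to 0.2 multiplies that estimate by 4, which lines up with the roughly 4x underestimation reported for F2. The function name and the raw size below are hypothetical and only illustrate the arithmetic.

```python
DEFAULT_RATIO = 0.05  # default value mentioned in the thread
TUNED_RATIO = 0.2     # value that produced good estimates in the thread


def estimated_family_mb(raw_mb: float, ratio: float) -> float:
    """Hypothetical model: estimate = raw measure size * ratio.

    This is an assumed linear relationship, not Kylin's actual code.
    """
    return raw_mb * ratio


raw_mb = 10000.0  # made-up raw size, for illustration only
before = estimated_family_mb(raw_mb, DEFAULT_RATIO)
after = estimated_family_mb(raw_mb, TUNED_RATIO)

# How much larger the estimate becomes after tuning the ratio.
scale = after / before
print(f"estimate grows by a factor of about {scale:.1f}")
```

Under that assumption the right ratio depends on the measure type and the data, so 0.2 is the value that worked for this cube and may not suit others.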
Re: Data disappears if hbase splits region
Hi Alexander,

Sometimes there will be over-estimation of the size if the Cube has some complex measure like count distinct or topn, but I have seldom heard of under-estimation. Did you change other parameters in kylin.properties which may affect the estimation? Besides, if you can share the Cube definition, that would help (information like dimensions/measures and rowkey encoding will also affect the region split).

2017-08-07 3:03 GMT+08:00 Alexander Sterligov:

> I've found out that sharding is done manually, so running split in hbase
> shell breaks data.
>
> So the main problem is that region-cut doesn't work on hbase with s3. I
> see that in the log it creates shards properly:
>
> 2017-08-05 20:54:48,709 INFO [Job 1175d3ed-504f-4eb0-a973-d57338fdff2c-892] steps.CreateHTableJob:192 : Total size 21334.075368547456M (estimated)
> 2017-08-05 20:54:48,709 INFO [Job 1175d3ed-504f-4eb0-a973-d57338fdff2c-892] steps.CreateHTableJob:193 : Expecting 4 regions.
> 2017-08-05 20:54:48,709 INFO [Job 1175d3ed-504f-4eb0-a973-d57338fdff2c-892] steps.CreateHTableJob:194 : Expecting 5333 MB per region.
>
> But then I get a single 20GB region.
>
> Has anyone seen the same behaviour?

--
Best regards,

Shaofeng Shi 史少锋
Re: Data disappears if hbase splits region
I've found out that sharding is done manually, so running split in hbase shell breaks data.

So the main problem is that region-cut doesn't work on hbase with s3. I see that in the log it creates shards properly:

2017-08-05 20:54:48,709 INFO [Job 1175d3ed-504f-4eb0-a973-d57338fdff2c-892] steps.CreateHTableJob:192 : Total size 21334.075368547456M (estimated)
2017-08-05 20:54:48,709 INFO [Job 1175d3ed-504f-4eb0-a973-d57338fdff2c-892] steps.CreateHTableJob:193 : Expecting 4 regions.
2017-08-05 20:54:48,709 INFO [Job 1175d3ed-504f-4eb0-a973-d57338fdff2c-892] steps.CreateHTableJob:194 : Expecting 5333 MB per region.

But then I get a single 20GB region.

Has anyone seen the same behaviour?

On Sun, Aug 6, 2017 at 8:15 PM, Alexander Sterligov wrote:

> hi,
>
> I noticed a very large hbase region for one segment (more than 20GB, with
> kylin.storage.hbase.region-cut-gb=5). I don't know why it is so large,
> but anyway it degraded performance a lot, so I decided to split it in hbase.
>
> As soon as the split started, kylin began returning empty results for
> queries to this segment.
>
> Why can that happen?
>
> PS
> It seems to me that kylin.storage.hbase.region-cut-gb doesn't work if an
> external hbase cluster is used.
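The figures in the CreateHTableJob log lines above are consistent with a simple calculation: divide the estimated total size by the region cut, round to the nearest whole region, then divide the total evenly. The sketch below reproduces that arithmetic. The rounding rule is inferred from the logged numbers rather than taken from Kylin's source, and the function name is hypothetical.

```python
from typing import Tuple


def expected_regions(total_size_mb: float, region_cut_gb: float) -> Tuple[int, int]:
    """Return (region_count, mb_per_region) for an estimated table size.

    Assumed rule, inferred from the log output in this thread:
    round total / cut to the nearest integer, with at least one region.
    """
    cut_mb = region_cut_gb * 1024
    n_regions = max(1, round(total_size_mb / cut_mb))
    mb_per_region = int(total_size_mb / n_regions)
    return n_regions, mb_per_region


# Values from the log: estimated total size and kylin.storage.hbase.region-cut-gb=5
regions, per_region = expected_regions(21334.075368547456, 5)
print(f"Expecting {regions} regions, {per_region} MB per region")
```

With the logged total of 21334.075 MB and a 5 GB cut this gives 4 regions of 5333 MB each, matching the log. The plan was therefore computed as expected, and the single 20GB region points to a problem in the size estimate or in how the split was applied.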