Re: Data disappears if hbase splits region

2017-08-08 Thread Alexander Sterligov
Done!
https://issues.apache.org/jira/browse/KYLIN-2779



Re: Data disappears if hbase splits region

2017-08-08 Thread ShaoFeng Shi
Okay, the estimation ratio is too small for bitmap-type measures. Could you
please open a JIRA with your findings? We can improve that in a future
release. Thanks!
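
For illustration, here is a rough sketch of why the ratio matters. It assumes a simplified model in which the estimated size of a count distinct measure is its raw size multiplied by the ratio; the function name and the model are hypothetical and are not taken from Kylin's code. Only the two ratio values (0.05 and 0.2) and the roughly 4x underestimation come from this thread.

```python
# Simplified, hypothetical model (an assumption, not Kylin's actual estimator):
# estimated size = raw size * ratio.

DEFAULT_RATIO = 0.05  # default value reported in this thread
TUNED_RATIO = 0.2     # value that gave good estimates in this thread


def estimate_scale(old_ratio: float, new_ratio: float) -> float:
    """Return how much the estimate grows when the ratio is changed."""
    return new_ratio / old_ratio


scale = estimate_scale(DEFAULT_RATIO, TUNED_RATIO)
print(f"estimate grows by a factor of {scale:.1f}")
```

Under this model, going from 0.05 to 0.2 raises the estimate by a factor of four, which is consistent with the reported underestimation of about 4 times.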


-- 
Best regards,

Shaofeng Shi 史少锋


Re: Data disappears if hbase splits region

2017-08-07 Thread Alexander Sterligov
Yes, I'm using lz4.



Re: Data disappears if hbase splits region

2017-08-07 Thread ShaoFeng Shi
Thanks for the input. Did you enable any compression (e.g., LZO, Snappy) for
HBase?

2017-08-08 0:49 GMT+08:00 Alexander Sterligov :

> All parameters were default. I've found out that it is really related to
> the size estimation of the count distinct measure. The F2 family was
> underestimated by about 4 times.
>
> After I set kylin.cube.size-estimate-countdistinct-ratio=0.2 the estimations
> are good and it works much better.
>
> It looks like the default value of 0.05 is too low for bitmap and global
> dictionary.
>
> Cube description is attached.

-- 
Best regards,

Shaofeng Shi 史少锋


Re: Data disappears if hbase splits region

2017-08-06 Thread ShaoFeng Shi
Hi Alexander,

Sometimes the size is over-estimated if the Cube has complex measures
like count distinct and topn, but I have seldom heard of under-estimation.
Did you change other parameters in kylin.properties that may affect the
estimation? Besides, if you can share the Cube definition, that would help
(information like dimensions/measures and rowkey encoding also affects
the region split).


-- 
Best regards,

Shaofeng Shi 史少锋


Re: Data disappears if hbase splits region

2017-08-06 Thread Alexander Sterligov
I've found out that sharding is done manually, so running a split in the
hbase shell breaks the data.

So the main problem is that region-cut doesn't work on hbase with s3. I see
that in the log it creates shards properly:

2017-08-05 20:54:48,709 INFO  [Job 1175d3ed-504f-4eb0-a973-d57338fdff2c-892] steps.CreateHTableJob:192 : Total size 21334.075368547456M (estimated)
2017-08-05 20:54:48,709 INFO  [Job 1175d3ed-504f-4eb0-a973-d57338fdff2c-892] steps.CreateHTableJob:193 : Expecting 4 regions.
2017-08-05 20:54:48,709 INFO  [Job 1175d3ed-504f-4eb0-a973-d57338fdff2c-892] steps.CreateHTableJob:194 : Expecting 5333 MB per region.

But then I get a single 20GB region.

Has anyone seen the same behaviour?
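
As a sanity check, the numbers in the log above can be reproduced with a short sketch. It assumes the region count is the estimated total size divided by the region cut (kylin.storage.hbase.region-cut-gb=5, taken as 5 * 1024 MB) and rounded to the nearest integer; the function name and the rounding rule are assumptions for illustration, not a copy of CreateHTableJob.

```python
import math
from typing import Tuple


def plan_regions(total_size_mb: float, region_cut_gb: float) -> Tuple[int, int]:
    """Return (region count, MB per region) for an estimated table size.

    Assumption: the count is total / cut, rounded half up, and at least 1.
    """
    cut_mb = region_cut_gb * 1024
    n_regions = max(1, math.floor(total_size_mb / cut_mb + 0.5))
    mb_per_region = int(total_size_mb / n_regions)
    return n_regions, mb_per_region


# Values from the log: estimated total size and region-cut-gb=5.
regions, per_region = plan_regions(21334.075368547456, 5)
print(regions, per_region)  # 4 5333
```

This matches the "Expecting 4 regions" and "Expecting 5333 MB per region" lines, so the plan itself looks right and the difference is in what ends up in hbase.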

On Sun, Aug 6, 2017 at 8:15 PM, Alexander Sterligov 
wrote:

> hi,
>
> I noticed a very large hbase region for one segment (more than 20GB with
> kylin.storage.hbase.region-cut-gb=5). I don't know why it is so large,
> but it degraded performance a lot, so I decided to split it in hbase.
>
> As soon as the split started, kylin began to return empty results for
> queries to this segment.
>
> Why can that happen?
>
> PS
> It seems to me that kylin.storage.hbase.region-cut-gb doesn't work when
> an external hbase cluster is used.
>