[jira] [Comment Edited] (HBASE-16630) Fragmentation in long running Bucket Cache

2017-02-28 Thread deepankar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15889149#comment-15889149
 ] 

deepankar edited comment on HBASE-16630 at 3/1/17 12:17 AM:


Uploaded the patch [^HBASE-16630-v3-branch-1.X.patch] for branch-1, branch-1.2, 
branch-1.3


was (Author: dvdreddy):
The patch for branch-1, branch-1.2, branch-1.3

> Fragmentation in long running Bucket Cache
> --
>
> Key: HBASE-16630
> URL: https://issues.apache.org/jira/browse/HBASE-16630
> Project: HBase
>  Issue Type: Bug
>  Components: BucketCache
>Affects Versions: 2.0.0, 1.1.6, 1.3.1, 1.2.3
>Reporter: deepankar
>Assignee: deepankar
>Priority: Critical
> Attachments: 16630-v2-suggest.patch, 16630-v3-suggest.patch, 
> HBASE-16630.patch, HBASE-16630-v2.patch, HBASE-16630-v3-branch-1.X.patch, 
> HBASE-16630-v3.patch
>
>
> As we are running bucket cache for a long time in our system, we are 
> observing cases where some nodes after some time does not fully utilize the 
> bucket cache, in some cases it is even worse in the sense they get stuck at a 
> value < 0.25 % of the bucket cache (DEFAULT_MEMORY_FACTOR as all our tables 
> are configured in-memory for simplicity sake).
> We took a heap dump and analyzed what is happening and saw that is classic 
> case of fragmentation, current implementation of BucketCache (mainly 
> BucketAllocator) relies on the logic that fullyFreeBuckets are available for 
> switching/adjusting cache usage between different bucketSizes . But once a 
> compaction / bulkload happens and the blocks are evicted from a bucket size , 
> these are usually evicted from random places of the buckets of a bucketSize 
> and thus locking the number of buckets associated with a bucketSize and in 
> the worst case of the fragmentation we have seen some bucketSizes with 
> occupancy ratio of <  10 % But they dont have any completelyFreeBuckets to 
> share with the other bucketSize. 
> Currently the existing eviction logic helps in the cases where cache used is 
> more the MEMORY_FACTOR or MULTI_FACTOR and once those evictions are also 
> done, the eviction (freeSpace function) will not evict anything and the cache 
> utilization will be stuck at that value without any allocations for other 
> required sizes.
> The fix for this we came up with is simple that we do deFragmentation ( 
> compaction) of the bucketSize and thus increasing the occupancy ratio and 
> also freeing up the buckets to be fullyFree, this logic itself is not 
> complicated as the bucketAllocator takes care of packing the blocks in the 
> buckets, we need evict and re-allocate the blocks for all the BucketSizes 
> that dont fit the criteria.
> I am attaching an initial patch just to give an idea of what we are thinking 
> and I'll improve it based on the comments from the community.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (HBASE-16630) Fragmentation in long running Bucket Cache

2016-11-04 Thread deepankar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15635468#comment-15635468
 ] 

deepankar edited comment on HBASE-16630 at 11/4/16 6:49 AM:


Yeah I am looking into stacks suggestion to counter the problem I have 
previously mentioned about re-eviction just cached blocks


was (Author: dvdreddy):
Yeah I am looking into stacks suggestion to counter the problem I have 
previously mentioned about re-evicted just cached blocks

> Fragmentation in long running Bucket Cache
> --
>
> Key: HBASE-16630
> URL: https://issues.apache.org/jira/browse/HBASE-16630
> Project: HBase
>  Issue Type: Bug
>  Components: BucketCache
>Affects Versions: 2.0.0, 1.1.6, 1.3.1, 1.2.3
>Reporter: deepankar
>Assignee: deepankar
> Attachments: 16630-v2-suggest.patch, 16630-v3-suggest.patch, 
> HBASE-16630-v2.patch, HBASE-16630-v3.patch, HBASE-16630.patch
>
>
> As we are running bucket cache for a long time in our system, we are 
> observing cases where some nodes after some time does not fully utilize the 
> bucket cache, in some cases it is even worse in the sense they get stuck at a 
> value < 0.25 % of the bucket cache (DEFAULT_MEMORY_FACTOR as all our tables 
> are configured in-memory for simplicity sake).
> We took a heap dump and analyzed what is happening and saw that is classic 
> case of fragmentation, current implementation of BucketCache (mainly 
> BucketAllocator) relies on the logic that fullyFreeBuckets are available for 
> switching/adjusting cache usage between different bucketSizes . But once a 
> compaction / bulkload happens and the blocks are evicted from a bucket size , 
> these are usually evicted from random places of the buckets of a bucketSize 
> and thus locking the number of buckets associated with a bucketSize and in 
> the worst case of the fragmentation we have seen some bucketSizes with 
> occupancy ratio of <  10 % But they dont have any completelyFreeBuckets to 
> share with the other bucketSize. 
> Currently the existing eviction logic helps in the cases where cache used is 
> more the MEMORY_FACTOR or MULTI_FACTOR and once those evictions are also 
> done, the eviction (freeSpace function) will not evict anything and the cache 
> utilization will be stuck at that value without any allocations for other 
> required sizes.
> The fix for this we came up with is simple that we do deFragmentation ( 
> compaction) of the bucketSize and thus increasing the occupancy ratio and 
> also freeing up the buckets to be fullyFree, this logic itself is not 
> complicated as the bucketAllocator takes care of packing the blocks in the 
> buckets, we need evict and re-allocate the blocks for all the BucketSizes 
> that dont fit the criteria.
> I am attaching an initial patch just to give an idea of what we are thinking 
> and I'll improve it based on the comments from the community.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HBASE-16630) Fragmentation in long running Bucket Cache

2016-09-19 Thread chunhui shen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15505542#comment-15505542
 ] 

chunhui shen edited comment on HBASE-16630 at 9/20/16 4:34 AM:
---

1.Fix the issue that FreeSpace doesn't free anything (Required)
2.Defragmentation  (Optional):
 a. Would increment cache usage ratio in short time. Otherwise will take 
relative long time without defragmentation. 
 b. Resource overhead (Byte Copy). Should limit the rate. For example, only 
trigger one time per hour.
 c. Add unit test to ensure the correctness with defragmentation.


Just my thought.


was (Author: zjushch):
1.Fix the issue that FreeSpace doesn't free anything (Required)
2.Defragmentation  (Optional):
 a. Would increment cache usage ratio in short time. Otherwise will take 
relative long time without defragmentation. 
 b. Resource overhead (Byte Copy). Should limit the rate. For example, only 
trigger one time per hour.
 c. Add unit test to ensure the correctness with defragmentation.


Just my thought.





optional

> Fragmentation in long running Bucket Cache
> --
>
> Key: HBASE-16630
> URL: https://issues.apache.org/jira/browse/HBASE-16630
> Project: HBase
>  Issue Type: Bug
>  Components: BucketCache
>Affects Versions: 2.0.0, 1.1.6, 1.3.1, 1.2.3
>Reporter: deepankar
>Assignee: deepankar
> Attachments: HBASE-16630.patch
>
>
> As we are running bucket cache for a long time in our system, we are 
> observing cases where some nodes after some time does not fully utilize the 
> bucket cache, in some cases it is even worse in the sense they get stuck at a 
> value < 0.25 % of the bucket cache (DEFAULT_MEMORY_FACTOR as all our tables 
> are configured in-memory for simplicity sake).
> We took a heap dump and analyzed what is happening and saw that is classic 
> case of fragmentation, current implementation of BucketCache (mainly 
> BucketAllocator) relies on the logic that fullyFreeBuckets are available for 
> switching/adjusting cache usage between different bucketSizes . But once a 
> compaction / bulkload happens and the blocks are evicted from a bucket size , 
> these are usually evicted from random places of the buckets of a bucketSize 
> and thus locking the number of buckets associated with a bucketSize and in 
> the worst case of the fragmentation we have seen some bucketSizes with 
> occupancy ratio of <  10 % But they dont have any completelyFreeBuckets to 
> share with the other bucketSize. 
> Currently the existing eviction logic helps in the cases where cache used is 
> more the MEMORY_FACTOR or MULTI_FACTOR and once those evictions are also 
> done, the eviction (freeSpace function) will not evict anything and the cache 
> utilization will be stuck at that value without any allocations for other 
> required sizes.
> The fix for this we came up with is simple that we do deFragmentation ( 
> compaction) of the bucketSize and thus increasing the occupancy ratio and 
> also freeing up the buckets to be fullyFree, this logic itself is not 
> complicated as the bucketAllocator takes care of packing the blocks in the 
> buckets, we need evict and re-allocate the blocks for all the BucketSizes 
> that dont fit the criteria.
> I am attaching an initial patch just to give an idea of what we are thinking 
> and I'll improve it based on the comments from the community.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HBASE-16630) Fragmentation in long running Bucket Cache

2016-09-19 Thread deepankar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15505115#comment-15505115
 ] 

deepankar edited comment on HBASE-16630 at 9/20/16 12:17 AM:
-

Yup looks like this is exactly what we want, the idea is to pick a random slab 
and evict all elements from it, but we can be a little bit more clever here and 
pick the slabs that would have least number of elements with in them so that we 
will do least number of reads to re-cache the elements from them. This idea is 
simple and could be done easily with current code structure also I think with 
small amount of code. [~vrodionov] what do you think about this heuristic ? I 
think this will also address the concerns [~zjushch] raised

[~vrodionov]Thanks for the idea


was (Author: dvdreddy):
Yup looks like this is exactly what we want, the idea is to pick a random slab 
and evict all elements from it, but we can be a little bit more clever here and 
pick the slabs that would have least number of elements with in them so that we 
will do least number of reads to re-cache the elements from them. This idea is 
simple and could be done easily with current code structure also I think with 
small amount of code. [~vrodionov] do you think ? This will also address the 
concerns [~zjushch] raised

[~vrodionov]Thanks for the idea

> Fragmentation in long running Bucket Cache
> --
>
> Key: HBASE-16630
> URL: https://issues.apache.org/jira/browse/HBASE-16630
> Project: HBase
>  Issue Type: Bug
>  Components: BucketCache
>Affects Versions: 2.0.0, 1.1.6, 1.3.1, 1.2.3
>Reporter: deepankar
>Assignee: deepankar
> Attachments: HBASE-16630.patch
>
>
> As we are running bucket cache for a long time in our system, we are 
> observing cases where some nodes after some time does not fully utilize the 
> bucket cache, in some cases it is even worse in the sense they get stuck at a 
> value < 0.25 % of the bucket cache (DEFAULT_MEMORY_FACTOR as all our tables 
> are configured in-memory for simplicity sake).
> We took a heap dump and analyzed what is happening and saw that is classic 
> case of fragmentation, current implementation of BucketCache (mainly 
> BucketAllocator) relies on the logic that fullyFreeBuckets are available for 
> switching/adjusting cache usage between different bucketSizes . But once a 
> compaction / bulkload happens and the blocks are evicted from a bucket size , 
> these are usually evicted from random places of the buckets of a bucketSize 
> and thus locking the number of buckets associated with a bucketSize and in 
> the worst case of the fragmentation we have seen some bucketSizes with 
> occupancy ratio of <  10 % But they dont have any completelyFreeBuckets to 
> share with the other bucketSize. 
> Currently the existing eviction logic helps in the cases where cache used is 
> more the MEMORY_FACTOR or MULTI_FACTOR and once those evictions are also 
> done, the eviction (freeSpace function) will not evict anything and the cache 
> utilization will be stuck at that value without any allocations for other 
> required sizes.
> The fix for this we came up with is simple that we do deFragmentation ( 
> compaction) of the bucketSize and thus increasing the occupancy ratio and 
> also freeing up the buckets to be fullyFree, this logic itself is not 
> complicated as the bucketAllocator takes care of packing the blocks in the 
> buckets, we need evict and re-allocate the blocks for all the BucketSizes 
> that dont fit the criteria.
> I am attaching an initial patch just to give an idea of what we are thinking 
> and I'll improve it based on the comments from the community.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HBASE-16630) Fragmentation in long running Bucket Cache

2016-09-13 Thread Vladimir Rodionov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15489235#comment-15489235
 ] 

Vladimir Rodionov edited comment on HBASE-16630 at 9/14/16 3:22 AM:


How about offheap read support? Does it include reading directly from off heap 
cache? If, yes, then defragmentation will result  in SIGSEGV.


was (Author: vrodionov):
How about offheap read support? Does it include reading directly from off heap 
cache? If, yes, the defragmentation will result SIGSEGV.

> Fragmentation in long running Bucket Cache
> --
>
> Key: HBASE-16630
> URL: https://issues.apache.org/jira/browse/HBASE-16630
> Project: HBase
>  Issue Type: Bug
>  Components: BucketCache
>Affects Versions: 2.0.0, 1.1.6, 1.3.1, 1.2.3
>Reporter: deepankar
>Assignee: deepankar
> Attachments: HBASE-16630.patch
>
>
> As we are running bucket cache for a long time in our system, we are 
> observing cases where some nodes after some time does not fully utilize the 
> bucket cache, in some cases it is even worse in the sense they get stuck at a 
> value < 0.25 % of the bucket cache (DEFAULT_MEMORY_FACTOR as all our tables 
> are configured in-memory for simplicity sake).
> We took a heap dump and analyzed what is happening and saw that is classic 
> case of fragmentation, current implementation of BucketCache (mainly 
> BucketAllocator) relies on the logic that fullyFreeBuckets are available for 
> switching/adjusting cache usage between different bucketSizes . But once a 
> compaction / bulkload happens and the blocks are evicted from a bucket size , 
> these are usually evicted from random places of the buckets of a bucketSize 
> and thus locking the number of buckets associated with a bucketSize and in 
> the worst case of the fragmentation we have seen some bucketSizes with 
> occupancy ratio of <  10 % But they dont have any completelyFreeBuckets to 
> share with the other bucketSize. 
> Currently the existing eviction logic helps in the cases where cache used is 
> more the MEMORY_FACTOR or MULTI_FACTOR and once those evictions are also 
> done, the eviction (freeSpace function) will not evict anything and the cache 
> utilization will be stuck at that value without any allocations for other 
> required sizes.
> The fix for this we came up with is simple that we do deFragmentation ( 
> compaction) of the bucketSize and thus increasing the occupancy ratio and 
> also freeing up the buckets to be fullyFree, this logic itself is not 
> complicated as the bucketAllocator takes care of packing the blocks in the 
> buckets, we need evict and re-allocate the blocks for all the BucketSizes 
> that dont fit the criteria.
> I am attaching an initial patch just to give an idea of what we are thinking 
> and I'll improve it based on the comments from the community.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HBASE-16630) Fragmentation in long running Bucket Cache

2016-09-13 Thread binlijin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15489206#comment-15489206
 ] 

binlijin edited comment on HBASE-16630 at 9/14/16 3:06 AM:
---

Looks like there is extra copy in deFragment, it is possible to reduce it?


was (Author: aoxiang):
Looks like there is extra copy in deFragment, can we reduce it?

> Fragmentation in long running Bucket Cache
> --
>
> Key: HBASE-16630
> URL: https://issues.apache.org/jira/browse/HBASE-16630
> Project: HBase
>  Issue Type: Bug
>  Components: BucketCache
>Affects Versions: 2.0.0, 1.1.6, 1.3.1, 1.2.3
>Reporter: deepankar
>Assignee: deepankar
> Attachments: HBASE-16630.patch
>
>
> As we are running bucket cache for a long time in our system, we are 
> observing cases where some nodes after some time does not fully utilize the 
> bucket cache, in some cases it is even worse in the sense they get stuck at a 
> value < 0.25 % of the bucket cache (DEFAULT_MEMORY_FACTOR as all our tables 
> are configured in-memory for simplicity sake).
> We took a heap dump and analyzed what is happening and saw that is classic 
> case of fragmentation, current implementation of BucketCache (mainly 
> BucketAllocator) relies on the logic that fullyFreeBuckets are available for 
> switching/adjusting cache usage between different bucketSizes . But once a 
> compaction / bulkload happens and the blocks are evicted from a bucket size , 
> these are usually evicted from random places of the buckets of a bucketSize 
> and thus locking the number of buckets associated with a bucketSize and in 
> the worst case of the fragmentation we have seen some bucketSizes with 
> occupancy ratio of <  10 % But they dont have any completelyFreeBuckets to 
> share with the other bucketSize. 
> Currently the existing eviction logic helps in the cases where cache used is 
> more the MEMORY_FACTOR or MULTI_FACTOR and once those evictions are also 
> done, the eviction (freeSpace function) will not evict anything and the cache 
> utilization will be stuck at that value without any allocations for other 
> required sizes.
> The fix for this we came up with is simple that we do deFragmentation ( 
> compaction) of the bucketSize and thus increasing the occupancy ratio and 
> also freeing up the buckets to be fullyFree, this logic itself is not 
> complicated as the bucketAllocator takes care of packing the blocks in the 
> buckets, we need evict and re-allocate the blocks for all the BucketSizes 
> that dont fit the criteria.
> I am attaching an initial patch just to give an idea of what we are thinking 
> and I'll improve it based on the comments from the community.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)