[jira] [Commented] (HBASE-16630) Fragmentation in long running Bucket Cache

2017-03-05 Thread deepankar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15896860#comment-15896860
 ] 

deepankar commented on HBASE-16630:
---

Thanks, guys, for pushing this and dealing with my errors and unresponsiveness.

> Fragmentation in long running Bucket Cache
> --
>
> Key: HBASE-16630
> URL: https://issues.apache.org/jira/browse/HBASE-16630
> Project: HBase
>  Issue Type: Bug
>  Components: BucketCache
>Affects Versions: 2.0.0, 1.1.6, 1.3.1, 1.2.3
>Reporter: deepankar
>Assignee: deepankar
>Priority: Critical
> Fix For: 2.0.0, 1.4.0, 1.3.1, 1.2.6
>
> Attachments: 16630-v2-suggest.patch, 16630-v3-suggest.patch, 
> HBASE-16630.patch, HBASE-16630-v2.patch, HBASE-16630-v3-branch-1.patch, 
> HBASE-16630-v3-branch-1.X.patch, HBASE-16630-v3.patch, 
> HBASE-16630-v4-branch-1.X.patch
>
>
> As we have been running the bucket cache for a long time in our system, we are
> observing cases where some nodes, after some time, do not fully utilize the
> bucket cache; in some cases it is even worse, in the sense that they get stuck
> at a value < 0.25 of the bucket cache (the DEFAULT_MEMORY_FACTOR, as all our
> tables are configured in-memory for simplicity's sake).
> We took a heap dump, analyzed what is happening, and saw that it is a classic
> case of fragmentation. The current implementation of BucketCache (mainly
> BucketAllocator) relies on the logic that fullyFreeBuckets are available for
> switching/adjusting cache usage between different bucketSizes. But once a
> compaction/bulkload happens and blocks are evicted from a bucket size, they
> are usually evicted from random places in the buckets of that bucketSize,
> thus locking in the number of buckets associated with that bucketSize. In the
> worst cases of fragmentation we have seen some bucketSizes with an occupancy
> ratio of < 10%, yet without any completelyFreeBuckets to share with the other
> bucketSizes.
> The existing eviction logic helps in the cases where the cache used is more
> than the MEMORY_FACTOR or MULTI_FACTOR, but once those evictions are done,
> the eviction (freeSpace function) will not evict anything, and cache
> utilization will be stuck at that value without any allocations for other
> required sizes.
> The fix we came up with is simple: we defragment (compact) the bucketSize,
> thus increasing the occupancy ratio and also freeing up buckets to be fully
> free. This logic itself is not complicated, as the bucketAllocator takes care
> of packing the blocks into the buckets; we need to evict and re-allocate the
> blocks for all the bucketSizes that don't fit the criteria.
> I am attaching an initial patch just to give an idea of what we are thinking,
> and I'll improve it based on comments from the community.





[jira] [Commented] (HBASE-16630) Fragmentation in long running Bucket Cache

2017-03-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15895615#comment-15895615
 ] 

Hudson commented on HBASE-16630:


SUCCESS: Integrated in Jenkins build HBase-1.2-JDK7 #111 (See 
[https://builds.apache.org/job/HBase-1.2-JDK7/111/])
HBASE-16630 Fragmentation in long running Bucket Cache (ramkrishna: rev 
ea5ac3856cde31b17615bbf38f54f2808a5647ff)
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/bucket/BucketAllocator.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/bucket/BucketCache.java




[jira] [Commented] (HBASE-16630) Fragmentation in long running Bucket Cache

2017-03-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15895602#comment-15895602
 ] 

Hudson commented on HBASE-16630:


SUCCESS: Integrated in Jenkins build HBase-1.2-JDK8 #106 (See 
[https://builds.apache.org/job/HBase-1.2-JDK8/106/])
HBASE-16630 Fragmentation in long running Bucket Cache (ramkrishna: rev 
ea5ac3856cde31b17615bbf38f54f2808a5647ff)
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/bucket/BucketAllocator.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/bucket/BucketCache.java




[jira] [Commented] (HBASE-16630) Fragmentation in long running Bucket Cache

2017-03-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15895545#comment-15895545
 ] 

Hudson commented on HBASE-16630:


SUCCESS: Integrated in Jenkins build HBase-1.2-IT #844 (See 
[https://builds.apache.org/job/HBase-1.2-IT/844/])
HBASE-16630 Fragmentation in long running Bucket Cache (ramkrishna: rev 
ea5ac3856cde31b17615bbf38f54f2808a5647ff)
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/bucket/BucketCache.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/bucket/BucketAllocator.java




[jira] [Commented] (HBASE-16630) Fragmentation in long running Bucket Cache

2017-03-03 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15893882#comment-15893882
 ] 

ramkrishna.s.vasudevan commented on HBASE-16630:


[~dvdreddy]
Do you want me to prepare a patch for branch-1.2? Let me know and I can do it.



[jira] [Commented] (HBASE-16630) Fragmentation in long running Bucket Cache

2017-03-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15890868#comment-15890868
 ] 

Hudson commented on HBASE-16630:


SUCCESS: Integrated in Jenkins build HBase-1.3-JDK8 #129 (See 
[https://builds.apache.org/job/HBase-1.3-JDK8/129/])
HBASE-16630 Fragmentation in long running Bucket Cache (ramkrishna: rev 
e08f4bf6843d677a942137873328dc035616cf6f)
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/bucket/BucketCache.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/bucket/BucketAllocator.java




[jira] [Commented] (HBASE-16630) Fragmentation in long running Bucket Cache

2017-03-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15890731#comment-15890731
 ] 

Hudson commented on HBASE-16630:


SUCCESS: Integrated in Jenkins build HBase-1.4 #653 (See 
[https://builds.apache.org/job/HBase-1.4/653/])
HBASE-16630 Fragmentation in long running Bucket Cache (ramkrishna: rev 
f64a52a0c2ecfbad070daebdf16b6b514231ec81)
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/bucket/BucketCache.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/bucket/BucketAllocator.java




[jira] [Commented] (HBASE-16630) Fragmentation in long running Bucket Cache

2017-03-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15890493#comment-15890493
 ] 

Hudson commented on HBASE-16630:


SUCCESS: Integrated in Jenkins build HBase-1.3-JDK7 #118 (See 
[https://builds.apache.org/job/HBase-1.3-JDK7/118/])
HBASE-16630 Fragmentation in long running Bucket Cache (ramkrishna: rev 
e08f4bf6843d677a942137873328dc035616cf6f)
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/bucket/BucketCache.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/bucket/BucketAllocator.java




[jira] [Commented] (HBASE-16630) Fragmentation in long running Bucket Cache

2017-03-01 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15890098#comment-15890098
 ] 

ramkrishna.s.vasudevan commented on HBASE-16630:


I corrected the patch subject manually and applied it. However, the patch does
not apply to branch-1.2. Could you please have a look at it, [~dvdreddy]?



[jira] [Commented] (HBASE-16630) Fragmentation in long running Bucket Cache

2017-03-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15890034#comment-15890034
 ] 

Hudson commented on HBASE-16630:


SUCCESS: Integrated in Jenkins build HBase-1.3-JDK7 #117 (See 
[https://builds.apache.org/job/HBase-1.3-JDK7/117/])
Revert "HBASE-16630 Handle Fragmentation in bucket cache" (ramkrishna: rev 
052343a17e4a4b6aba97ed27241c6f55b0bfe090)
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/bucket/BucketAllocator.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/bucket/BucketCache.java




[jira] [Commented] (HBASE-16630) Fragmentation in long running Bucket Cache

2017-03-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15889973#comment-15889973
 ] 

Hudson commented on HBASE-16630:


FAILURE: Integrated in Jenkins build HBase-1.2-IT #608 (See 
[https://builds.apache.org/job/HBase-1.2-IT/608/])
HBASE-16630 Handle Fragmentation in bucket cache (ramkrishna: rev 
2fcfb847efadd005b240d9a54257e6f3cde4c9f8)
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/bucket/BucketAllocator.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/bucket/BucketCache.java
Revert "HBASE-16630 Handle Fragmentation in bucket cache" (busbey: rev 
5ac32c4b7a79e8946e6064ee665df346f32ae0db)
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/bucket/BucketAllocator.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/bucket/BucketCache.java




[jira] [Commented] (HBASE-16630) Fragmentation in long running Bucket Cache

2017-03-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15889954#comment-15889954
 ] 

Hudson commented on HBASE-16630:


SUCCESS: Integrated in Jenkins build HBase-1.4 #652 (See 
[https://builds.apache.org/job/HBase-1.4/652/])
HBASE-16630 Handle Fragmentation in bucket cache (ramkrishna: rev 
779f9cb07feab2ad7ba758f2f28ba0f32fffb17e)
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/bucket/BucketCache.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/bucket/BucketAllocator.java
Revert "HBASE-16630 Handle Fragmentation in bucket cache" (ramkrishna: rev 
bef27396e56c42c0be9b43f75504b76438c6877f)
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/bucket/BucketAllocator.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/bucket/BucketCache.java




[jira] [Commented] (HBASE-16630) Fragmentation in long running Bucket Cache

2017-03-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15889903#comment-15889903
 ] 

Hudson commented on HBASE-16630:


SUCCESS: Integrated in Jenkins build HBase-1.3-JDK8 #128 (See 
[https://builds.apache.org/job/HBase-1.3-JDK8/128/])
HBASE-16630 Handle Fragmentation in bucket cache (ramkrishna: rev 
fa61aad9ceb48e720bf04f23e11c4bc70c208412)
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/bucket/BucketCache.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/bucket/BucketAllocator.java
Revert "HBASE-16630 Handle Fragmentation in bucket cache" (ramkrishna: rev 
052343a17e4a4b6aba97ed27241c6f55b0bfe090)
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/bucket/BucketCache.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/bucket/BucketAllocator.java




[jira] [Commented] (HBASE-16630) Fragmentation in long running Bucket Cache

2017-03-01 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15889844#comment-15889844
 ] 

ramkrishna.s.vasudevan commented on HBASE-16630:


There are no compilation issues now.



[jira] [Commented] (HBASE-16630) Fragmentation in long running Bucket Cache

2017-03-01 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15889774#comment-15889774
 ] 

ramkrishna.s.vasudevan commented on HBASE-16630:


[~dvdreddy]
I think the patch name should be the JIRA number with the title. Can you 
reformat the patch accordingly?



[jira] [Commented] (HBASE-16630) Fragmentation in long running Bucket Cache

2017-03-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15889763#comment-15889763
 ] 

Hudson commented on HBASE-16630:


SUCCESS: Integrated in Jenkins build HBase-1.2-JDK7 #109 (See 
[https://builds.apache.org/job/HBase-1.2-JDK7/109/])
HBASE-16630 Handle Fragmentation in bucket cache (ramkrishna: rev 
2fcfb847efadd005b240d9a54257e6f3cde4c9f8)
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/bucket/BucketCache.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/bucket/BucketAllocator.java
Revert "HBASE-16630 Handle Fragmentation in bucket cache" (busbey: rev 
5ac32c4b7a79e8946e6064ee665df346f32ae0db)
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/bucket/BucketCache.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/bucket/BucketAllocator.java




[jira] [Commented] (HBASE-16630) Fragmentation in long running Bucket Cache

2017-03-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15889752#comment-15889752
 ] 

Hudson commented on HBASE-16630:


SUCCESS: Integrated in Jenkins build HBase-1.2-JDK8 #103 (See 
[https://builds.apache.org/job/HBase-1.2-JDK8/103/])
HBASE-16630 Handle Fragmentation in bucket cache (ramkrishna: rev 
2fcfb847efadd005b240d9a54257e6f3cde4c9f8)
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/bucket/BucketCache.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/bucket/BucketAllocator.java
Revert "HBASE-16630 Handle Fragmentation in bucket cache" (busbey: rev 
5ac32c4b7a79e8946e6064ee665df346f32ae0db)
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/bucket/BucketCache.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/bucket/BucketAllocator.java




[jira] [Commented] (HBASE-16630) Fragmentation in long running Bucket Cache

2017-02-28 Thread deepankar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15889710#comment-15889710
 ] 

deepankar commented on HBASE-16630:
---

Sorry guys, I forgot to check for compilation and other minor issues. I will make sure to do this next time.

[jira] [Commented] (HBASE-16630) Fragmentation in long running Bucket Cache

2017-02-28 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15889632#comment-15889632
 ] 

Hudson commented on HBASE-16630:


SUCCESS: Integrated in Jenkins build HBase-1.3-IT #841 (See [https://builds.apache.org/job/HBase-1.3-IT/841/])
HBASE-16630 Handle Fragmentation in bucket cache (ramkrishna: rev 2fcfb847efadd005b240d9a54257e6f3cde4c9f8)
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/bucket/BucketCache.java
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/bucket/BucketAllocator.java
Revert "HBASE-16630 Handle Fragmentation in bucket cache" (busbey: rev 5ac32c4b7a79e8946e6064ee665df346f32ae0db)
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/bucket/BucketCache.java
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/bucket/BucketAllocator.java


[jira] [Commented] (HBASE-16630) Fragmentation in long running Bucket Cache

2017-02-28 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15889570#comment-15889570
 ] 

ramkrishna.s.vasudevan commented on HBASE-16630:


Reverted from branch-1 and branch-1.3. Just saw that [~busbey] has already reverted from branch-1.2.
I verified the patch but did not see this compilation issue. My bad.
[~dvdreddy], can you please check this?


[jira] [Commented] (HBASE-16630) Fragmentation in long running Bucket Cache

2017-02-28 Thread Sean Busbey (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15889560#comment-15889560
 ] 

Sean Busbey commented on HBASE-16630:
-

Reverted from branch-1.2 because it caused a compilation error. Please check contributions before pushing them.

[jira] [Commented] (HBASE-16630) Fragmentation in long running Bucket Cache

2017-02-28 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15889528#comment-15889528
 ] 

ramkrishna.s.vasudevan commented on HBASE-16630:


[~dvdreddy]
There is a compilation error; can you check it? There is no refCount in branch-1.3.
{code}
[ERROR] /home/jenkins/jenkins-slave/workspace/HBase-1.3-JDK7/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/bucket/BucketCache.java:[604,17] error: cannot find symbol
[ERROR]   symbol:   variable refCount
{code}
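
A minimal, self-contained sketch of the failure mode, with every name below an illustrative assumption rather than the committed patch: the eviction pass guards on a per-block refCount, which exists where the patch compiled but not in branch-1.3's block descriptor, which is exactly what produces the "cannot find symbol" error there.

{code}
// Illustrative sketch only, NOT the HBase source: an eviction pass that
// skips blocks still pinned by readers via a refCount field. A branch-1.3
// port has no such field and must rely on its existing eviction path.
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

public class RefCountGuardSketch {
  // Stand-in for a cached-block descriptor (assumed shape).
  static class Entry {
    final long offset;
    int refCount; // assumed present on master, absent on branch-1.3
    Entry(long offset) { this.offset = offset; }
  }

  public static void main(String[] args) {
    Map<String, Entry> backingMap = new HashMap<>();
    backingMap.put("block-a", new Entry(0L));
    backingMap.put("block-b", new Entry(65536L));
    backingMap.get("block-b").refCount = 1; // still pinned by a reader

    Iterator<Map.Entry<String, Entry>> it = backingMap.entrySet().iterator();
    while (it.hasNext()) {
      if (it.next().getValue().refCount > 0) {
        continue; // master-style guard; a branch-1.3 port must drop this line
      }
      it.remove(); // toy eviction of the unpinned block
    }
    System.out.println("remaining: " + backingMap.keySet()); // [block-b]
  }
}
{code}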

[jira] [Commented] (HBASE-16630) Fragmentation in long running Bucket Cache

2017-02-28 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15889526#comment-15889526
 ] 

Hudson commented on HBASE-16630:


FAILURE: Integrated in Jenkins build HBase-1.3-JDK7 #116 (See [https://builds.apache.org/job/HBase-1.3-JDK7/116/])
HBASE-16630 Handle Fragmentation in bucket cache (ramkrishna: rev fa61aad9ceb48e720bf04f23e11c4bc70c208412)
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/bucket/BucketCache.java
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/bucket/BucketAllocator.java


[jira] [Commented] (HBASE-16630) Fragmentation in long running Bucket Cache

2017-02-28 Thread deepankar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15889510#comment-15889510
 ] 

deepankar commented on HBASE-16630:
---

Created HBASE-17711 to track the addition of tests for this patch.
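
Until that test lands, here is a toy sketch of the scenario it needs to cover; the data structures below are assumptions for illustration, not the real BucketAllocator API:

{code}
// Toy, self-contained sketch of the fragmentation scenario and the defrag
// remedy described in this issue. Nothing here is the actual HBase code.
import java.util.ArrayList;
import java.util.List;

public class FragmentationSketch {
  public static void main(String[] args) {
    final int bucketCapacity = 4;   // blocks per bucket
    final int buckets = 100;        // buckets held by one block size
    List<List<Integer>> cache = new ArrayList<>();
    for (int b = 0; b < buckets; b++) {
      List<Integer> bucket = new ArrayList<>();
      for (int i = 0; i < bucketCapacity; i++) bucket.add(i); // fill fully
      cache.add(bucket);
    }

    // A compaction/bulkload evicts blocks scattered across all buckets:
    for (List<Integer> bucket : cache) bucket.remove(0);

    // Occupancy is now 75%, yet no bucket is completely free, so none can
    // be handed over to another bucket size: the stuck state in this issue.
    long free = cache.stream().filter(List::isEmpty).count();
    System.out.println("completely free buckets before defrag: " + free); // 0

    // The fix: defragment by re-packing surviving blocks into as few
    // buckets as possible, turning partially used buckets into free ones.
    List<Integer> survivors = new ArrayList<>();
    for (List<Integer> bucket : cache) { survivors.addAll(bucket); bucket.clear(); }
    int idx = 0;
    for (Integer block : survivors) {
      cache.get(idx).add(block);
      if (cache.get(idx).size() == bucketCapacity) idx++;
    }
    free = cache.stream().filter(List::isEmpty).count();
    System.out.println("completely free buckets after defrag: " + free); // 25
  }
}
{code}

A real unit test along these lines would assert that a defragmentation pass leaves at least one completely free bucket whenever a bucket size is under-occupied.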

[jira] [Commented] (HBASE-16630) Fragmentation in long running Bucket Cache

2017-02-28 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15889508#comment-15889508
 ] 

ramkrishna.s.vasudevan commented on HBASE-16630:


Pushed to branch-1, branch-1.2 and branch-1.3. Thanks for the update [~dvdreddy].
Will mark it as resolved now.

[jira] [Commented] (HBASE-16630) Fragmentation in long running Bucket Cache

2017-02-28 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15889489#comment-15889489
 ] 

Hadoop QA commented on HBASE-16630:
---

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 0s | Docker mode activated. |
| -1 | patch | 0m 8s | HBASE-16630 does not apply to master. Rebase required? Wrong Branch? See https://yetus.apache.org/documentation/0.3.0/precommit-patchnames for help. |

|| Subsystem || Report/Notes ||
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12855290/HBASE-16630-v3-branch-1.patch |
| JIRA Issue | HBASE-16630 |
| Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/5893/console |
| Powered by | Apache Yetus 0.3.0   http://yetus.apache.org |


This message was automatically generated.



[jira] [Commented] (HBASE-16630) Fragmentation in long running Bucket Cache

2017-02-28 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15889163#comment-15889163
 ] 

Hadoop QA commented on HBASE-16630:
---

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 0s | Docker mode activated. |
| -1 | patch | 0m 5s | HBASE-16630 does not apply to master. Rebase required? Wrong Branch? See https://yetus.apache.org/documentation/0.3.0/precommit-patchnames for help. |

|| Subsystem || Report/Notes ||
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12855255/HBASE-16630-v3-branch-1.X.patch |
| JIRA Issue | HBASE-16630 |
| Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/5886/console |
| Powered by | Apache Yetus 0.3.0   http://yetus.apache.org |


This message was automatically generated.



[jira] [Commented] (HBASE-16630) Fragmentation in long running Bucket Cache

2017-02-24 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15882960#comment-15882960
 ] 

Hudson commented on HBASE-16630:


SUCCESS: Integrated in Jenkins build HBase-Trunk_matrix #2563 (See [https://builds.apache.org/job/HBase-Trunk_matrix/2563/])
HBASE-16630 Handle Fragmentation in bucket cache (ramkrishna: rev a8fd1119ceca3e9600af9cc9e91ae755f027a0eb)
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/bucket/BucketAllocator.java
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/bucket/BucketCache.java


[jira] [Commented] (HBASE-16630) Fragmentation in long running Bucket Cache

2017-02-24 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15882321#comment-15882321
 ] 

ramkrishna.s.vasudevan commented on HBASE-16630:


Pushed to master. Thanks for the patch [~dvdreddy]. Will wait for the other branch patches. Let me know if you are busy; I can prepare them on your behalf.

[jira] [Commented] (HBASE-16630) Fragmentation in long running Bucket Cache

2017-02-24 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15882300#comment-15882300
 ] 

ramkrishna.s.vasudevan commented on HBASE-16630:


OK, I can commit it.

[jira] [Commented] (HBASE-16630) Fragmentation in long running Bucket Cache

2017-02-24 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15882284#comment-15882284
 ] 

Anoop Sam John commented on HBASE-16630:


Yep, 1.2+ all branches can be the target. HBASE-16630-v3.patch should be the one to commit, Ram. Will you do that?
Also, can you provide the branch-1, branch-1.3 and branch-1.2 patches, [~dvdreddy]?

[jira] [Commented] (HBASE-16630) Fragmentation in long running Bucket Cache

2017-02-23 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15882011#comment-15882011
 ] 

ramkrishna.s.vasudevan commented on HBASE-16630:


Which of the above patches should be committed, [~tedyu] and [~dvdreddy]?
I can see two V3 patches, one with a 'suggest' suffix.

[jira] [Commented] (HBASE-16630) Fragmentation in long running Bucket Cache

2017-02-23 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15881995#comment-15881995
 ] 

ramkrishna.s.vasudevan commented on HBASE-16630:


+1 for commit to master and branch-1.2.

[jira] [Commented] (HBASE-16630) Fragmentation in long running Bucket Cache

2017-02-23 Thread Sean Busbey (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15881840#comment-15881840
 ] 

Sean Busbey commented on HBASE-16630:
-

Seems fine for branch-1.2, but doesn't seem worth blocking 1.2.5 on. I'll probably make a go at the 1.2.5 RC this weekend, so there's time.

Would really like a test, but that'd be fine as a follow-on.

[jira] [Commented] (HBASE-16630) Fragmentation in long running Bucket Cache

2017-02-23 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15881825#comment-15881825
 ] 

stack commented on HBASE-16630:
---

+1. Go for it.

Should it go into hbase-1.2.5, [~anoop.hbase]? [~busbey] FYI

[jira] [Commented] (HBASE-16630) Fragmentation in long running Bucket Cache

2017-02-10 Thread deepankar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15861696#comment-15861696
 ] 

deepankar commented on HBASE-16630:
---

About the unit test: I was busy these past weeks trying to hunt down another bug (will update once I confirm it) and did not get a chance to look into it.

[jira] [Commented] (HBASE-16630) Fragmentation in long running Bucket Cache

2017-02-10 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15861029#comment-15861029
 ] 

Anoop Sam John commented on HBASE-16630:


Looks good.  Can we get this in now?  Ping [~saint@gmail.com].



[jira] [Commented] (HBASE-16630) Fragmentation in long running Bucket Cache

2017-01-17 Thread chunhui shen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15827327#comment-15827327
 ] 

chunhui shen commented on HBASE-16630:
--

[~dvdreddy]
The method freeEntireBuckets should return the bytes it freed, and that amount should be added to the variable 'bytesFreed' of BucketCache#freeSpace.

I agree that the newly introduced method BucketCache#freeEntireBuckets will help to free space in the fragmentation scenario while having no effect in the no-fragmentation scenario. So, the patch is fine by me, thanks.
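
A minimal, self-contained sketch of that suggestion (illustrative only, not the actual patch; the class and helper names here are made up, and the real BucketCache bookkeeping is far more involved): the defragmentation pass reports how many bytes it released, and the caller folds that into its freeing target.

{code:java}
// Illustrative sketch, not HBase code: freeEntireBuckets returns the bytes it
// freed so the freeSpace-style caller can count them toward bytesFreed.
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class FreeSpaceSketch {
  static final class Block {
    final long length;
    Block(long length) { this.length = length; }
  }

  // Stand-in for blocks stranded in sparsely occupied buckets.
  private final List<Block> sparseBucketBlocks = new ArrayList<>();

  /** Evicts every block held by the sparsely occupied buckets; returns bytes freed. */
  long freeEntireBuckets() {
    long bytesFreed = 0;
    for (Iterator<Block> it = sparseBucketBlocks.iterator(); it.hasNext(); ) {
      bytesFreed += it.next().length;
      it.remove();
    }
    return bytesFreed;
  }

  void freeSpace(long bytesToFree, long bytesFreedByPriorityEviction) {
    long bytesFreed = bytesFreedByPriorityEviction; // from the existing eviction pass
    if (bytesFreed < bytesToFree) {
      bytesFreed += freeEntireBuckets(); // count the defrag pass toward the target
    }
    System.out.println("freed " + bytesFreed + " of " + bytesToFree + " requested");
  }
}
{code}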



[jira] [Commented] (HBASE-16630) Fragmentation in long running Bucket Cache

2017-01-17 Thread chunhui shen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15827291#comment-15827291
 ] 

chunhui shen commented on HBASE-16630:
--

bq. If it's harmless, why commit it (smile)?
I mean it's harmless for the no-fragmentation scenario after applying this patch.  :)



[jira] [Commented] (HBASE-16630) Fragmentation in long running Bucket Cache

2017-01-17 Thread deepankar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15826660#comment-15826660
 ] 

deepankar commented on HBASE-16630:
---

Sorry, I forgot about the unit test; I will see if I can hack something up. About [~stack]'s suggestion: it is somewhat orthogonal to this issue. I agree that the eviction could be improved via the suggestions in HBASE-15560, but it would be hard to prevent the fragmentation from happening when we have buckets allocated to different sizes, the buckets are occupied very sparsely, and those occupants are blocks which are accessed both recently and frequently.
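
To make the sparse-occupancy case concrete (hypothetical numbers, not figures from our heap dumps): suppose one size class owns 100 buckets of 2 MB, each able to hold 32 blocks of 64 KB. If evictions leave about 3 hot blocks scattered in every bucket, the class sits near 3/32 ≈ 9% occupancy, in line with the < 10% we observed, yet it has zero completelyFreeBuckets to hand over to another size class, and no smarter recency/frequency policy changes that while those blocks stay hot.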



[jira] [Commented] (HBASE-16630) Fragmentation in long running Bucket Cache

2017-01-17 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15826158#comment-15826158
 ] 

stack commented on HBASE-16630:
---

bq. The patch is fine and would be harmless.

If it's harmless, why commit it (smile)? Do you think the patch will help with fragmentation?

Wasn't there a unit test to be added, [~dvdreddy]? Too hard to add?

Thanks.



[jira] [Commented] (HBASE-16630) Fragmentation in long running Bucket Cache

2017-01-16 Thread chunhui shen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15825595#comment-15825595
 ] 

chunhui shen commented on HBASE-16630:
--

[~stack]
Our change about this, mentioned in the previous comment, has been included in the last patch.
The patch is fine and would be harmless.

Thanks sir.



[jira] [Commented] (HBASE-16630) Fragmentation in long running Bucket Cache

2017-01-16 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15825508#comment-15825508
 ] 

stack commented on HBASE-16630:
---

[~dvdreddy] And thanks for trying my suggestion, though it sounds like it was of no use.

[~zjushch] Good to hear from you. Are you running w/ this patch, or do you see this in your deploy? Thanks sir.



[jira] [Commented] (HBASE-16630) Fragmentation in long running Bucket Cache

2017-01-16 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15825479#comment-15825479
 ] 

stack commented on HBASE-16630:
---

[~dvdreddy] You think we should commit this last patch? Thanks.



[jira] [Commented] (HBASE-16630) Fragmentation in long running Bucket Cache

2017-01-16 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15824847#comment-15824847
 ] 

Ted Yu commented on HBASE-16630:


Planning to commit tomorrow morning if there is no more review comment.



[jira] [Commented] (HBASE-16630) Fragmentation in long running Bucket Cache

2017-01-16 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15824206#comment-15824206
 ] 

Ted Yu commented on HBASE-16630:


+1 from me as well.



[jira] [Commented] (HBASE-16630) Fragmentation in long running Bucket Cache

2017-01-16 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15824150#comment-15824150
 ] 

Ted Yu commented on HBASE-16630:


Ran TestAsyncTableBatch with the patch locally and didn't reproduce the test failure.

TestAsyncTableBatch doesn't use the bucket cache, so the failure was not related.



[jira] [Commented] (HBASE-16630) Fragmentation in long running Bucket Cache

2017-01-16 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15823598#comment-15823598
 ] 

Hadoop QA commented on HBASE-16630:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 13s {color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s {color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s {color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 50s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 36s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 44s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 14s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 43s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 27s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 41s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 35s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 35s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 45s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 13s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 25m 24s {color} | {color:green} Patch does not cause any errors with Hadoop 2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha1. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 52s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 28s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 85m 21s {color} | {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 15s {color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 122m 35s {color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Timed out junit tests | org.apache.hadoop.hbase.client.TestAsyncTableBatch |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=1.12.3 Server=1.12.3 Image:yetus/hbase:8d52d23 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12829969/16630-v3-suggest.patch |
| JIRA Issue | HBASE-16630 |
| Optional Tests | asflicense javac javadoc unit findbugs hadoopcheck hbaseanti checkstyle compile |
| uname | Linux 73f97b9581af 3.13.0-95-generic #142-Ubuntu SMP Fri Aug 12 17:00:09 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh |
| git revision | master / 4cb09a4 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| unit | https://builds.apache.org/job/PreCommit-HBASE-Build/5274/artifact/patchprocess/patch-unit-hbase-server.txt |
| unit test logs | https://builds.apache.org/job/PreCommit-HBASE-Build/5274/artifact/patchprocess/patch-unit-hbase-server.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/5274/testReport/ |
| modules | C: hbase-server U: hbase-server |
| Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/5274/console |
| Powered by | Apache Yetus 0.3.0 http://yetus.apache.org |


This message was automatically generated.




[jira] [Commented] (HBASE-16630) Fragmentation in long running Bucket Cache

2017-01-15 Thread chunhui shen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15823377#comment-15823377
 ] 

chunhui shen commented on HBASE-16630:
--

seems fine for the latest patch, +1



[jira] [Commented] (HBASE-16630) Fragmentation in long running Bucket Cache

2016-12-20 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15766090#comment-15766090
 ] 

ramkrishna.s.vasudevan commented on HBASE-16630:


[~dvdreddy]
Thanks for the info. We can review the fix again.
[~saint@gmail.com]
What do you think of [~dvdreddy] point?



[jira] [Commented] (HBASE-16630) Fragmentation in long running Bucket Cache

2016-12-20 Thread deepankar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15765587#comment-15765587
 ] 

deepankar commented on HBASE-16630:
---

Sorry for the late reply. I looked into Stack's suggestion and tried to use it for this problem; the suggestion was more about improving the LRU part of the eviction, and I was not sure how that helps in this case, where the size bucket is entirely full. There is already LRU by means of accessCount, and even though that might not be fully optimal when compared to Stack's suggestion, in the context of this problem the result from both strategies will be the same.

I looked into Ted Yu's suggestion of tracking the last N blocks, but the metadata overhead of tracking them for this case was too high. We tried to pinpoint the issue of re-eviction in the heuristic used, with some debug statements / heap dumps on real data, but we were not successful. So we have productionized the attached version of the patch on more HBase clusters and are seeing some occasional issues, but nothing serious.
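
As an aside, the re-eviction concern above could be expressed as a simple age guard in a defragmentation pass (a purely hypothetical sketch; the names and the threshold below are made up and are not part of the attached patch):

{code:java}
// Hypothetical guard, not HBase API: skip blocks cached very recently during a
// defragmentation pass, since evicting them would likely force an immediate
// re-cache and churn the bucket again.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class DefragGuardSketch {
  // Assumed threshold: leave blocks alone for their first 60 seconds in cache.
  static final long MIN_AGE_NANOS = 60L * 1_000_000_000L;

  static final class Entry {
    final long cachedTimeNanos;
    Entry(long cachedTimeNanos) { this.cachedTimeNanos = cachedTimeNanos; }
  }

  private final Map<String, Entry> backingMap = new ConcurrentHashMap<>();

  void onCached(String key) {
    backingMap.put(key, new Entry(System.nanoTime()));
  }

  boolean shouldEvictForDefrag(String key, long nowNanos) {
    Entry e = backingMap.get(key);
    return e != null && (nowNanos - e.cachedTimeNanos) >= MIN_AGE_NANOS;
  }
}
{code}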





[jira] [Commented] (HBASE-16630) Fragmentation in long running Bucket Cache

2016-12-19 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15763307#comment-15763307
 ] 

ramkrishna.s.vasudevan commented on HBASE-16630:


[~dvdreddy]
Do you have some time to look into this? Were you able to look into Stack's suggestion?



[jira] [Commented] (HBASE-16630) Fragmentation in long running Bucket Cache

2016-11-04 Thread deepankar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15635468#comment-15635468
 ] 

deepankar commented on HBASE-16630:
---

Yeah, I am looking into Stack's suggestion to counter the problem I previously mentioned about re-evicting just-cached blocks.



[jira] [Commented] (HBASE-16630) Fragmentation in long running Bucket Cache

2016-11-03 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15635307#comment-15635307
 ] 

Anoop Sam John commented on HBASE-16630:


We would like to enable BucketCache on by default for the data blocks in 2.0, so can we up the priority to critical?
[~dvdreddy] Are you still working on this?



[jira] [Commented] (HBASE-16630) Fragmentation in long running Bucket Cache

2016-11-01 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15626247#comment-15626247
 ] 

stack commented on HBASE-16630:
---

[~dvdreddy], could you use the 'smarter' algo from HBASE-15560, the
TinyLFU-based BlockCache?



[jira] [Commented] (HBASE-16630) Fragmentation in long running Bucket Cache

2016-11-01 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15626001#comment-15626001
 ] 

Ted Yu commented on HBASE-16630:


bq. repeatedly evicting the same blocks again

How about keeping the last N evicted blocks and trying not to re-evict them?
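A minimal sketch of that suggestion, assuming a hypothetical MAX_TRACKED_EVICTIONS
bound; BlockCacheKey is the existing HBase type, the rest is illustrative rather
than patch code:

{code}
// java.util.{Collections, LinkedHashMap, Map, Set}
// Keep the last N evicted block keys in an insertion-ordered map that drops
// its eldest entry past the bound, and consult it before evicting again.
private final Set<BlockCacheKey> recentlyEvicted = Collections.newSetFromMap(
    new LinkedHashMap<BlockCacheKey, Boolean>() {
      @Override
      protected boolean removeEldestEntry(Map.Entry<BlockCacheKey, Boolean> eldest) {
        return size() > MAX_TRACKED_EVICTIONS; // retain only the last N keys
      }
    });

void onEvicted(BlockCacheKey key) {
  recentlyEvicted.add(key);
}

boolean shouldSkipEviction(BlockCacheKey key) {
  return recentlyEvicted.contains(key);
}
{code}

freeSpace would then filter its candidates through shouldSkipEviction before
evicting, at the cost of one bounded set lookup per candidate.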



[jira] [Commented] (HBASE-16630) Fragmentation in long running Bucket Cache

2016-10-16 Thread deepankar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15580415#comment-15580415
 ] 

deepankar commented on HBASE-16630:
---

Sorry for the delayed reply. The bucket cache is not stuck as before; at least
it is evicting blocks and adding new ones. But we are occasionally facing a
different issue, especially under peak load: the heuristic we have repeatedly
evicts the same blocks again and again, which at times leads to a very high
load average. We are trying to come up with some tweaks that also take into
consideration the total blocks allocated for a bucket (a sketch of one such
tweak follows). If you have any other suggestions, we can try them out too.
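For illustration, one hedged shape such a tweak could take; the thresholds and
the statistics()/usedCount()/totalCount() accessors are assumptions here, not
the actual patch:

{code}
// Gate forced eviction/defragmentation on both a minimum allocation count
// and a low occupancy ratio, so small or busy bucket sizes are not churned.
boolean shouldDefragment(BucketAllocator.BucketSizeInfo info) {
  BucketAllocator.IndexStatistics stats = info.statistics();
  long total = stats.totalCount();
  if (total < MIN_BLOCKS_TO_CONSIDER) { // hypothetical threshold
    return false; // too few blocks for defragmentation to pay off
  }
  double occupancy = (double) stats.usedCount() / total;
  return occupancy < OCCUPANCY_THRESHOLD; // hypothetical, e.g. 0.5
}
{code}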



[jira] [Commented] (HBASE-16630) Fragmentation in long running Bucket Cache

2016-10-09 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15561095#comment-15561095
 ] 

Ted Yu commented on HBASE-16630:


Deepankar:
How was the patched version running in your cluster?

Thanks



[jira] [Commented] (HBASE-16630) Fragmentation in long running Bucket Cache

2016-09-22 Thread Vladimir Rodionov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15514645#comment-15514645
 ] 

Vladimir Rodionov commented on HBASE-16630:
---

Yes, my bad. Somehow missed it!





[jira] [Commented] (HBASE-16630) Fragmentation in long running Bucket Cache

2016-09-22 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15514424#comment-15514424
 ] 

Ted Yu commented on HBASE-16630:


{code}
+  Set<Integer> candidateBuckets = bucketAllocator.getLeastFilledBuckets(
+  inUseBuckets, completelyFreeBucketsNeeded);
...
+   * @param excludedBuckets the buckets that need to be excluded due to
+   *currently being in use
{code}
In use buckets are excluded from freeing.



[jira] [Commented] (HBASE-16630) Fragmentation in long running Bucket Cache

2016-09-22 Thread Vladimir Rodionov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15514407#comment-15514407
 ] 

Vladimir Rodionov commented on HBASE-16630:
---

[~dvdreddy]
OK, I was not right; I do not see any significant memory overhead. But I have
another concern.

You collect all buckets that have active blocks (refCount != 0), then clear
them up. This is counterintuitive. Do you evict the Most Recently Used blocks?



[jira] [Commented] (HBASE-16630) Fragmentation in long running Bucket Cache

2016-09-22 Thread deepankar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15514299#comment-15514299
 ] 

deepankar commented on HBASE-16630:
---

Which map are you referring to, [~vrodionov]? I am constructing a couple of
sets which are at most O(bucket count).



[jira] [Commented] (HBASE-16630) Fragmentation in long running Bucket Cache

2016-09-22 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15513348#comment-15513348
 ] 

Anoop Sam John commented on HBASE-16630:


Is there a way we can get the possible candidate buckets for freeing in one
go, rather than first finding the non-zero ref-counted blocks, passing those
offsets as excluded, finding buckets from that, and doing a contains check
against excluded for every bucket? Which new Map do you refer to,
[~vrodionov], that has 200+ bytes per entry?



[jira] [Commented] (HBASE-16630) Fragmentation in long running Bucket Cache

2016-09-21 Thread Vladimir Rodionov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15510385#comment-15510385
 ] 

Vladimir Rodionov commented on HBASE-16630:
---

[~dvdreddy], the memory requirement is very high. You effectively add one more
backingMap during defragmentation (the majority of blocks will have zero
refCount), and backingMap itself takes 200+ bytes per block. For reasonably
large caches with 10M blocks, that is 2GB+ more heap to run defragmentation.
That seems quite a high overhead.
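The arithmetic behind that estimate, as a back-of-the-envelope check:

{code}
// 10M backingMap entries at roughly 200 bytes each is on the order of 2 GB.
long entries = 10_000_000L;
long bytesPerEntry = 200L;                    // approximate per-entry overhead
long overheadBytes = entries * bytesPerEntry; // 2,000,000,000 bytes ~ 1.9 GiB
{code}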



[jira] [Commented] (HBASE-16630) Fragmentation in long running Bucket Cache

2016-09-21 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15510294#comment-15510294
 ] 

Ted Yu commented on HBASE-16630:


If you have time, do you mind trying patch v2 on a (test) cluster?

Thanks



[jira] [Commented] (HBASE-16630) Fragmentation in long running Bucket Cache

2016-09-20 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15508568#comment-15508568
 ] 

Ted Yu commented on HBASE-16630:


{code}
+// Add buckets with blocks currently in use
+for (Long offset : excludedOffsets) {
+  excludedBuckets.add(getBucketIndex(offset));
{code}
Looks like you can construct excludedBuckets in freeEntireBuckets() directly.
This way the Set<Long> excludedOffsets is not needed.
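A minimal sketch of that refactor under assumed names: getBucketIndex appears
in the patch, while the refCount() and offset() accessors on BucketEntry are
assumptions here:

{code}
// Derive the excluded bucket indexes directly while scanning for in-use
// blocks, skipping the intermediate Set<Long> of offsets.
Set<Integer> excludedBuckets = new HashSet<>();
for (Map.Entry<BlockCacheKey, BucketEntry> entry : backingMap.entrySet()) {
  if (entry.getValue().refCount() != 0) { // block is currently being read
    excludedBuckets.add(getBucketIndex(entry.getValue().offset()));
  }
}
freeEntireBuckets(excludedBuckets);
{code}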



[jira] [Commented] (HBASE-16630) Fragmentation in long running Bucket Cache

2016-09-20 Thread deepankar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15508469#comment-15508469
 ] 

deepankar commented on HBASE-16630:
---

I used it predominantly for limiting the number of buckets we needed; if we
use a TreeSet we might still need to do the copy.



[jira] [Commented] (HBASE-16630) Fragmentation in long running Bucket Cache

2016-09-20 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15508425#comment-15508425
 ] 

Ted Yu commented on HBASE-16630:


{code}
+Set<Integer> result = new HashSet<>(bucketCount);
+result.addAll(queue);
{code}
The result is a Set, which doesn't preserve the order of items in the queue.
Why is the MinMaxPriorityQueue needed?
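A hedged sketch of the role the bounded queue plays; bucketOccupancy() is a
hypothetical accessor, while the Guava MinMaxPriorityQueue builder calls are
real API:

{code}
// com.google.common.collect.MinMaxPriorityQueue (Guava)
// Cap the candidate set at the number of free buckets needed: once full, the
// queue drops its greatest (fullest) element on each add, so a single pass
// over all buckets retains only the least-filled candidates.
Comparator<Integer> byOccupancy = Comparator.comparingLong(b -> bucketOccupancy(b));
MinMaxPriorityQueue<Integer> queue = MinMaxPriorityQueue.orderedBy(byOccupancy)
    .maximumSize(completelyFreeBucketsNeeded)
    .create();
for (int b = 0; b < bucketCount; b++) {
  if (!excludedBuckets.contains(b)) {
    queue.add(b);
  }
}
Set<Integer> result = new HashSet<>(queue); // order is irrelevant downstream
{code}

The copy into a HashSet does lose the ordering, but by then the ordering has
already done its job of bounding the selection, which matches the reply below.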



[jira] [Commented] (HBASE-16630) Fragmentation in long running Bucket Cache

2016-09-19 Thread chunhui shen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15505542#comment-15505542
 ] 

chunhui shen commented on HBASE-16630:
--

1. Fix the issue that freeSpace doesn't free anything (required).
2. Defragmentation (optional):
 a. Would increase the cache usage ratio in a short time; without
defragmentation it would take a relatively long time.
 b. Resource overhead (byte copy). The rate should be limited, for example
triggering only once per hour (see the sketch below).
 c. Add a unit test to ensure correctness with defragmentation.

Just my thought.
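On (b), a minimal sketch of such a rate limit; the one-hour interval and all
names are illustrative assumptions:

{code}
private static final long DEFRAG_INTERVAL_MS = 60L * 60 * 1000; // one hour
private volatile long lastDefragTime = 0;

void maybeDefragment() {
  long now = System.currentTimeMillis();
  if (now - lastDefragTime < DEFRAG_INTERVAL_MS) {
    return; // defragmented recently; skip this freeSpace round
  }
  lastDefragTime = now;
  defragment(); // hypothetical hook that evicts and re-packs sparse buckets
}
{code}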








[jira] [Commented] (HBASE-16630) Fragmentation in long running Bucket Cache

2016-09-19 Thread deepankar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15505115#comment-15505115
 ] 

deepankar commented on HBASE-16630:
---

Yup, looks like this is exactly what we want: the idea is to pick a random
slab and evict all elements from it. But we can be a little more clever here
and pick the slabs that have the least number of elements in them, so that we
do the least number of reads to re-cache their elements. This idea is simple
and could, I think, be done easily within the current code structure with a
small amount of code. What do you think, [~vrodionov]? This will also address
the concerns [~zjushch] raised.

[~vrodionov], thanks for the idea.



[jira] [Commented] (HBASE-16630) Fragmentation in long running Bucket Cache

2016-09-19 Thread Vladimir Rodionov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15505090#comment-15505090
 ] 

Vladimir Rodionov commented on HBASE-16630:
---

Google "slab calcification problem". This is exactly BucketCache "fragmentation 
issue". 



[jira] [Commented] (HBASE-16630) Fragmentation in long running Bucket Cache

2016-09-19 Thread deepankar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15504536#comment-15504536
 ] 

deepankar commented on HBASE-16630:
---

Making sure all the buckets have a freeCount is a very good idea. Since there
are no free counts, one way to enforce this would be to force eviction from
that BucketSizeInfo, which would lead to evicting hot blocks as well if the
current bucket count of that BucketSize is small; it would also lock up the
unused and fragmented space of the other bucketSizes until a compaction or
something else frees them up.



[jira] [Commented] (HBASE-16630) Fragmentation in long running Bucket Cache

2016-09-19 Thread deepankar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15504518#comment-15504518
 ] 

deepankar commented on HBASE-16630:
---

Ha, nice point, I missed this. But this would still not guarantee that we free
up the buckets needed for the given BucketSize in a single run; it might take
several runs to free up a completely free bucket that can then be transferred
to our hot BucketSize. One other reason it might be slow is that
bytesToFreeWithExtra depends on the size of the currently hot BucketSize (for
which there are no free blocks), and that is usually small, so the transition
might be slow. Another problem is that, due to sparsely occupied buckets, we
might keep ending up in the freeSpace method again and again.



[jira] [Commented] (HBASE-16630) Fragmentation in long running Bucket Cache

2016-09-17 Thread chunhui shen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15500076#comment-15500076
 ] 

chunhui shen commented on HBASE-16630:
--

Another thing to do: in one BucketCache#freeSpace round, make sure all BucketSizeInfos end up with a free count.
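
A minimal sketch of what that check could look like (BucketSizeInfo and freeCount here are simplified stand-ins for the real BucketAllocator structures, not the actual API):
{code}
import java.util.ArrayList;
import java.util.List;

// Hypothetical shape of that check: after the normal freeSpace evictions,
// find the sizes that still have no free slot at all.
public class FreeSpaceRound {
  static final class BucketSizeInfo {
    final int itemSize;
    int freeCount;
    BucketSizeInfo(int itemSize, int freeCount) { this.itemSize = itemSize; this.freeCount = freeCount; }
  }

  /** Sizes for which the caller would need to evict more blocks this round. */
  static List<BucketSizeInfo> starvedSizes(List<BucketSizeInfo> infos) {
    List<BucketSizeInfo> starved = new ArrayList<>();
    for (BucketSizeInfo info : infos) {
      if (info.freeCount == 0) {
        starved.add(info);
      }
    }
    return starved;
  }

  public static void main(String[] args) {
    List<BucketSizeInfo> infos = List.of(
        new BucketSizeInfo(9 * 1024, 12),
        new BucketSizeInfo(17 * 1024, 0),  // the hot size with no free slots
        new BucketSizeInfo(33 * 1024, 4));
    System.out.println("sizes still needing evictions: " + starvedSizes(infos).size());
  }
}
{code}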



[jira] [Commented] (HBASE-16630) Fragmentation in long running Bucket Cache

2016-09-17 Thread chunhui shen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15500060#comment-15500060
 ] 

chunhui shen commented on HBASE-16630:
--

{code}
BucketCache#freeSpace
   if (needFreeForExtra) {
     bucketQueue.clear();
-    remainingBuckets = 2;
+    remainingBuckets = 3;

     bucketQueue.add(bucketSingle);
     bucketQueue.add(bucketMulti);
+    bucketQueue.add(bucketMemory);
{code}
Seems the issue mentioned in the description could be solved by the above code. Just FYI.




[jira] [Commented] (HBASE-16630) Fragmentation in long running Bucket Cache

2016-09-14 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15490281#comment-15490281
 ] 

Anoop Sam John commented on HBASE-16630:


Ya, that assert was just added because we always passed a heap ByteBuffer (HBB) at that time. Now we can add functions to write a ByteBuff directly onto the ByteBufferArray.
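
For illustration, a simplified, hypothetical version of such a write path (plain java.nio types standing in for ByteBuff/ByteBufferArray; this is not the HBase API): it copies a possibly direct source buffer straight onto a segmented buffer array without staging through a byte[]:
{code}
import java.nio.ByteBuffer;

// Illustrative only: writing a (possibly direct) source buffer straight onto
// a segmented buffer array without an intermediate on-heap byte[] copy.
public class SegmentedWrite {

  static void write(ByteBuffer[] segments, int segmentSize, long offset, ByteBuffer src) {
    int idx = (int) (offset / segmentSize);
    int off = (int) (offset % segmentSize);
    while (src.hasRemaining()) {
      ByteBuffer dst = segments[idx].duplicate(); // duplicate: independent position/limit
      dst.position(off);
      int n = Math.min(dst.remaining(), src.remaining());
      ByteBuffer chunk = src.slice();             // view of the next n source bytes
      chunk.limit(n);
      dst.put(chunk);                             // buffer-to-buffer copy, direct-safe
      src.position(src.position() + n);
      idx++;
      off = 0;
    }
  }

  public static void main(String[] args) {
    ByteBuffer[] segments = { ByteBuffer.allocateDirect(8), ByteBuffer.allocateDirect(8) };
    ByteBuffer src = ByteBuffer.wrap("hello world".getBytes()); // 11 bytes
    write(segments, 8, 5, src); // spans the boundary between segment 0 and 1
    System.out.println("wrote 11 bytes across two segments without a staging byte[]");
  }
}
{code}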



[jira] [Commented] (HBASE-16630) Fragmentation in long running Bucket Cache

2016-09-14 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15489646#comment-15489646
 ] 

ramkrishna.s.vasudevan commented on HBASE-16630:


Nice find. A similar issue was reported with file mode too, I believe, where it was said the cache is not used efficiently. Need to check that JIRA too; I don't have the ID now.
On the patch:
-> The read lock is released where it is actually the write lock that was acquired.
-> relocatedCount++; - better to increment it after the task is done.
->
{code}
backingMap.put(key, new BucketEntry(newOffset, len, bucketEntry.accessCounter,
    bucketEntry.getPriority()));
{code}
Should the key be removed once we get the write lock, or is it OK to overwrite the key with the new value? I am asking in terms of some other request asking for this key at the same time the deFragmentation happens.
{code}
public void setTo(long free, long used, long itemSize,
    float nonZeroOccupancyRatio) {
{code}
When does this exact update happen?

Overall, would it be better to improve the way the buckets are allocated; would that improve things?
Also, what is the impact of this deFragmentation under real read load, given that we iterate through every key? Would it be better to do this in a separate thread, where we hold the most fragmented bucket in a queue and keep defragmenting it? But maybe that won't work because eviction is random and not in our hands?
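
On the overwrite-vs-remove question, one possibility (a sketch assuming backingMap is a ConcurrentHashMap; BlockKey and Entry are simplified stand-ins for BlockCacheKey/BucketEntry) is an atomic replace, so a concurrent reader sees either the old or the new entry but never a missing key:
{code}
import java.util.concurrent.ConcurrentHashMap;

// Sketch: replace() swaps the old entry for the new one atomically, so a
// concurrent get() observes either location, never an absent mapping.
public class AtomicRelocate {
  record BlockKey(String hfileName, long blockOffset) {}
  record Entry(long bucketOffset, int len) {}

  static final ConcurrentHashMap<BlockKey, Entry> backingMap = new ConcurrentHashMap<>();

  static boolean relocate(BlockKey key, Entry expectedOld, long newOffset) {
    // Only swaps if the mapping still points at the entry we copied from, so a
    // concurrent eviction of this block cannot be silently resurrected.
    return backingMap.replace(key, expectedOld, new Entry(newOffset, expectedOld.len()));
  }

  public static void main(String[] args) {
    BlockKey key = new BlockKey("hfile-1", 0L);
    Entry old = new Entry(4096L, 1024);
    backingMap.put(key, old);
    System.out.println("relocated: " + relocate(key, old, 8192L));
  }
}
{code}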




[jira] [Commented] (HBASE-16630) Fragmentation in long running Bucket Cache

2016-09-13 Thread deepankar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15489482#comment-15489482
 ] 

deepankar commented on HBASE-16630:
---

It is moving data, but the storage for the data (i.e., the byte buffers) is pre-allocated when the HRegionServer starts, so it is more like a copy of the data from one location to another without any new allocation. The old slot goes onto a free list to be reused for another block of data.
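
A toy model of that free-list behaviour (illustrative names only, not the real BucketAllocator): relocation is a copy within pre-allocated storage plus recycling of the old slot:
{code}
import java.util.ArrayDeque;

// Toy model: storage is one pre-allocated array; allocation hands out fixed
// slots from a free list; relocation copies bytes and recycles the old slot.
public class FreeListStore {
  static final int SLOT = 16;
  final byte[] storage = new byte[SLOT * 8];           // pre-allocated up front
  final ArrayDeque<Integer> freeSlots = new ArrayDeque<>();

  FreeListStore() { for (int i = 0; i < 8; i++) freeSlots.add(i); }

  int allocate() { return freeSlots.remove(); }        // no new memory allocated

  int relocate(int oldSlot, int len) {
    int newSlot = allocate();
    System.arraycopy(storage, oldSlot * SLOT, storage, newSlot * SLOT, len);
    freeSlots.add(oldSlot);                            // old slot back on the free list
    return newSlot;
  }

  public static void main(String[] args) {
    FreeListStore store = new FreeListStore();
    int a = store.allocate();
    int b = store.relocate(a, SLOT);
    System.out.println("moved slot " + a + " -> " + b + ", free slots: " + store.freeSlots.size());
  }
}
{code}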



[jira] [Commented] (HBASE-16630) Fragmentation in long running Bucket Cache

2016-09-13 Thread Vladimir Rodionov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15489459#comment-15489459
 ] 

Vladimir Rodionov commented on HBASE-16630:
---

{quote}
This defragmentation is purely a software one: underneath, everything is backed by a large array of ByteBuffers, so we are not allocating or deallocating anything, just overwriting data and releasing Java-level metadata.
{quote}

Interesting. I have always thought that defragmentation == moving data. Maybe I was wrong?



[jira] [Commented] (HBASE-16630) Fragmentation in long running Bucket Cache

2016-09-13 Thread deepankar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15489296#comment-15489296
 ] 

deepankar commented on HBASE-16630:
---

As commented above, SIGSEGV should not be an issue: we are not actually deallocating memory, and we don't relocate blocks for which there is a read in progress (we check the refCount).



[jira] [Commented] (HBASE-16630) Fragmentation in long running Bucket Cache

2016-09-13 Thread deepankar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15489291#comment-15489291
 ] 

deepankar commented on HBASE-16630:
---

The only problem is that currently some IOEngines (e.g. ByteBufferIOEngine) have a hard requirement that the ByteBuffer passed to them for writing be backed by an array; that is the sole reason for this copy, and if we can remove that requirement we can skip the copy. I think the main reason is that ByteBuff does not have the API methods for copying from a ByteBuffer. I will take a look and see if this can be avoided.
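
The constraint being described looks roughly like this (a simplified sketch, not the actual ByteBufferIOEngine code); the staged heap copy in the first path is exactly what a direct-capable path would remove:
{code}
import java.nio.ByteBuffer;

// Sketch of the constraint under discussion: a write path that insists on
// src.hasArray() forces callers holding direct buffers to stage an on-heap
// copy first; a direct-capable path copies buffer-to-buffer instead.
public class ArrayBackedWrite {
  static void writeArrayOnly(ByteBuffer dst, ByteBuffer src) {
    if (!src.hasArray()) {
      // the extra copy this thread wants to eliminate
      ByteBuffer staged = ByteBuffer.allocate(src.remaining());
      staged.put(src.duplicate());
      staged.flip();
      src = staged;
    }
    dst.put(src.array(), src.arrayOffset() + src.position(), src.remaining());
  }

  static void writeDirectCapable(ByteBuffer dst, ByteBuffer src) {
    dst.put(src.duplicate()); // works for heap and direct buffers alike
  }

  public static void main(String[] args) {
    ByteBuffer dst1 = ByteBuffer.allocate(4), dst2 = ByteBuffer.allocate(4);
    ByteBuffer direct = ByteBuffer.allocateDirect(4).put(new byte[] {1, 2, 3, 4});
    direct.flip();
    writeArrayOnly(dst1, direct.duplicate());
    writeDirectCapable(dst2, direct.duplicate());
    System.out.println("both paths wrote 4 bytes; only the first staged a heap copy");
  }
}
{code}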



[jira] [Commented] (HBASE-16630) Fragmentation in long running Bucket Cache

2016-09-13 Thread deepankar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15489287#comment-15489287
 ] 

deepankar commented on HBASE-16630:
---

This is correct: defragmentation does not affect anything in the RPC response path; it happens in the aforementioned background WriterThread.



[jira] [Commented] (HBASE-16630) Fragmentation in long running Bucket Cache

2016-09-13 Thread deepankar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15489282#comment-15489282
 ] 

deepankar commented on HBASE-16630:
---

I'll remove the full lock; essentially the lock is only to make sure a single write is happening. Reads are unaffected because we skip blocks that are currently being read, by checking their refCount. SIGSEGV will also not be an issue, as this defragmentation is purely a software one: underneath, everything is backed by a large array of ByteBuffers, so we are not allocating or deallocating anything, just overwriting data and releasing Java-level metadata.
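
A condensed sketch of the refCount guard described here (simplified types; the real BucketEntry keeps an AtomicInteger refCount and a markedForEvict flag):
{code}
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: relocation skips any entry with an in-flight read, so readers never
// observe their block's bytes being overwritten underneath them.
public class RefCountGuard {
  static final class Entry {
    final AtomicInteger refCount = new AtomicInteger();
    volatile boolean markedForEvict;
    volatile long offset;
    Entry(long offset) { this.offset = offset; }
  }

  /** Returns true if the entry was safe to move and has been moved. */
  static boolean tryRelocate(Entry e, long newOffset) {
    if (e.refCount.get() > 0 || e.markedForEvict) {
      return false; // an RPC is still reading from the old location: skip it
    }
    // ... copy bytes old -> new here ...
    e.offset = newOffset;
    return true;
  }

  public static void main(String[] args) {
    Entry idle = new Entry(0), busy = new Entry(64);
    busy.refCount.incrementAndGet(); // simulate an in-flight read
    System.out.println("idle moved: " + tryRelocate(idle, 128));
    System.out.println("busy moved: " + tryRelocate(busy, 192));
  }
}
{code}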



[jira] [Commented] (HBASE-16630) Fragmentation in long running Bucket Cache

2016-09-13 Thread Vladimir Rodionov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15489253#comment-15489253
 ] 

Vladimir Rodionov commented on HBASE-16630:
---

What about block reads during defragmentation? I think they will be affected too. So, a full lock for a long time? This still does not answer the question of how to avoid SIGSEGV for clients (HFileBlock) that read directly from the off-heap cache (coming in 2.0).



[jira] [Commented] (HBASE-16630) Fragmentation in long running Bucket Cache

2016-09-13 Thread binlijin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15489240#comment-15489240
 ] 

binlijin commented on HBASE-16630:
--

So it will affect flushing the RAM cache to the IOEngine.



[jira] [Commented] (HBASE-16630) Fragmentation in long running Bucket Cache

2016-09-13 Thread Vladimir Rodionov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15489238#comment-15489238
 ] 

Vladimir Rodionov commented on HBASE-16630:
---

It requires full cache lock?



[jira] [Commented] (HBASE-16630) Fragmentation in long running Bucket Cache

2016-09-13 Thread Vladimir Rodionov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15489235#comment-15489235
 ] 

Vladimir Rodionov commented on HBASE-16630:
---

What about off-heap read support? Does it include reading directly from the off-heap cache? If yes, the defragmentation will result in SIGSEGV.



[jira] [Commented] (HBASE-16630) Fragmentation in long running Bucket Cache

2016-09-13 Thread binlijin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15489234#comment-15489234
 ] 

binlijin commented on HBASE-16630:
--

If I am not wrong, the defragmentation is done by the WriterThread, which is a background thread.



[jira] [Commented] (HBASE-16630) Fragmentation in long running Bucket Cache

2016-09-13 Thread Vladimir Rodionov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15489225#comment-15489225
 ] 

Vladimir Rodionov commented on HBASE-16630:
---

Defragmentation is hard (full GC is an example). Any estimate of the running time for a 500 GB cache with compressed blocks averaging 10 KB (50M blocks total)?

I think it should be done in the background and incrementally.
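
A sketch of that incremental approach (hypothetical structure; the budget and queue are illustrative, not part of the patch): relocate at most a fixed number of blocks per slice from a background thread, then yield:
{code}
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Sketch of an incremental defragmenter: relocate at most budgetPerSlice
// blocks, yield, repeat, so a 50M-block cache is never held for one long pass.
public class IncrementalDefrag implements Runnable {
  private final Queue<Long> fragmentedBlocks = new ConcurrentLinkedQueue<>();
  private final int budgetPerSlice;
  private final long sleepMillis;

  IncrementalDefrag(int budgetPerSlice, long sleepMillis) {
    this.budgetPerSlice = budgetPerSlice;
    this.sleepMillis = sleepMillis;
  }

  void enqueue(long blockOffset) { fragmentedBlocks.add(blockOffset); }

  private void relocate(long blockOffset) {
    // placeholder: copy the block and update the backing map here
  }

  @Override
  public void run() {
    while (!Thread.currentThread().isInterrupted()) {
      int moved = 0;
      Long offset;
      while (moved < budgetPerSlice && (offset = fragmentedBlocks.poll()) != null) {
        relocate(offset);
        moved++;
      }
      try {
        Thread.sleep(sleepMillis); // yield between slices; readers proceed meanwhile
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    }
  }

  public static void main(String[] args) {
    IncrementalDefrag defrag = new IncrementalDefrag(1000, 100);
    for (long off = 0; off < 64 * 1000; off += 64) defrag.enqueue(off);
    Thread t = new Thread(defrag, "bucket-defrag");
    t.setDaemon(true);
    t.start();
    System.out.println("background defragmenter running, 1000 blocks per 100 ms slice");
  }
}
{code}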



[jira] [Commented] (HBASE-16630) Fragmentation in long running Bucket Cache

2016-09-13 Thread binlijin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15489206#comment-15489206
 ] 

binlijin commented on HBASE-16630:
--

Looks like there is an extra copy in deFragment; can we reduce it?



[jira] [Commented] (HBASE-16630) Fragmentation in long running Bucket Cache

2016-09-13 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15489149#comment-15489149
 ] 

Ted Yu commented on HBASE-16630:


{code}
+  float nonZeroUsedCount = 0.0f, nonZeroFreeCount = 0.0f;
{code}
These don't have to be floats, right? nonZeroUsedCount / (nonZeroFreeCount + nonZeroUsedCount) can still be calculated in float.
{code}
+    for (BlockCacheKey key : backingMap.keySet()) {
{code}
Use entrySet() to iterate the Map.
{code}
+      // Skip the relocation if somebody is using it
+      if (bucketEntry.refCount.get() > 0 || bucketEntry.markedForEvict) {
+        continue;
+      }
{code}
Consider moving the above ahead of the bucketSizesForFragmentation.contains() check, since the above check is cheaper.
{code}
+      } finally {
+        lock.readLock().unlock();
{code}
Why unlock the read lock when the write lock is obtained at the beginning of the try block?
{code}
+      // We can not read here even if backingMap does contain the given key because its offset
{code}
Typo: read
{code}
+      } catch (IOException ioex) {
+        LOG.error("Failed reading block " + key + " from bucket cache", ioex);
{code}
Do all IOEs correspond to read errors?

Is it possible to add a unit test?

Thanks
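
The first two review points combined into a small sketch (field names mirror the patch's refCount/markedForEvict checks but the types are simplified stand-ins): iterate entrySet() so the value comes with the key, and run the cheap in-use test before the set lookup:
{code}
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of the review suggestions: entrySet() avoids a second map lookup per
// key, and the cheap refCount/markedForEvict check runs before contains().
public class ScanOrder {
  static final class BucketEntry {
    final AtomicInteger refCount = new AtomicInteger();
    volatile boolean markedForEvict;
    final int itemSize;
    BucketEntry(int itemSize) { this.itemSize = itemSize; }
  }

  static int candidates(Map<String, BucketEntry> backingMap, Set<Integer> fragmentedSizes) {
    int count = 0;
    for (Map.Entry<String, BucketEntry> e : backingMap.entrySet()) {
      BucketEntry be = e.getValue();
      if (be.refCount.get() > 0 || be.markedForEvict) continue; // cheap check first
      if (!fragmentedSizes.contains(be.itemSize)) continue;     // set lookup second
      count++;
    }
    return count;
  }

  public static void main(String[] args) {
    Map<String, BucketEntry> map = new ConcurrentHashMap<>();
    map.put("b1", new BucketEntry(9216));
    map.put("b2", new BucketEntry(17408));
    System.out.println("relocation candidates: " + candidates(map, Set.of(9216)));
  }
}
{code}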



[jira] [Commented] (HBASE-16630) Fragmentation in long running Bucket Cache

2016-09-13 Thread binlijin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15489130#comment-15489130
 ] 

binlijin commented on HBASE-16630:
--

Nice finding!



[jira] [Commented] (HBASE-16630) Fragmentation in long running Bucket Cache

2016-09-13 Thread deepankar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15488995#comment-15488995
 ] 

deepankar commented on HBASE-16630:
---

ping [~stack], [~anoop.hbase], [~ram_krish], any suggestions?
