[jira] [Commented] (HBASE-16616) Rpc handlers stuck on ThreadLocalMap.expungeStaleEntry
[ https://issues.apache.org/jira/browse/HBASE-16616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16057283#comment-16057283 ] deepankar commented on HBASE-16616: --- Ah got it, thanks [~tomu.tsuruhara] for pointing me to that patch, that will definitely help > Rpc handlers stuck on ThreadLocalMap.expungeStaleEntry > -- > > Key: HBASE-16616 > URL: https://issues.apache.org/jira/browse/HBASE-16616 > Project: HBase > Issue Type: Improvement > Components: Performance >Affects Versions: 1.2.2 >Reporter: Tomu Tsuruhara >Assignee: Tomu Tsuruhara > Fix For: 2.0.0, 1.4.0 > > Attachments: 16616.branch-1.v2.txt, HBASE-16616.master.001.patch, > HBASE-16616.master.002.patch, ScreenShot 2016-09-09 14.17.53.png > > > In our HBase 1.2.2 cluster, some regionserver showed too bad > "QueueCallTime_99th_percentile" exceeding 10 seconds. > Most rpc handler threads stuck on ThreadLocalMap.expungeStaleEntry call at > that time. > {noformat} > "PriorityRpcServer.handler=18,queue=0,port=16020" #322 daemon prio=5 > os_prio=0 tid=0x7fd422062800 nid=0x19b89 runnable [0x7fcb8a821000] >java.lang.Thread.State: RUNNABLE > at > java.lang.ThreadLocal$ThreadLocalMap.expungeStaleEntry(ThreadLocal.java:617) > at java.lang.ThreadLocal$ThreadLocalMap.remove(ThreadLocal.java:499) > at > java.lang.ThreadLocal$ThreadLocalMap.access$200(ThreadLocal.java:298) > at java.lang.ThreadLocal.remove(ThreadLocal.java:222) > at > java.util.concurrent.locks.ReentrantReadWriteLock$Sync.tryReleaseShared(ReentrantReadWriteLock.java:426) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.releaseShared(AbstractQueuedSynchronizer.java:1341) > at > java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.unlock(ReentrantReadWriteLock.java:881) > at > com.yammer.metrics.stats.ExponentiallyDecayingSample.unlockForRegularUsage(ExponentiallyDecayingSample.java:196) > at > com.yammer.metrics.stats.ExponentiallyDecayingSample.update(ExponentiallyDecayingSample.java:113) > at > com.yammer.metrics.stats.ExponentiallyDecayingSample.update(ExponentiallyDecayingSample.java:81) > at > org.apache.hadoop.metrics2.lib.MutableHistogram.add(MutableHistogram.java:81) > at > org.apache.hadoop.metrics2.lib.MutableRangeHistogram.add(MutableRangeHistogram.java:59) > at > org.apache.hadoop.hbase.ipc.MetricsHBaseServerSourceImpl.dequeuedCall(MetricsHBaseServerSourceImpl.java:194) > at > org.apache.hadoop.hbase.ipc.MetricsHBaseServer.dequeuedCall(MetricsHBaseServer.java:76) > at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2192) > at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:112) > at > org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:133) > at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:108) > at java.lang.Thread.run(Thread.java:745) > {noformat} > We were using jdk 1.8.0_92 and here is a snippet from ThreadLocal.java. > {code} > 616:while (tab[h] != null) > 617:h = nextIndex(h, len); > {code} > So I hypothesized that there're too many consecutive entries in {{tab}} array > and actually I found them in the heapdump. > !ScreenShot 2016-09-09 14.17.53.png|width=50%! > Most of these entries pointed at instance of > {{org.apache.hadoop.hbase.util.Counter$1}} > which is equivarent to {{indexHolderThreadLocal}} instance-variable in the > {{Counter}} class. 
> Because the {{RpcServer$Connection}} class creates a {{Counter}} instance > {{rpcCount}} for every connection, > it is possible to have lots of {{Counter#indexHolderThreadLocal}} instances > in the RegionServer process > when we repeat connect-and-close from the client. As a result, a ThreadLocalMap > can have lots of consecutive > entries. > Usually, since each entry is a {{WeakReference}}, these entries are collected > and removed > by the garbage collector soon after the connection is closed. > But if the connection's lifetime was long enough to survive a young GC, it wouldn't > be collected until the old-gen collector runs. > Furthermore, under a G1GC deployment, it is possible for it not to be collected even > by the old-gen GC (mixed GC) > if the entries sit in a region which doesn't have much garbage. > Actually, we used G1GC when we encountered this problem. > We should remove the entry from the ThreadLocalMap by calling ThreadLocal#remove > explicitly. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
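To make the failure mode described above concrete, here is a minimal, self-contained sketch (plain Java, not the actual {{org.apache.hadoop.hbase.util.Counter}} code) of a per-connection counter backed by a {{ThreadLocal}}, with a destroy()-style method that calls {{ThreadLocal#remove}} so the calling thread's ThreadLocalMap entry is cleared eagerly instead of waiting for the garbage collector. All class and method names are illustrative.
{code}
// Hypothetical per-connection counter; not the HBase Counter implementation.
public class PerConnectionCounter {
    // One ThreadLocal instance per connection; each handler thread that
    // increments it gets its own entry in its ThreadLocalMap.
    private final ThreadLocal<long[]> cell = ThreadLocal.withInitial(() -> new long[1]);

    public void increment() {
        cell.get()[0]++;
    }

    public long get() {
        return cell.get()[0];
    }

    // Called when the connection closes: removes the entry from the
    // *calling* thread's ThreadLocalMap so it does not linger as a stale
    // entry that expungeStaleEntry later has to walk over.
    public void destroy() {
        cell.remove();
    }

    public static void main(String[] args) {
        PerConnectionCounter rpcCount = new PerConnectionCounter();
        rpcCount.increment();
        System.out.println("before destroy: " + rpcCount.get()); // 1
        rpcCount.destroy();
        System.out.println("after destroy:  " + rpcCount.get()); // 0 (fresh initial value)
    }
}
{code}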
[jira] [Commented] (HBASE-16616) Rpc handlers stuck on ThreadLocalMap.expungeStaleEntry
[ https://issues.apache.org/jira/browse/HBASE-16616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16056554#comment-16056554 ] deepankar commented on HBASE-16616: --- Hi, we have recently encountered this issue in our production and I was going through the patch, I am not sure this patch solves this issue fully, the doubt is mainly due to the fact the Counter#destroy() will only destroy the ThreadLocal in the ThreadLocalMap of the thread closing the connection, while keeping this Counter's ThreadLocal object still alive for the rest of handler threads. Am I missing something in this logic ? cc [~stack] [~tomu.tsuruhara] > Rpc handlers stuck on ThreadLocalMap.expungeStaleEntry > -- > > Key: HBASE-16616 > URL: https://issues.apache.org/jira/browse/HBASE-16616 > Project: HBase > Issue Type: Improvement > Components: Performance >Affects Versions: 1.2.2 >Reporter: Tomu Tsuruhara >Assignee: Tomu Tsuruhara > Fix For: 2.0.0, 1.4.0 > > Attachments: 16616.branch-1.v2.txt, HBASE-16616.master.001.patch, > HBASE-16616.master.002.patch, ScreenShot 2016-09-09 14.17.53.png > > > In our HBase 1.2.2 cluster, some regionserver showed too bad > "QueueCallTime_99th_percentile" exceeding 10 seconds. > Most rpc handler threads stuck on ThreadLocalMap.expungeStaleEntry call at > that time. > {noformat} > "PriorityRpcServer.handler=18,queue=0,port=16020" #322 daemon prio=5 > os_prio=0 tid=0x7fd422062800 nid=0x19b89 runnable [0x7fcb8a821000] >java.lang.Thread.State: RUNNABLE > at > java.lang.ThreadLocal$ThreadLocalMap.expungeStaleEntry(ThreadLocal.java:617) > at java.lang.ThreadLocal$ThreadLocalMap.remove(ThreadLocal.java:499) > at > java.lang.ThreadLocal$ThreadLocalMap.access$200(ThreadLocal.java:298) > at java.lang.ThreadLocal.remove(ThreadLocal.java:222) > at > java.util.concurrent.locks.ReentrantReadWriteLock$Sync.tryReleaseShared(ReentrantReadWriteLock.java:426) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.releaseShared(AbstractQueuedSynchronizer.java:1341) > at > java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.unlock(ReentrantReadWriteLock.java:881) > at > com.yammer.metrics.stats.ExponentiallyDecayingSample.unlockForRegularUsage(ExponentiallyDecayingSample.java:196) > at > com.yammer.metrics.stats.ExponentiallyDecayingSample.update(ExponentiallyDecayingSample.java:113) > at > com.yammer.metrics.stats.ExponentiallyDecayingSample.update(ExponentiallyDecayingSample.java:81) > at > org.apache.hadoop.metrics2.lib.MutableHistogram.add(MutableHistogram.java:81) > at > org.apache.hadoop.metrics2.lib.MutableRangeHistogram.add(MutableRangeHistogram.java:59) > at > org.apache.hadoop.hbase.ipc.MetricsHBaseServerSourceImpl.dequeuedCall(MetricsHBaseServerSourceImpl.java:194) > at > org.apache.hadoop.hbase.ipc.MetricsHBaseServer.dequeuedCall(MetricsHBaseServer.java:76) > at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2192) > at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:112) > at > org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:133) > at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:108) > at java.lang.Thread.run(Thread.java:745) > {noformat} > We were using jdk 1.8.0_92 and here is a snippet from ThreadLocal.java. > {code} > 616:while (tab[h] != null) > 617:h = nextIndex(h, len); > {code} > So I hypothesized that there're too many consecutive entries in {{tab}} array > and actually I found them in the heapdump. > !ScreenShot 2016-09-09 14.17.53.png|width=50%! 
> Most of these entries pointed at an instance of > {{org.apache.hadoop.hbase.util.Counter$1}}, > which is equivalent to the {{indexHolderThreadLocal}} instance variable in the > {{Counter}} class. > Because the {{RpcServer$Connection}} class creates a {{Counter}} instance > {{rpcCount}} for every connection, > it is possible to have lots of {{Counter#indexHolderThreadLocal}} instances > in the RegionServer process > when we repeat connect-and-close from the client. As a result, a ThreadLocalMap > can have lots of consecutive > entries. > Usually, since each entry is a {{WeakReference}}, these entries are collected > and removed > by the garbage collector soon after the connection is closed. > But if the connection's lifetime was long enough to survive a young GC, it wouldn't > be collected until the old-gen collector runs. > Furthermore, under a G1GC deployment, it is possible for it not to be collected even > by the old-gen GC (mixed GC) > if the entries sit in a region which doesn't have much garbage. > Actually, we used G1GC when we encountered this problem. > We should remove the entry from the ThreadLocalMap by calling ThreadLocal#remove explicitly.
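The concern raised in this comment can be illustrated with a short, hypothetical example: {{ThreadLocal#remove}} only clears the entry in the ThreadLocalMap of the thread that calls it, so a value installed by one handler thread survives a remove() issued from a different thread. The names below are made up for the demonstration.
{code}
public class ThreadLocalRemoveScope {
    public static void main(String[] args) throws InterruptedException {
        ThreadLocal<String> slot = new ThreadLocal<>();

        Thread handler = new Thread(() -> {
            slot.set("handler-value");          // entry lives in this thread's map
            try { Thread.sleep(200); } catch (InterruptedException ignored) {}
            // Still present even though another thread called remove() meanwhile.
            System.out.println("handler still sees: " + slot.get());
        });
        handler.start();

        Thread.sleep(50);
        slot.remove();                          // clears only the main thread's entry
        System.out.println("main sees: " + slot.get()); // null
        handler.join();
    }
}
{code}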
[jira] [Commented] (HBASE-16630) Fragmentation in long running Bucket Cache
[ https://issues.apache.org/jira/browse/HBASE-16630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15896860#comment-15896860 ] deepankar commented on HBASE-16630: --- Thanks guys for pushing this and dealing with my errors and unresponsiveness > Fragmentation in long running Bucket Cache > -- > > Key: HBASE-16630 > URL: https://issues.apache.org/jira/browse/HBASE-16630 > Project: HBase > Issue Type: Bug > Components: BucketCache >Affects Versions: 2.0.0, 1.1.6, 1.3.1, 1.2.3 >Reporter: deepankar >Assignee: deepankar >Priority: Critical > Fix For: 2.0.0, 1.4.0, 1.3.1, 1.2.6 > > Attachments: 16630-v2-suggest.patch, 16630-v3-suggest.patch, > HBASE-16630.patch, HBASE-16630-v2.patch, HBASE-16630-v3-branch-1.patch, > HBASE-16630-v3-branch-1.X.patch, HBASE-16630-v3.patch, > HBASE-16630-v4-branch-1.X.patch > > > As we are running bucket cache for a long time in our system, we are > observing cases where some nodes after some time does not fully utilize the > bucket cache, in some cases it is even worse in the sense they get stuck at a > value < 0.25 % of the bucket cache (DEFAULT_MEMORY_FACTOR as all our tables > are configured in-memory for simplicity sake). > We took a heap dump and analyzed what is happening and saw that is classic > case of fragmentation, current implementation of BucketCache (mainly > BucketAllocator) relies on the logic that fullyFreeBuckets are available for > switching/adjusting cache usage between different bucketSizes . But once a > compaction / bulkload happens and the blocks are evicted from a bucket size , > these are usually evicted from random places of the buckets of a bucketSize > and thus locking the number of buckets associated with a bucketSize and in > the worst case of the fragmentation we have seen some bucketSizes with > occupancy ratio of < 10 % But they dont have any completelyFreeBuckets to > share with the other bucketSize. > Currently the existing eviction logic helps in the cases where cache used is > more the MEMORY_FACTOR or MULTI_FACTOR and once those evictions are also > done, the eviction (freeSpace function) will not evict anything and the cache > utilization will be stuck at that value without any allocations for other > required sizes. > The fix for this we came up with is simple that we do deFragmentation ( > compaction) of the bucketSize and thus increasing the occupancy ratio and > also freeing up the buckets to be fullyFree, this logic itself is not > complicated as the bucketAllocator takes care of packing the blocks in the > buckets, we need evict and re-allocate the blocks for all the BucketSizes > that dont fit the criteria. > I am attaching an initial patch just to give an idea of what we are thinking > and I'll improve it based on the comments from the community. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
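As a rough illustration of the stuck state described above (not the actual BucketAllocator API), the following toy model shows how a bucket size can sit at very low overall occupancy while still owning no completely free bucket that could be handed to another size; every name and threshold here is made up for the example.
{code}
import java.util.List;

public class FragmentationCheck {
    // usedBlocksPerBucket: live blocks in each bucket of one size class;
    // blocksPerBucket: how many blocks a full bucket of this size holds.
    static boolean isFragmented(List<Integer> usedBlocksPerBucket, int blocksPerBucket) {
        long used = usedBlocksPerBucket.stream().mapToLong(Integer::longValue).sum();
        long capacity = (long) usedBlocksPerBucket.size() * blocksPerBucket;
        double occupancy = capacity == 0 ? 0.0 : (double) used / capacity;
        boolean hasCompletelyFreeBucket = usedBlocksPerBucket.stream().anyMatch(u -> u == 0);
        // Low occupancy but nothing to give back: the classic stuck state.
        return occupancy < 0.10 && !hasCompletelyFreeBucket;
    }

    public static void main(String[] args) {
        // 8 buckets of 64 slots each, every bucket keeps a couple of live blocks.
        System.out.println(isFragmented(List.of(2, 3, 1, 2, 4, 1, 2, 3), 64)); // true
        // Same total usage packed densely leaves completely free buckets.
        System.out.println(isFragmented(List.of(0, 0, 60, 64, 64, 0, 0, 0), 64)); // false
    }
}
{code}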
[jira] [Commented] (HBASE-16630) Fragmentation in long running Bucket Cache
[ https://issues.apache.org/jira/browse/HBASE-16630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15889710#comment-15889710 ] deepankar commented on HBASE-16630: --- Sorry guys I forgot to check for compilation and other minor stuff. I will make sure to do this from next time > Fragmentation in long running Bucket Cache > -- > > Key: HBASE-16630 > URL: https://issues.apache.org/jira/browse/HBASE-16630 > Project: HBase > Issue Type: Bug > Components: BucketCache >Affects Versions: 2.0.0, 1.1.6, 1.3.1, 1.2.3 >Reporter: deepankar >Assignee: deepankar >Priority: Critical > Fix For: 2.0.0, 1.3.1, 1.2.6 > > Attachments: 16630-v2-suggest.patch, 16630-v3-suggest.patch, > HBASE-16630.patch, HBASE-16630-v2.patch, HBASE-16630-v3-branch-1.patch, > HBASE-16630-v3-branch-1.X.patch, HBASE-16630-v3.patch, > HBASE-16630-v4-branch-1.X.patch > > > As we are running bucket cache for a long time in our system, we are > observing cases where some nodes after some time does not fully utilize the > bucket cache, in some cases it is even worse in the sense they get stuck at a > value < 0.25 % of the bucket cache (DEFAULT_MEMORY_FACTOR as all our tables > are configured in-memory for simplicity sake). > We took a heap dump and analyzed what is happening and saw that is classic > case of fragmentation, current implementation of BucketCache (mainly > BucketAllocator) relies on the logic that fullyFreeBuckets are available for > switching/adjusting cache usage between different bucketSizes . But once a > compaction / bulkload happens and the blocks are evicted from a bucket size , > these are usually evicted from random places of the buckets of a bucketSize > and thus locking the number of buckets associated with a bucketSize and in > the worst case of the fragmentation we have seen some bucketSizes with > occupancy ratio of < 10 % But they dont have any completelyFreeBuckets to > share with the other bucketSize. > Currently the existing eviction logic helps in the cases where cache used is > more the MEMORY_FACTOR or MULTI_FACTOR and once those evictions are also > done, the eviction (freeSpace function) will not evict anything and the cache > utilization will be stuck at that value without any allocations for other > required sizes. > The fix for this we came up with is simple that we do deFragmentation ( > compaction) of the bucketSize and thus increasing the occupancy ratio and > also freeing up the buckets to be fullyFree, this logic itself is not > complicated as the bucketAllocator takes care of packing the blocks in the > buckets, we need evict and re-allocate the blocks for all the BucketSizes > that dont fit the criteria. > I am attaching an initial patch just to give an idea of what we are thinking > and I'll improve it based on the comments from the community. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HBASE-16630) Fragmentation in long running Bucket Cache
[ https://issues.apache.org/jira/browse/HBASE-16630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] deepankar updated HBASE-16630: -- Attachment: HBASE-16630-v4-branch-1.X.patch Updated patch fixing the compilation issue > Fragmentation in long running Bucket Cache > -- > > Key: HBASE-16630 > URL: https://issues.apache.org/jira/browse/HBASE-16630 > Project: HBase > Issue Type: Bug > Components: BucketCache >Affects Versions: 2.0.0, 1.1.6, 1.3.1, 1.2.3 >Reporter: deepankar >Assignee: deepankar >Priority: Critical > Fix For: 2.0.0, 1.3.1, 1.2.6 > > Attachments: 16630-v2-suggest.patch, 16630-v3-suggest.patch, > HBASE-16630.patch, HBASE-16630-v2.patch, HBASE-16630-v3-branch-1.patch, > HBASE-16630-v3-branch-1.X.patch, HBASE-16630-v3.patch, > HBASE-16630-v4-branch-1.X.patch > > > As we are running bucket cache for a long time in our system, we are > observing cases where some nodes after some time does not fully utilize the > bucket cache, in some cases it is even worse in the sense they get stuck at a > value < 0.25 % of the bucket cache (DEFAULT_MEMORY_FACTOR as all our tables > are configured in-memory for simplicity sake). > We took a heap dump and analyzed what is happening and saw that is classic > case of fragmentation, current implementation of BucketCache (mainly > BucketAllocator) relies on the logic that fullyFreeBuckets are available for > switching/adjusting cache usage between different bucketSizes . But once a > compaction / bulkload happens and the blocks are evicted from a bucket size , > these are usually evicted from random places of the buckets of a bucketSize > and thus locking the number of buckets associated with a bucketSize and in > the worst case of the fragmentation we have seen some bucketSizes with > occupancy ratio of < 10 % But they dont have any completelyFreeBuckets to > share with the other bucketSize. > Currently the existing eviction logic helps in the cases where cache used is > more the MEMORY_FACTOR or MULTI_FACTOR and once those evictions are also > done, the eviction (freeSpace function) will not evict anything and the cache > utilization will be stuck at that value without any allocations for other > required sizes. > The fix for this we came up with is simple that we do deFragmentation ( > compaction) of the bucketSize and thus increasing the occupancy ratio and > also freeing up the buckets to be fullyFree, this logic itself is not > complicated as the bucketAllocator takes care of packing the blocks in the > buckets, we need evict and re-allocate the blocks for all the BucketSizes > that dont fit the criteria. > I am attaching an initial patch just to give an idea of what we are thinking > and I'll improve it based on the comments from the community. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HBASE-16630) Fragmentation in long running Bucket Cache
[ https://issues.apache.org/jira/browse/HBASE-16630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15889510#comment-15889510 ] deepankar commented on HBASE-16630: --- Created HBASE-17711 for tracking the addition of tests for this patch > Fragmentation in long running Bucket Cache > -- > > Key: HBASE-16630 > URL: https://issues.apache.org/jira/browse/HBASE-16630 > Project: HBase > Issue Type: Bug > Components: BucketCache >Affects Versions: 2.0.0, 1.1.6, 1.3.1, 1.2.3 >Reporter: deepankar >Assignee: deepankar >Priority: Critical > Fix For: 2.0.0, 1.3.1, 1.2.5 > > Attachments: 16630-v2-suggest.patch, 16630-v3-suggest.patch, > HBASE-16630.patch, HBASE-16630-v2.patch, HBASE-16630-v3-branch-1.patch, > HBASE-16630-v3-branch-1.X.patch, HBASE-16630-v3.patch > > > As we are running bucket cache for a long time in our system, we are > observing cases where some nodes after some time does not fully utilize the > bucket cache, in some cases it is even worse in the sense they get stuck at a > value < 0.25 % of the bucket cache (DEFAULT_MEMORY_FACTOR as all our tables > are configured in-memory for simplicity sake). > We took a heap dump and analyzed what is happening and saw that is classic > case of fragmentation, current implementation of BucketCache (mainly > BucketAllocator) relies on the logic that fullyFreeBuckets are available for > switching/adjusting cache usage between different bucketSizes . But once a > compaction / bulkload happens and the blocks are evicted from a bucket size , > these are usually evicted from random places of the buckets of a bucketSize > and thus locking the number of buckets associated with a bucketSize and in > the worst case of the fragmentation we have seen some bucketSizes with > occupancy ratio of < 10 % But they dont have any completelyFreeBuckets to > share with the other bucketSize. > Currently the existing eviction logic helps in the cases where cache used is > more the MEMORY_FACTOR or MULTI_FACTOR and once those evictions are also > done, the eviction (freeSpace function) will not evict anything and the cache > utilization will be stuck at that value without any allocations for other > required sizes. > The fix for this we came up with is simple that we do deFragmentation ( > compaction) of the bucketSize and thus increasing the occupancy ratio and > also freeing up the buckets to be fullyFree, this logic itself is not > complicated as the bucketAllocator takes care of packing the blocks in the > buckets, we need evict and re-allocate the blocks for all the BucketSizes > that dont fit the criteria. > I am attaching an initial patch just to give an idea of what we are thinking > and I'll improve it based on the comments from the community. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HBASE-17711) Add test for Bucket Cache fragmentation fix
[ https://issues.apache.org/jira/browse/HBASE-17711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] deepankar updated HBASE-17711: -- Summary: Add test for Bucket Cache fragmentation fix (was: Add test for HBASE-16630 fix) > Add test for Bucket Cache fragmentation fix > --- > > Key: HBASE-17711 > URL: https://issues.apache.org/jira/browse/HBASE-17711 > Project: HBase > Issue Type: Task >Reporter: deepankar >Assignee: deepankar > > Add tests for the fix in HBASE-16630 -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HBASE-17711) Add test for HBASE-16630 fix
[ https://issues.apache.org/jira/browse/HBASE-17711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] deepankar updated HBASE-17711: -- Description: Add tests for the fix in HBASE-16630 > Add test for HBASE-16630 fix > > > Key: HBASE-17711 > URL: https://issues.apache.org/jira/browse/HBASE-17711 > Project: HBase > Issue Type: Task >Reporter: deepankar >Assignee: deepankar > > Add tests for the fix in HBASE-16630 -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (HBASE-17711) Add test for HBASE-16630 fix
deepankar created HBASE-17711: - Summary: Add test for HBASE-16630 fix Key: HBASE-17711 URL: https://issues.apache.org/jira/browse/HBASE-17711 Project: HBase Issue Type: Task Reporter: deepankar Assignee: deepankar -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Comment Edited] (HBASE-16630) Fragmentation in long running Bucket Cache
[ https://issues.apache.org/jira/browse/HBASE-16630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15889149#comment-15889149 ] deepankar edited comment on HBASE-16630 at 3/1/17 12:17 AM: Uploaded the patch [^HBASE-16630-v3-branch-1.X.patch] for branch-1, branch-1.2, branch-1.3 was (Author: dvdreddy): The patch for branch-1, branch-1.2, branch-1.3 > Fragmentation in long running Bucket Cache > -- > > Key: HBASE-16630 > URL: https://issues.apache.org/jira/browse/HBASE-16630 > Project: HBase > Issue Type: Bug > Components: BucketCache >Affects Versions: 2.0.0, 1.1.6, 1.3.1, 1.2.3 >Reporter: deepankar >Assignee: deepankar >Priority: Critical > Attachments: 16630-v2-suggest.patch, 16630-v3-suggest.patch, > HBASE-16630.patch, HBASE-16630-v2.patch, HBASE-16630-v3-branch-1.X.patch, > HBASE-16630-v3.patch > > > As we are running bucket cache for a long time in our system, we are > observing cases where some nodes after some time does not fully utilize the > bucket cache, in some cases it is even worse in the sense they get stuck at a > value < 0.25 % of the bucket cache (DEFAULT_MEMORY_FACTOR as all our tables > are configured in-memory for simplicity sake). > We took a heap dump and analyzed what is happening and saw that is classic > case of fragmentation, current implementation of BucketCache (mainly > BucketAllocator) relies on the logic that fullyFreeBuckets are available for > switching/adjusting cache usage between different bucketSizes . But once a > compaction / bulkload happens and the blocks are evicted from a bucket size , > these are usually evicted from random places of the buckets of a bucketSize > and thus locking the number of buckets associated with a bucketSize and in > the worst case of the fragmentation we have seen some bucketSizes with > occupancy ratio of < 10 % But they dont have any completelyFreeBuckets to > share with the other bucketSize. > Currently the existing eviction logic helps in the cases where cache used is > more the MEMORY_FACTOR or MULTI_FACTOR and once those evictions are also > done, the eviction (freeSpace function) will not evict anything and the cache > utilization will be stuck at that value without any allocations for other > required sizes. > The fix for this we came up with is simple that we do deFragmentation ( > compaction) of the bucketSize and thus increasing the occupancy ratio and > also freeing up the buckets to be fullyFree, this logic itself is not > complicated as the bucketAllocator takes care of packing the blocks in the > buckets, we need evict and re-allocate the blocks for all the BucketSizes > that dont fit the criteria. > I am attaching an initial patch just to give an idea of what we are thinking > and I'll improve it based on the comments from the community. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HBASE-16630) Fragmentation in long running Bucket Cache
[ https://issues.apache.org/jira/browse/HBASE-16630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] deepankar updated HBASE-16630: -- Attachment: HBASE-16630-v3-branch-1.X.patch The patch for branch-1, branch-1.2, branch-1.3 > Fragmentation in long running Bucket Cache > -- > > Key: HBASE-16630 > URL: https://issues.apache.org/jira/browse/HBASE-16630 > Project: HBase > Issue Type: Bug > Components: BucketCache >Affects Versions: 2.0.0, 1.1.6, 1.3.1, 1.2.3 >Reporter: deepankar >Assignee: deepankar >Priority: Critical > Attachments: 16630-v2-suggest.patch, 16630-v3-suggest.patch, > HBASE-16630.patch, HBASE-16630-v2.patch, HBASE-16630-v3-branch-1.X.patch, > HBASE-16630-v3.patch > > > As we are running bucket cache for a long time in our system, we are > observing cases where some nodes after some time does not fully utilize the > bucket cache, in some cases it is even worse in the sense they get stuck at a > value < 0.25 % of the bucket cache (DEFAULT_MEMORY_FACTOR as all our tables > are configured in-memory for simplicity sake). > We took a heap dump and analyzed what is happening and saw that is classic > case of fragmentation, current implementation of BucketCache (mainly > BucketAllocator) relies on the logic that fullyFreeBuckets are available for > switching/adjusting cache usage between different bucketSizes . But once a > compaction / bulkload happens and the blocks are evicted from a bucket size , > these are usually evicted from random places of the buckets of a bucketSize > and thus locking the number of buckets associated with a bucketSize and in > the worst case of the fragmentation we have seen some bucketSizes with > occupancy ratio of < 10 % But they dont have any completelyFreeBuckets to > share with the other bucketSize. > Currently the existing eviction logic helps in the cases where cache used is > more the MEMORY_FACTOR or MULTI_FACTOR and once those evictions are also > done, the eviction (freeSpace function) will not evict anything and the cache > utilization will be stuck at that value without any allocations for other > required sizes. > The fix for this we came up with is simple that we do deFragmentation ( > compaction) of the bucketSize and thus increasing the occupancy ratio and > also freeing up the buckets to be fullyFree, this logic itself is not > complicated as the bucketAllocator takes care of packing the blocks in the > buckets, we need evict and re-allocate the blocks for all the BucketSizes > that dont fit the criteria. > I am attaching an initial patch just to give an idea of what we are thinking > and I'll improve it based on the comments from the community. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HBASE-16630) Fragmentation in long running Bucket Cache
[ https://issues.apache.org/jira/browse/HBASE-16630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15861696#comment-15861696 ] deepankar commented on HBASE-16630: --- About the unit test, I was busy past weeks trying to hunt down another bug (will update once I confirm that) did not get a chance to look into it. > Fragmentation in long running Bucket Cache > -- > > Key: HBASE-16630 > URL: https://issues.apache.org/jira/browse/HBASE-16630 > Project: HBase > Issue Type: Bug > Components: BucketCache >Affects Versions: 2.0.0, 1.1.6, 1.3.1, 1.2.3 >Reporter: deepankar >Assignee: deepankar >Priority: Critical > Attachments: 16630-v2-suggest.patch, 16630-v3-suggest.patch, > HBASE-16630.patch, HBASE-16630-v2.patch, HBASE-16630-v3.patch > > > As we are running bucket cache for a long time in our system, we are > observing cases where some nodes after some time does not fully utilize the > bucket cache, in some cases it is even worse in the sense they get stuck at a > value < 0.25 % of the bucket cache (DEFAULT_MEMORY_FACTOR as all our tables > are configured in-memory for simplicity sake). > We took a heap dump and analyzed what is happening and saw that is classic > case of fragmentation, current implementation of BucketCache (mainly > BucketAllocator) relies on the logic that fullyFreeBuckets are available for > switching/adjusting cache usage between different bucketSizes . But once a > compaction / bulkload happens and the blocks are evicted from a bucket size , > these are usually evicted from random places of the buckets of a bucketSize > and thus locking the number of buckets associated with a bucketSize and in > the worst case of the fragmentation we have seen some bucketSizes with > occupancy ratio of < 10 % But they dont have any completelyFreeBuckets to > share with the other bucketSize. > Currently the existing eviction logic helps in the cases where cache used is > more the MEMORY_FACTOR or MULTI_FACTOR and once those evictions are also > done, the eviction (freeSpace function) will not evict anything and the cache > utilization will be stuck at that value without any allocations for other > required sizes. > The fix for this we came up with is simple that we do deFragmentation ( > compaction) of the bucketSize and thus increasing the occupancy ratio and > also freeing up the buckets to be fullyFree, this logic itself is not > complicated as the bucketAllocator takes care of packing the blocks in the > buckets, we need evict and re-allocate the blocks for all the BucketSizes > that dont fit the criteria. > I am attaching an initial patch just to give an idea of what we are thinking > and I'll improve it based on the comments from the community. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HBASE-16630) Fragmentation in long running Bucket Cache
[ https://issues.apache.org/jira/browse/HBASE-16630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15826660#comment-15826660 ] deepankar commented on HBASE-16630: --- Sorry I forgot about the unit test, will see If I hack get something up. About [~stack]'s suggestion it was somewhat orthogonal to this issue, I agree that the eviction could be improved via the suggestions based in HBASE-15560, but it would be hard to prevent the fragmentation from happening when we have buckets allocated to different sizes and buckets are occupied very sparsely, and these occupants are blocks which are accessed recently and also frequently. > Fragmentation in long running Bucket Cache > -- > > Key: HBASE-16630 > URL: https://issues.apache.org/jira/browse/HBASE-16630 > Project: HBase > Issue Type: Bug > Components: BucketCache >Affects Versions: 2.0.0, 1.1.6, 1.3.1, 1.2.3 >Reporter: deepankar >Assignee: deepankar >Priority: Critical > Attachments: 16630-v2-suggest.patch, 16630-v3-suggest.patch, > HBASE-16630.patch, HBASE-16630-v2.patch, HBASE-16630-v3.patch > > > As we are running bucket cache for a long time in our system, we are > observing cases where some nodes after some time does not fully utilize the > bucket cache, in some cases it is even worse in the sense they get stuck at a > value < 0.25 % of the bucket cache (DEFAULT_MEMORY_FACTOR as all our tables > are configured in-memory for simplicity sake). > We took a heap dump and analyzed what is happening and saw that is classic > case of fragmentation, current implementation of BucketCache (mainly > BucketAllocator) relies on the logic that fullyFreeBuckets are available for > switching/adjusting cache usage between different bucketSizes . But once a > compaction / bulkload happens and the blocks are evicted from a bucket size , > these are usually evicted from random places of the buckets of a bucketSize > and thus locking the number of buckets associated with a bucketSize and in > the worst case of the fragmentation we have seen some bucketSizes with > occupancy ratio of < 10 % But they dont have any completelyFreeBuckets to > share with the other bucketSize. > Currently the existing eviction logic helps in the cases where cache used is > more the MEMORY_FACTOR or MULTI_FACTOR and once those evictions are also > done, the eviction (freeSpace function) will not evict anything and the cache > utilization will be stuck at that value without any allocations for other > required sizes. > The fix for this we came up with is simple that we do deFragmentation ( > compaction) of the bucketSize and thus increasing the occupancy ratio and > also freeing up the buckets to be fullyFree, this logic itself is not > complicated as the bucketAllocator takes care of packing the blocks in the > buckets, we need evict and re-allocate the blocks for all the BucketSizes > that dont fit the criteria. > I am attaching an initial patch just to give an idea of what we are thinking > and I'll improve it based on the comments from the community. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-16630) Fragmentation in long running Bucket Cache
[ https://issues.apache.org/jira/browse/HBASE-16630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15765587#comment-15765587 ] deepankar commented on HBASE-16630: --- Sorry for the late reply. I looked into Stack's suggestion and tried to use it for this problem; the suggestion was more about improving the LRU part of the eviction, and I was not sure how that helps in this case where the size bucket is entirely full. There is already LRU by means of accessCount, and even though that might not be fully optimal compared to Stack's suggestion, in the context of this problem the result from both strategies will be the same. I looked into Ted Yu's suggestion of tracking the last N blocks, but the metadata overhead to track for this case was too high. We tried to pinpoint the issue of re-eviction in the heuristic used, with some debug statements / heap dumps on real data, but we were not successful. So we have productionized the attached version of the patch on more HBase clusters and are seeing some occasional issues but nothing serious. > Fragmentation in long running Bucket Cache > -- > > Key: HBASE-16630 > URL: https://issues.apache.org/jira/browse/HBASE-16630 > Project: HBase > Issue Type: Bug > Components: BucketCache >Affects Versions: 2.0.0, 1.1.6, 1.3.1, 1.2.3 >Reporter: deepankar >Assignee: deepankar >Priority: Critical > Attachments: 16630-v2-suggest.patch, 16630-v3-suggest.patch, > HBASE-16630-v2.patch, HBASE-16630-v3.patch, HBASE-16630.patch > > > As we are running bucket cache for a long time in our system, we are > observing cases where some nodes after some time does not fully utilize the > bucket cache, in some cases it is even worse in the sense they get stuck at a > value < 0.25 % of the bucket cache (DEFAULT_MEMORY_FACTOR as all our tables > are configured in-memory for simplicity sake). > We took a heap dump and analyzed what is happening and saw that is classic > case of fragmentation, current implementation of BucketCache (mainly > BucketAllocator) relies on the logic that fullyFreeBuckets are available for > switching/adjusting cache usage between different bucketSizes . But once a > compaction / bulkload happens and the blocks are evicted from a bucket size , > these are usually evicted from random places of the buckets of a bucketSize > and thus locking the number of buckets associated with a bucketSize and in > the worst case of the fragmentation we have seen some bucketSizes with > occupancy ratio of < 10 % But they dont have any completelyFreeBuckets to > share with the other bucketSize. > Currently the existing eviction logic helps in the cases where cache used is > more the MEMORY_FACTOR or MULTI_FACTOR and once those evictions are also > done, the eviction (freeSpace function) will not evict anything and the cache > utilization will be stuck at that value without any allocations for other > required sizes. > The fix for this we came up with is simple that we do deFragmentation ( > compaction) of the bucketSize and thus increasing the occupancy ratio and > also freeing up the buckets to be fullyFree, this logic itself is not > complicated as the bucketAllocator takes care of packing the blocks in the > buckets, we need evict and re-allocate the blocks for all the BucketSizes > that dont fit the criteria. > I am attaching an initial patch just to give an idea of what we are thinking > and I'll improve it based on the comments from the community. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-16630) Fragmentation in long running Bucket Cache
[ https://issues.apache.org/jira/browse/HBASE-16630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15635468#comment-15635468 ] deepankar commented on HBASE-16630: --- Yeah I am looking into stacks suggestion to counter the problem I have previously mentioned about re-evicted just cached blocks > Fragmentation in long running Bucket Cache > -- > > Key: HBASE-16630 > URL: https://issues.apache.org/jira/browse/HBASE-16630 > Project: HBase > Issue Type: Bug > Components: BucketCache >Affects Versions: 2.0.0, 1.1.6, 1.3.1, 1.2.3 >Reporter: deepankar >Assignee: deepankar > Attachments: 16630-v2-suggest.patch, 16630-v3-suggest.patch, > HBASE-16630-v2.patch, HBASE-16630-v3.patch, HBASE-16630.patch > > > As we are running bucket cache for a long time in our system, we are > observing cases where some nodes after some time does not fully utilize the > bucket cache, in some cases it is even worse in the sense they get stuck at a > value < 0.25 % of the bucket cache (DEFAULT_MEMORY_FACTOR as all our tables > are configured in-memory for simplicity sake). > We took a heap dump and analyzed what is happening and saw that is classic > case of fragmentation, current implementation of BucketCache (mainly > BucketAllocator) relies on the logic that fullyFreeBuckets are available for > switching/adjusting cache usage between different bucketSizes . But once a > compaction / bulkload happens and the blocks are evicted from a bucket size , > these are usually evicted from random places of the buckets of a bucketSize > and thus locking the number of buckets associated with a bucketSize and in > the worst case of the fragmentation we have seen some bucketSizes with > occupancy ratio of < 10 % But they dont have any completelyFreeBuckets to > share with the other bucketSize. > Currently the existing eviction logic helps in the cases where cache used is > more the MEMORY_FACTOR or MULTI_FACTOR and once those evictions are also > done, the eviction (freeSpace function) will not evict anything and the cache > utilization will be stuck at that value without any allocations for other > required sizes. > The fix for this we came up with is simple that we do deFragmentation ( > compaction) of the bucketSize and thus increasing the occupancy ratio and > also freeing up the buckets to be fullyFree, this logic itself is not > complicated as the bucketAllocator takes care of packing the blocks in the > buckets, we need evict and re-allocate the blocks for all the BucketSizes > that dont fit the criteria. > I am attaching an initial patch just to give an idea of what we are thinking > and I'll improve it based on the comments from the community. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HBASE-16630) Fragmentation in long running Bucket Cache
[ https://issues.apache.org/jira/browse/HBASE-16630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15635468#comment-15635468 ] deepankar edited comment on HBASE-16630 at 11/4/16 6:49 AM: Yeah I am looking into stacks suggestion to counter the problem I have previously mentioned about re-eviction just cached blocks was (Author: dvdreddy): Yeah I am looking into stacks suggestion to counter the problem I have previously mentioned about re-evicted just cached blocks > Fragmentation in long running Bucket Cache > -- > > Key: HBASE-16630 > URL: https://issues.apache.org/jira/browse/HBASE-16630 > Project: HBase > Issue Type: Bug > Components: BucketCache >Affects Versions: 2.0.0, 1.1.6, 1.3.1, 1.2.3 >Reporter: deepankar >Assignee: deepankar > Attachments: 16630-v2-suggest.patch, 16630-v3-suggest.patch, > HBASE-16630-v2.patch, HBASE-16630-v3.patch, HBASE-16630.patch > > > As we are running bucket cache for a long time in our system, we are > observing cases where some nodes after some time does not fully utilize the > bucket cache, in some cases it is even worse in the sense they get stuck at a > value < 0.25 % of the bucket cache (DEFAULT_MEMORY_FACTOR as all our tables > are configured in-memory for simplicity sake). > We took a heap dump and analyzed what is happening and saw that is classic > case of fragmentation, current implementation of BucketCache (mainly > BucketAllocator) relies on the logic that fullyFreeBuckets are available for > switching/adjusting cache usage between different bucketSizes . But once a > compaction / bulkload happens and the blocks are evicted from a bucket size , > these are usually evicted from random places of the buckets of a bucketSize > and thus locking the number of buckets associated with a bucketSize and in > the worst case of the fragmentation we have seen some bucketSizes with > occupancy ratio of < 10 % But they dont have any completelyFreeBuckets to > share with the other bucketSize. > Currently the existing eviction logic helps in the cases where cache used is > more the MEMORY_FACTOR or MULTI_FACTOR and once those evictions are also > done, the eviction (freeSpace function) will not evict anything and the cache > utilization will be stuck at that value without any allocations for other > required sizes. > The fix for this we came up with is simple that we do deFragmentation ( > compaction) of the bucketSize and thus increasing the occupancy ratio and > also freeing up the buckets to be fullyFree, this logic itself is not > complicated as the bucketAllocator takes care of packing the blocks in the > buckets, we need evict and re-allocate the blocks for all the BucketSizes > that dont fit the criteria. > I am attaching an initial patch just to give an idea of what we are thinking > and I'll improve it based on the comments from the community. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-16878) Call Queue length enforced not accounting for the queue handler factor
[ https://issues.apache.org/jira/browse/HBASE-16878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] deepankar updated HBASE-16878: -- Attachment: HBASE-16878.patch > Call Queue length enforced not accounting for the queue handler factor > -- > > Key: HBASE-16878 > URL: https://issues.apache.org/jira/browse/HBASE-16878 > Project: HBase > Issue Type: Bug > Components: rpc >Affects Versions: 1.0.0, 0.98.4 >Reporter: deepankar >Assignee: deepankar >Priority: Minor > Attachments: HBASE-16878.patch > > > After HBASE-11355 we are currently not accounting for the handler factor when > deciding the callQueue length; this is leading to some pretty large queue sizes if > the handler factor is high enough. > Patch attached; the change is one line -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-16878) Call Queue length enforced not accounting for the queue handler factor
deepankar created HBASE-16878: - Summary: Call Queue length enforced not accounting for the queue handler factor Key: HBASE-16878 URL: https://issues.apache.org/jira/browse/HBASE-16878 Project: HBase Issue Type: Bug Components: rpc Affects Versions: 0.98.4, 1.0.0 Reporter: deepankar Assignee: deepankar Priority: Minor After HBASE-11355 we are currently not accounting for the handler factor when deciding the callQueue length; this is leading to some pretty large queue sizes if the handler factor is high enough. Patch attached; the change is one line -- This message was sent by Atlassian JIRA (v6.3.4#6332)
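A hedged sketch of the sizing problem: if each of the call queues introduced by the handler factor is sized from the total handler count, the aggregate queue capacity grows with the number of queues, while deriving each queue's length from the handlers assigned to that queue keeps the total roughly constant. The constant, method names, and multiplier below are illustrative, not the actual HBase configuration keys or the attached patch.
{code}
public class CallQueueSizing {
    static final int CALLS_PER_HANDLER = 10; // illustrative default, not an HBase constant

    // Sizes every queue from the total handler count, ignoring how many queues exist.
    static int perQueueLengthNaive(int handlerCount, int numQueues) {
        return handlerCount * CALLS_PER_HANDLER;
    }

    // Sizes each queue from the handlers actually assigned to it.
    static int perQueueLengthAdjusted(int handlerCount, int numQueues) {
        int handlersPerQueue = Math.max(1, handlerCount / numQueues);
        return handlersPerQueue * CALLS_PER_HANDLER;
    }

    public static void main(String[] args) {
        int handlers = 30, queues = 15; // e.g. a high handler factor producing many queues
        System.out.println("naive total capacity:    " + queues * perQueueLengthNaive(handlers, queues));    // 4500
        System.out.println("adjusted total capacity: " + queues * perQueueLengthAdjusted(handlers, queues)); // 300
    }
}
{code}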
[jira] [Commented] (HBASE-16630) Fragmentation in long running Bucket Cache
[ https://issues.apache.org/jira/browse/HBASE-16630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15580415#comment-15580415 ] deepankar commented on HBASE-16630: --- Sorry for the delayed reply. The bucket cache is not stuck as before; at least it is evicting blocks and adding new blocks. But we are occasionally facing a different issue, especially under peak load: the heuristic we have is repeatedly evicting the same blocks again and again, and that is sometimes leading to a very high load average. We are trying to come up with some tweaks by taking into consideration the total blocks allocated for a bucket as well. If you have any other suggestions we can try them out too. > Fragmentation in long running Bucket Cache > -- > > Key: HBASE-16630 > URL: https://issues.apache.org/jira/browse/HBASE-16630 > Project: HBase > Issue Type: Bug > Components: BucketCache >Affects Versions: 2.0.0, 1.1.6, 1.3.1, 1.2.3 >Reporter: deepankar >Assignee: deepankar > Attachments: 16630-v2-suggest.patch, 16630-v3-suggest.patch, > HBASE-16630-v2.patch, HBASE-16630-v3.patch, HBASE-16630.patch > > > As we are running bucket cache for a long time in our system, we are > observing cases where some nodes after some time does not fully utilize the > bucket cache, in some cases it is even worse in the sense they get stuck at a > value < 0.25 % of the bucket cache (DEFAULT_MEMORY_FACTOR as all our tables > are configured in-memory for simplicity sake). > We took a heap dump and analyzed what is happening and saw that is classic > case of fragmentation, current implementation of BucketCache (mainly > BucketAllocator) relies on the logic that fullyFreeBuckets are available for > switching/adjusting cache usage between different bucketSizes . But once a > compaction / bulkload happens and the blocks are evicted from a bucket size , > these are usually evicted from random places of the buckets of a bucketSize > and thus locking the number of buckets associated with a bucketSize and in > the worst case of the fragmentation we have seen some bucketSizes with > occupancy ratio of < 10 % But they dont have any completelyFreeBuckets to > share with the other bucketSize. > Currently the existing eviction logic helps in the cases where cache used is > more the MEMORY_FACTOR or MULTI_FACTOR and once those evictions are also > done, the eviction (freeSpace function) will not evict anything and the cache > utilization will be stuck at that value without any allocations for other > required sizes. > The fix for this we came up with is simple that we do deFragmentation ( > compaction) of the bucketSize and thus increasing the occupancy ratio and > also freeing up the buckets to be fullyFree, this logic itself is not > complicated as the bucketAllocator takes care of packing the blocks in the > buckets, we need evict and re-allocate the blocks for all the BucketSizes > that dont fit the criteria. > I am attaching an initial patch just to give an idea of what we are thinking > and I'll improve it based on the comments from the community. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-16630) Fragmentation in long running Bucket Cache
[ https://issues.apache.org/jira/browse/HBASE-16630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15514299#comment-15514299 ] deepankar commented on HBASE-16630: --- Which map are you referring to, [~vrodionov]? I am constructing a couple of sets which are at most O(bucket count). > Fragmentation in long running Bucket Cache > -- > > Key: HBASE-16630 > URL: https://issues.apache.org/jira/browse/HBASE-16630 > Project: HBase > Issue Type: Bug > Components: BucketCache >Affects Versions: 2.0.0, 1.1.6, 1.3.1, 1.2.3 >Reporter: deepankar >Assignee: deepankar > Attachments: 16630-v2-suggest.patch, HBASE-16630-v2.patch, > HBASE-16630-v3.patch, HBASE-16630.patch > > > As we are running bucket cache for a long time in our system, we are > observing cases where some nodes after some time does not fully utilize the > bucket cache, in some cases it is even worse in the sense they get stuck at a > value < 0.25 % of the bucket cache (DEFAULT_MEMORY_FACTOR as all our tables > are configured in-memory for simplicity sake). > We took a heap dump and analyzed what is happening and saw that is classic > case of fragmentation, current implementation of BucketCache (mainly > BucketAllocator) relies on the logic that fullyFreeBuckets are available for > switching/adjusting cache usage between different bucketSizes . But once a > compaction / bulkload happens and the blocks are evicted from a bucket size , > these are usually evicted from random places of the buckets of a bucketSize > and thus locking the number of buckets associated with a bucketSize and in > the worst case of the fragmentation we have seen some bucketSizes with > occupancy ratio of < 10 % But they dont have any completelyFreeBuckets to > share with the other bucketSize. > Currently the existing eviction logic helps in the cases where cache used is > more the MEMORY_FACTOR or MULTI_FACTOR and once those evictions are also > done, the eviction (freeSpace function) will not evict anything and the cache > utilization will be stuck at that value without any allocations for other > required sizes. > The fix for this we came up with is simple that we do deFragmentation ( > compaction) of the bucketSize and thus increasing the occupancy ratio and > also freeing up the buckets to be fullyFree, this logic itself is not > complicated as the bucketAllocator takes care of packing the blocks in the > buckets, we need evict and re-allocate the blocks for all the BucketSizes > that dont fit the criteria. > I am attaching an initial patch just to give an idea of what we are thinking > and I'll improve it based on the comments from the community. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-16630) Fragmentation in long running Bucket Cache
[ https://issues.apache.org/jira/browse/HBASE-16630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] deepankar updated HBASE-16630: -- Attachment: HBASE-16630-v3.patch Sorry for delay in adding the suggestion [~tedyu], I attached a patch now which in addition to your suggestions contains a couple of import fixes. Also I tested the patch on couple of machines, every thing looked fine. we are doing a cluster wide deploy today, will report on that results > Fragmentation in long running Bucket Cache > -- > > Key: HBASE-16630 > URL: https://issues.apache.org/jira/browse/HBASE-16630 > Project: HBase > Issue Type: Bug > Components: BucketCache >Affects Versions: 2.0.0, 1.1.6, 1.3.1, 1.2.3 >Reporter: deepankar >Assignee: deepankar > Attachments: 16630-v2-suggest.patch, HBASE-16630-v2.patch, > HBASE-16630-v3.patch, HBASE-16630.patch > > > As we are running bucket cache for a long time in our system, we are > observing cases where some nodes after some time does not fully utilize the > bucket cache, in some cases it is even worse in the sense they get stuck at a > value < 0.25 % of the bucket cache (DEFAULT_MEMORY_FACTOR as all our tables > are configured in-memory for simplicity sake). > We took a heap dump and analyzed what is happening and saw that is classic > case of fragmentation, current implementation of BucketCache (mainly > BucketAllocator) relies on the logic that fullyFreeBuckets are available for > switching/adjusting cache usage between different bucketSizes . But once a > compaction / bulkload happens and the blocks are evicted from a bucket size , > these are usually evicted from random places of the buckets of a bucketSize > and thus locking the number of buckets associated with a bucketSize and in > the worst case of the fragmentation we have seen some bucketSizes with > occupancy ratio of < 10 % But they dont have any completelyFreeBuckets to > share with the other bucketSize. > Currently the existing eviction logic helps in the cases where cache used is > more the MEMORY_FACTOR or MULTI_FACTOR and once those evictions are also > done, the eviction (freeSpace function) will not evict anything and the cache > utilization will be stuck at that value without any allocations for other > required sizes. > The fix for this we came up with is simple that we do deFragmentation ( > compaction) of the bucketSize and thus increasing the occupancy ratio and > also freeing up the buckets to be fullyFree, this logic itself is not > complicated as the bucketAllocator takes care of packing the blocks in the > buckets, we need evict and re-allocate the blocks for all the BucketSizes > that dont fit the criteria. > I am attaching an initial patch just to give an idea of what we are thinking > and I'll improve it based on the comments from the community. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-16630) Fragmentation in long running Bucket Cache
[ https://issues.apache.org/jira/browse/HBASE-16630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15508469#comment-15508469 ] deepankar commented on HBASE-16630: --- I used it predominantly for limiting the number of buckets we needed; if we use a TreeSet we might still need to do the copy. > Fragmentation in long running Bucket Cache > -- > > Key: HBASE-16630 > URL: https://issues.apache.org/jira/browse/HBASE-16630 > Project: HBase > Issue Type: Bug > Components: BucketCache >Affects Versions: 2.0.0, 1.1.6, 1.3.1, 1.2.3 >Reporter: deepankar >Assignee: deepankar > Attachments: HBASE-16630-v2.patch, HBASE-16630.patch > > > As we are running bucket cache for a long time in our system, we are > observing cases where some nodes after some time does not fully utilize the > bucket cache, in some cases it is even worse in the sense they get stuck at a > value < 0.25 % of the bucket cache (DEFAULT_MEMORY_FACTOR as all our tables > are configured in-memory for simplicity sake). > We took a heap dump and analyzed what is happening and saw that is classic > case of fragmentation, current implementation of BucketCache (mainly > BucketAllocator) relies on the logic that fullyFreeBuckets are available for > switching/adjusting cache usage between different bucketSizes . But once a > compaction / bulkload happens and the blocks are evicted from a bucket size , > these are usually evicted from random places of the buckets of a bucketSize > and thus locking the number of buckets associated with a bucketSize and in > the worst case of the fragmentation we have seen some bucketSizes with > occupancy ratio of < 10 % But they dont have any completelyFreeBuckets to > share with the other bucketSize. > Currently the existing eviction logic helps in the cases where cache used is > more the MEMORY_FACTOR or MULTI_FACTOR and once those evictions are also > done, the eviction (freeSpace function) will not evict anything and the cache > utilization will be stuck at that value without any allocations for other > required sizes. > The fix for this we came up with is simple that we do deFragmentation ( > compaction) of the bucketSize and thus increasing the occupancy ratio and > also freeing up the buckets to be fullyFree, this logic itself is not > complicated as the bucketAllocator takes care of packing the blocks in the > buckets, we need evict and re-allocate the blocks for all the BucketSizes > that dont fit the criteria. > I am attaching an initial patch just to give an idea of what we are thinking > and I'll improve it based on the comments from the community. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
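One way to bound the number of buckets considered, along the lines discussed above, is a fixed-size max-heap that retains only the k least-occupied buckets as candidates, instead of inserting every bucket into a TreeSet and copying it afterwards. This is only a sketch with hypothetical types and occupancy figures, not the code in the attached patch.
{code}
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class BoundedBucketSelection {
    // Keep at most k entries; the heap root is the *highest* occupancy seen so far,
    // so any lower occupancy pushes it out.
    static PriorityQueue<Double> leastOccupied(List<Double> occupancies, int k) {
        PriorityQueue<Double> worstFirst = new PriorityQueue<>(k, Comparator.reverseOrder());
        for (double o : occupancies) {
            if (worstFirst.size() < k) {
                worstFirst.add(o);
            } else if (o < worstFirst.peek()) {
                worstFirst.poll();
                worstFirst.add(o);
            }
        }
        return worstFirst;
    }

    public static void main(String[] args) {
        System.out.println(leastOccupied(List.of(0.9, 0.05, 0.5, 0.02, 0.7, 0.1), 3));
        // Retains the three smallest occupancy ratios (0.02, 0.05, 0.1), printed in heap order.
    }
}
{code}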
[jira] [Updated] (HBASE-16630) Fragmentation in long running Bucket Cache
[ https://issues.apache.org/jira/browse/HBASE-16630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] deepankar updated HBASE-16630: -- Attachment: HBASE-16630-v2.patch Attached v2. This patch changes the logic of defragmentation to the one suggested by [~vrodionov] and the one used by memcached, where they just choose some slabs and evict them completely; the code turned out to be much cleaner and smaller. The only heuristic is that we avoid the buckets that are the only buckets for a BucketSizeInfo and the ones that have blocks that are currently in use, and order them by their occupancy ratio. Also added the change suggested by [~zjushch] to add the memory type to the eviction. If this logic looks fine to the community I will go ahead and add a unit test for the freeEntireBlocks method. > Fragmentation in long running Bucket Cache > -- > > Key: HBASE-16630 > URL: https://issues.apache.org/jira/browse/HBASE-16630 > Project: HBase > Issue Type: Bug > Components: BucketCache >Affects Versions: 2.0.0, 1.1.6, 1.3.1, 1.2.3 >Reporter: deepankar >Assignee: deepankar > Attachments: HBASE-16630-v2.patch, HBASE-16630.patch > > > As we are running bucket cache for a long time in our system, we are > observing cases where some nodes after some time does not fully utilize the > bucket cache, in some cases it is even worse in the sense they get stuck at a > value < 0.25 % of the bucket cache (DEFAULT_MEMORY_FACTOR as all our tables > are configured in-memory for simplicity sake). > We took a heap dump and analyzed what is happening and saw that is classic > case of fragmentation, current implementation of BucketCache (mainly > BucketAllocator) relies on the logic that fullyFreeBuckets are available for > switching/adjusting cache usage between different bucketSizes . But once a > compaction / bulkload happens and the blocks are evicted from a bucket size , > these are usually evicted from random places of the buckets of a bucketSize > and thus locking the number of buckets associated with a bucketSize and in > the worst case of the fragmentation we have seen some bucketSizes with > occupancy ratio of < 10 % But they dont have any completelyFreeBuckets to > share with the other bucketSize. > Currently the existing eviction logic helps in the cases where cache used is > more the MEMORY_FACTOR or MULTI_FACTOR and once those evictions are also > done, the eviction (freeSpace function) will not evict anything and the cache > utilization will be stuck at that value without any allocations for other > required sizes. > The fix for this we came up with is simple that we do deFragmentation ( > compaction) of the bucketSize and thus increasing the occupancy ratio and > also freeing up the buckets to be fullyFree, this logic itself is not > complicated as the bucketAllocator takes care of packing the blocks in the > buckets, we need evict and re-allocate the blocks for all the BucketSizes > that dont fit the criteria. > I am attaching an initial patch just to give an idea of what we are thinking > and I'll improve it based on the comments from the community. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
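The heuristic described in the comment above can be sketched as follows, assuming a simplified bucket model (used blocks, capacity, whether any block is currently referenced, and how many buckets its size class owns). This is not the actual BucketAllocator or patch code; it only shows the candidate filtering and the ordering by occupancy ratio so the emptiest eligible buckets are freed first.
{code}
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class DefragCandidateSelection {
    static final class Bucket {
        final int usedBlocks, capacity, bucketsInSizeClass;
        final boolean hasBlocksInUse;
        Bucket(int usedBlocks, int capacity, int bucketsInSizeClass, boolean hasBlocksInUse) {
            this.usedBlocks = usedBlocks; this.capacity = capacity;
            this.bucketsInSizeClass = bucketsInSizeClass; this.hasBlocksInUse = hasBlocksInUse;
        }
        double occupancy() { return (double) usedBlocks / capacity; }
    }

    // Skip buckets that are the only bucket of their size class or that hold
    // referenced blocks; evict the rest starting from the lowest occupancy.
    static List<Bucket> candidates(List<Bucket> all) {
        List<Bucket> out = new ArrayList<>();
        for (Bucket b : all) {
            if (b.bucketsInSizeClass > 1 && !b.hasBlocksInUse) {
                out.add(b);
            }
        }
        out.sort(Comparator.comparingDouble(Bucket::occupancy));
        return out;
    }

    public static void main(String[] args) {
        List<Bucket> buckets = List.of(
            new Bucket(3, 64, 4, false),   // sparse and eligible -> first candidate
            new Bucket(60, 64, 4, false),  // nearly full, eligible but last
            new Bucket(2, 64, 1, false),   // sole bucket of its size class -> skipped
            new Bucket(1, 64, 4, true));   // has in-use blocks -> skipped
        candidates(buckets).forEach(b -> System.out.printf("occupancy=%.2f%n", b.occupancy()));
    }
}
{code}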
[jira] [Comment Edited] (HBASE-16630) Fragmentation in long running Bucket Cache
[ https://issues.apache.org/jira/browse/HBASE-16630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15505115#comment-15505115 ] deepankar edited comment on HBASE-16630 at 9/20/16 12:17 AM: - Yup, looks like this is exactly what we want. The idea is to pick a random slab and evict all elements from it, but we can be a little more clever here and pick the slabs that have the least number of elements in them, so that we do the least number of reads to re-cache the elements from them. This idea is simple and could be done easily within the current code structure, I think, with a small amount of code. [~vrodionov], what do you think about this heuristic? I think this will also address the concerns [~zjushch] raised. [~vrodionov], thanks for the idea.
[jira] [Commented] (HBASE-16630) Fragmentation in long running Bucket Cache
[ https://issues.apache.org/jira/browse/HBASE-16630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15505115#comment-15505115 ] deepankar commented on HBASE-16630: --- Yup, looks like this is exactly what we want. The idea is to pick a random slab and evict all elements from it, but we can be a little more clever here and pick the slabs that have the least number of elements in them, so that we do the least number of reads to re-cache the elements from them. This idea is simple and could be done easily within the current code structure, I think, with a small amount of code. [~vrodionov], what do you think? This will also address the concerns [~zjushch] raised. [~vrodionov], thanks for the idea.
[jira] [Commented] (HBASE-16630) Fragmentation in long running Bucket Cache
[ https://issues.apache.org/jira/browse/HBASE-16630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15504536#comment-15504536 ] deepankar commented on HBASE-16630: --- This is a very good idea, to make sure all the bucket sizes keep some free count. Since there are no free counts, one way to enforce this would be to force eviction from that BucketSizeInfo, which would also evict hot blocks if the number of buckets for that BucketSize is small, and it would still lock up the unused and fragmented space in the other bucketSizes until a compaction or something else frees them up.
[jira] [Commented] (HBASE-16630) Fragmentation in long running Bucket Cache
[ https://issues.apache.org/jira/browse/HBASE-16630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15504518#comment-15504518 ] deepankar commented on HBASE-16630: --- Ha, nice point, I missed this. But this would still not guarantee that we free up the buckets needed for the given BucketSize in a single run; it might need several runs to free up a completely free bucket that can be transferred to our hot BucketSize. One other reason it might be slow is that bytesToFreeWithExtra depends on the size of the currently hot BucketSize (for which there are no free blocks), and that is usually small, so the transition might be slow. Another problem is that, due to sparsely occupied buckets, we might keep ending up in the freeSpace method again and again.
[jira] [Commented] (HBASE-16630) Fragmentation in long running Bucket Cache
[ https://issues.apache.org/jira/browse/HBASE-16630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15489482#comment-15489482 ] deepankar commented on HBASE-16630: --- It is moving data, but the storage for the data (i.e. the byte buffers) is pre-allocated when the HRegionServer starts, so it is more like a copy of the data from one location to another without any new allocation. The old slot goes onto a free list to be reused for another block of data.
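Purely as an illustration of that point: because the cache memory is pre-allocated up front, relocating a block is just a positional copy inside the same backing buffer plus pushing the old offset onto a free list; nothing is allocated or deallocated. The offsets and free-list structure below are invented for the sketch.
{code}
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.Deque;

public class BlockRelocationSketch {
  private final ByteBuffer backing;          // pre-allocated cache memory
  private final Deque<Integer> freeOffsets;  // offsets of free slots, reused for new blocks

  BlockRelocationSketch(int capacity) {
    this.backing = ByteBuffer.allocateDirect(capacity);
    this.freeOffsets = new ArrayDeque<>();
  }

  void addFreeSlot(int offset) {
    freeOffsets.push(offset);
  }

  /** Copy a block of len bytes from oldOffset into a free slot; return the new offset. */
  int relocate(int oldOffset, int len) {
    Integer newOffset = freeOffsets.poll();
    if (newOffset == null) {
      throw new IllegalStateException("no free slot to relocate into");
    }
    // Positional copy inside the same pre-allocated buffer: effectively a memcpy, no allocation.
    ByteBuffer src = backing.duplicate();
    src.position(oldOffset).limit(oldOffset + len);
    ByteBuffer dst = backing.duplicate();
    dst.position(newOffset);
    dst.put(src);
    // The old slot becomes reusable for the next block of this size.
    freeOffsets.push(oldOffset);
    return newOffset;
  }

  public static void main(String[] args) {
    BlockRelocationSketch cache = new BlockRelocationSketch(1024);
    cache.addFreeSlot(512);                 // a completely free slot to pack into
    int newOffset = cache.relocate(0, 64);  // move a 64-byte block from offset 0
    System.out.println("block moved to offset " + newOffset); // prints 512
  }
}
{code}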
[jira] [Commented] (HBASE-16630) Fragmentation in long running Bucket Cache
[ https://issues.apache.org/jira/browse/HBASE-16630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15489296#comment-15489296 ] deepankar commented on HBASE-16630: --- Commented on this above: SIGSEGV should not be an issue, as we are not actually deallocating memory, and we also don't relocate blocks that have a read in progress (we check the refCount).
[jira] [Commented] (HBASE-16630) Fragmentation in long running Bucket Cache
[ https://issues.apache.org/jira/browse/HBASE-16630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15489291#comment-15489291 ] deepankar commented on HBASE-16630: --- The only problem is that currently some IOEngines (e.g. ByteBufferIOEngine) have a hard requirement that the ByteBuffer passed to them for writing is backed by an array; this is the sole reason for the copy, and if we can remove that requirement we can skip the copy. I think the main reason is that ByteBuff does not have the API methods for copying from a ByteBuffer. I will take a look and see if this can be avoided.
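A minimal sketch of a copy that does not depend on ByteBuffer#array(): duplicate the source and destination and use a relative put, which works for direct (non-array-backed) buffers as well. This is plain NIO, not the actual ByteBuff/IOEngine API.
{code}
import java.nio.ByteBuffer;

public class NoArrayCopy {
  /**
   * Copy len bytes from src starting at srcOffset into dst starting at dstOffset.
   * Works whether or not either buffer is array-backed, and leaves the original
   * positions and limits untouched by operating on duplicates.
   */
  static void copy(ByteBuffer src, int srcOffset, ByteBuffer dst, int dstOffset, int len) {
    ByteBuffer s = src.duplicate();
    s.position(srcOffset).limit(srcOffset + len);
    ByteBuffer d = dst.duplicate();
    d.position(dstOffset);
    d.put(s);
  }

  public static void main(String[] args) {
    ByteBuffer direct = ByteBuffer.allocateDirect(16); // no backing array
    for (int i = 0; i < 16; i++) direct.put(i, (byte) i);
    ByteBuffer dest = ByteBuffer.allocateDirect(16);
    copy(direct, 4, dest, 0, 8);
    System.out.println(dest.get(0) + " " + dest.get(7)); // prints 4 11
  }
}
{code}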
[jira] [Commented] (HBASE-16630) Fragmentation in long running Bucket Cache
[ https://issues.apache.org/jira/browse/HBASE-16630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15489287#comment-15489287 ] deepankar commented on HBASE-16630: --- This is correct. Defragmentation does not affect anything in the RPC response path; it happens in the aforementioned background WriterThread.
[jira] [Commented] (HBASE-16630) Fragmentation in long running Bucket Cache
[ https://issues.apache.org/jira/browse/HBASE-16630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15489282#comment-15489282 ] deepankar commented on HBASE-16630: --- I'll remove the full lock; essentially the lock is there to make sure only one write is happening. Reads are unaffected, as we skip the blocks that are currently being used by a read by checking their refCount. SIGSEGV will also not be an issue, as this defragmentation is purely a software one: underneath, everything is backed by a large array of byteBuffers, and we are not allocating or deallocating anything, just overwriting and releasing the Java-level metadata.
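Illustrative only: the guard described above boils down to skipping any block whose refCount is non-zero and serializing the defragmentation pass behind a single lock, so only one writer rewrites block locations while reads continue untouched. The block representation here is hypothetical.
{code}
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

public class DefragGuardSketch {
  /** Hypothetical cached block: only the reference count and location matter here. */
  static class CachedBlock {
    final AtomicInteger refCount = new AtomicInteger(0);
    volatile int offset;
    CachedBlock(int offset) { this.offset = offset; }
  }

  private final ReentrantLock defragLock = new ReentrantLock();

  /** Relocate blocks that are not currently being read; one defrag pass at a time. */
  void defragment(List<CachedBlock> blocks, java.util.function.IntUnaryOperator newOffsetFor) {
    if (!defragLock.tryLock()) {
      return; // another defrag pass is already running
    }
    try {
      for (CachedBlock b : blocks) {
        if (b.refCount.get() > 0) {
          continue; // an RPC is still reading this block; leave it where it is
        }
        b.offset = newOffsetFor.applyAsInt(b.offset); // rewrite metadata only
      }
    } finally {
      defragLock.unlock();
    }
  }
}
{code}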
[jira] [Commented] (HBASE-16630) Fragmentation in long running Bucket Cache
[ https://issues.apache.org/jira/browse/HBASE-16630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15488995#comment-15488995 ] deepankar commented on HBASE-16630: --- ping [~stack], [~anoop.hbase], [~ram_krish], any suggestions?
[jira] [Updated] (HBASE-16630) Fragmentation in long running Bucket Cache
[ https://issues.apache.org/jira/browse/HBASE-16630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] deepankar updated HBASE-16630: -- Attachment: HBASE-16630.patch
[jira] [Created] (HBASE-16630) Fragmentation in long running Bucket Cache
deepankar created HBASE-16630:
-
Summary: Fragmentation in long running Bucket Cache
Key: HBASE-16630
URL: https://issues.apache.org/jira/browse/HBASE-16630
Project: HBase
Issue Type: Bug
Components: BucketCache
Affects Versions: 1.2.3, 1.1.6, 2.0.0, 1.3.1
Reporter: deepankar
Assignee: deepankar

As we have been running the bucket cache for a long time in our system, we are observing cases where some nodes, after some time, do not fully utilize the bucket cache; in some cases it is even worse, in the sense that they get stuck at a value < 0.25 of the bucket cache (the DEFAULT_MEMORY_FACTOR, as all our tables are configured in-memory for simplicity's sake).

We took a heap dump, analyzed what is happening, and saw that it is a classic case of fragmentation. The current implementation of BucketCache (mainly BucketAllocator) relies on fullyFreeBuckets being available for switching/adjusting cache usage between different bucketSizes. But once a compaction / bulkload happens and blocks are evicted from a bucket size, they are usually evicted from random places in the buckets of that bucketSize, which locks up the number of buckets associated with it; in the worst cases of fragmentation we have seen some bucketSizes with an occupancy ratio of < 10% that still don't have any completelyFreeBuckets to share with the other bucketSizes.

Currently the existing eviction logic helps in the cases where the cache usage is more than the MEMORY_FACTOR or MULTI_FACTOR, and once those evictions are done, the eviction (the freeSpace function) will not evict anything and the cache utilization will be stuck at that value without any allocations for other required sizes.

The fix we came up with is simple: we do defragmentation (compaction) of the bucketSize, thus increasing the occupancy ratio and also freeing up buckets to be fullyFree. The logic itself is not complicated, as the bucketAllocator takes care of packing the blocks into the buckets; we need to evict and re-allocate the blocks for all the BucketSizes that don't fit the criteria.

I am attaching an initial patch just to give an idea of what we are thinking and I'll improve it based on the comments from the community.
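For orientation, a pseudocode-level sketch (in Java, with invented interfaces and an invented threshold, not the attached patch) of the fix described above: when a bucket size's occupancy is low and it has no completely free buckets to give back, evict and re-insert its blocks so the allocator re-packs them and whole buckets become free.
{code}
import java.util.List;

public class DefragTriggerSketch {
  interface SizeClassStats {
    double occupancyRatio();
    int completelyFreeBuckets();
    List<Long> blockOffsets();     // offsets of cached blocks in this size class
  }

  interface CacheOps {
    byte[] read(long offset);      // read a cached block
    void evict(long offset);       // free its slot
    void cache(byte[] block);      // re-insert; the allocator packs it into open buckets
  }

  static final double OCCUPANCY_THRESHOLD = 0.5; // invented threshold, for illustration

  /** Re-pack sparsely used size classes so whole buckets become completely free. */
  static void defragment(List<SizeClassStats> sizeClasses, CacheOps cache) {
    for (SizeClassStats sc : sizeClasses) {
      if (sc.occupancyRatio() >= OCCUPANCY_THRESHOLD) continue;
      if (sc.completelyFreeBuckets() > 0) continue; // already has buckets to hand over
      for (long offset : sc.blockOffsets()) {
        byte[] data = cache.read(offset);
        cache.evict(offset);
        cache.cache(data); // re-allocation packs blocks densely, freeing whole buckets
      }
    }
  }
}
{code}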
[jira] [Commented] (HBASE-16624) MVCC DeSerialization bug in the HFileScannerImpl
[ https://issues.apache.org/jira/browse/HBASE-16624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15486470#comment-15486470 ] deepankar commented on HBASE-16624: --- In order to test this we may need to refactor and isolate this function. The reason I think this issue came up is that this is a very hot method, and it was optimized to avoid unnecessary method redirections so that the JIT inlines it.
> MVCC DeSerialization bug in the HFileScannerImpl
> --
>
> Key: HBASE-16624
> URL: https://issues.apache.org/jira/browse/HBASE-16624
> Project: HBase
> Issue Type: Bug
> Components: HFile
> Affects Versions: 2.0.0
> Reporter: deepankar
> Assignee: deepankar
> Priority: Blocker
> Attachments: HBASE-16624.patch
>
> My colleague [~naggarwal] found a bug in the deserialization of mvcc from HFile. As part of the optimization of VLong deserialization, we read an int at once but forgot to convert it to an unsigned one.
> This would cause issues because once we cross the integer threshold in sequenceId and a compaction happens, we would write MAX_MEMSTORE_TS in the trailer as 0 (because we will be reading negative values from the file that got flushed with sequenceId > Integer.MAX_VALUE). And once we have MAX_MEMSTORE_TS as 0, and there are sequenceId values present alongside the KeyValues, the regionserver will start failing to read the compacted file, and thus corruption.
> Interestingly this would happen only on tables that don't have DataBlockEncoding enabled, and unfortunately in our case that turned out to be META and another small table.
> Fix is small (~20 chars) and attached
[jira] [Comment Edited] (HBASE-16624) MVCC DeSerialization bug in the HFileScannerImpl
[ https://issues.apache.org/jira/browse/HBASE-16624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15486459#comment-15486459 ] deepankar edited comment on HBASE-16624 at 9/13/16 6:46 AM: The function is nested deep inside HFileScannerImpl and it was hard to come up with a UT. But if you pass the sequenceId above Integer.MAX_VALUE you would immediately see the issue. To confirm it, I changed the function as below and tested it with the values passed below:
{code}
  /**
   * Actually do the mvcc read. Does no checks.
   * @param offsetFromPos
   */
  private long _readMvccVersion(ByteBuff blockBuffer) {
    long currMemstoreTS = 0;
    int offsetFromPos = 0;
    // This is Bytes#bytesToVint inlined so can save a few instructions in this hot method; i.e.
    // previous if one-byte vint, we'd redo the vint call to find int size.
    // Also the method is kept small so can be inlined.
    byte firstByte = blockBuffer.getByteAfterPosition(offsetFromPos);
    int len = WritableUtils.decodeVIntSize(firstByte);
    if (len == 1) {
      currMemstoreTS = firstByte;
    } else {
      int remaining = len - 1;
      long i = 0;
      offsetFromPos++;
      if (remaining >= Bytes.SIZEOF_INT) {
        // The int read has to be converted to unsigned long so the & op
        i = (blockBuffer.getIntAfterPosition(offsetFromPos) & 0xFFFFFFFFL);
        remaining -= Bytes.SIZEOF_INT;
        offsetFromPos += Bytes.SIZEOF_INT;
      }
      if (remaining >= Bytes.SIZEOF_SHORT) {
        short s = blockBuffer.getShortAfterPosition(offsetFromPos);
        i = i << 16;
        i = i | (s & 0xFFFF);
        remaining -= Bytes.SIZEOF_SHORT;
        offsetFromPos += Bytes.SIZEOF_SHORT;
      }
      for (int idx = 0; idx < remaining; idx++) {
        byte b = blockBuffer.getByteAfterPosition(offsetFromPos + idx);
        i = i << 8;
        i = i | (b & 0xFF);
      }
      currMemstoreTS = (WritableUtils.isNegativeVInt(firstByte) ? ~i : i);
    }
    return currMemstoreTS;
  }
{code}
And I passed the ByteBuff I got from
{code}
ByteArrayDataOutput out = ByteStreams.newDataOutput();
WritableUtils.writeVLong(out, 3085788160L);
new SingleByteBuff(ByteBuffer.wrap(out.toByteArray()))
{code}
[jira] [Commented] (HBASE-16624) MVCC DeSerialization bug in the HFileScannerImpl
[ https://issues.apache.org/jira/browse/HBASE-16624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15486459#comment-15486459 ] deepankar commented on HBASE-16624: --- The function is nested deep inside HFileScannerImpl and it was hard to come up with a UT. But if you cross the sequenceId above Integer.MAX_VALUE you would immediately see the issue. To confirm it, I changed the function:
{code}
  /**
   * Actually do the mvcc read. Does no checks.
   * @param offsetFromPos
   */
  private long _readMvccVersion(ByteBuff blockBuffer) {
    long currMemstoreTS = 0;
    int offsetFromPos = 0;
    // This is Bytes#bytesToVint inlined so can save a few instructions in this hot method; i.e.
    // previous if one-byte vint, we'd redo the vint call to find int size.
    // Also the method is kept small so can be inlined.
    byte firstByte = blockBuffer.getByteAfterPosition(offsetFromPos);
    int len = WritableUtils.decodeVIntSize(firstByte);
    if (len == 1) {
      currMemstoreTS = firstByte;
    } else {
      int remaining = len - 1;
      long i = 0;
      offsetFromPos++;
      if (remaining >= Bytes.SIZEOF_INT) {
        // The int read has to be converted to unsigned long so the & op
        i = (blockBuffer.getIntAfterPosition(offsetFromPos) & 0xFFFFFFFFL);
        remaining -= Bytes.SIZEOF_INT;
        offsetFromPos += Bytes.SIZEOF_INT;
      }
      if (remaining >= Bytes.SIZEOF_SHORT) {
        short s = blockBuffer.getShortAfterPosition(offsetFromPos);
        i = i << 16;
        i = i | (s & 0xFFFF);
        remaining -= Bytes.SIZEOF_SHORT;
        offsetFromPos += Bytes.SIZEOF_SHORT;
      }
      for (int idx = 0; idx < remaining; idx++) {
        byte b = blockBuffer.getByteAfterPosition(offsetFromPos + idx);
        i = i << 8;
        i = i | (b & 0xFF);
      }
      currMemstoreTS = (WritableUtils.isNegativeVInt(firstByte) ? ~i : i);
    }
    return currMemstoreTS;
  }
{code}
And I passed the ByteBuff I got from
{code}
ByteArrayDataOutput out = ByteStreams.newDataOutput();
WritableUtils.writeVLong(out, 3085788160L);
new SingleByteBuff(ByteBuffer.wrap(out.toByteArray()))
{code}
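To see why the missing mask matters, here is a tiny standalone demo (plain Java, independent of the HFile code): widening the four low-order bytes of such a value without {{& 0xFFFFFFFFL}} sign-extends once the sequenceId crosses Integer.MAX_VALUE.
{code}
public class UnsignedWideningDemo {
  public static void main(String[] args) {
    long mvcc = 3085788160L;          // a sequenceId above Integer.MAX_VALUE
    int lowFourBytes = (int) mvcc;    // what a 4-byte read returns: -1209179136

    long wrong = lowFourBytes;                // sign-extended widening
    long right = lowFourBytes & 0xFFFFFFFFL;  // unsigned widening

    System.out.println("raw int      = " + lowFourBytes);
    System.out.println("without mask = " + wrong);  // -1209179136
    System.out.println("with mask    = " + right);  // 3085788160
  }
}
{code}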
[jira] [Commented] (HBASE-15525) OutOfMemory could occur when using BoundedByteBufferPool during RPC bursts
[ https://issues.apache.org/jira/browse/HBASE-15525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15326841#comment-15326841 ] deepankar commented on HBASE-15525: --- Pulled in the latest patch and deployed it on one machine; working as expected, no anomalies seen. Thanks for the patch. +1
> OutOfMemory could occur when using BoundedByteBufferPool during RPC bursts
> --
>
> Key: HBASE-15525
> URL: https://issues.apache.org/jira/browse/HBASE-15525
> Project: HBase
> Issue Type: Sub-task
> Components: IPC/RPC
> Reporter: deepankar
> Assignee: Anoop Sam John
> Priority: Critical
> Fix For: 2.0.0
>
> Attachments: HBASE-15525_V1.patch, HBASE-15525_V2.patch, HBASE-15525_V3.patch, HBASE-15525_V4.patch, HBASE-15525_WIP.patch, WIP.patch
>
> After HBASE-13819 the system sometimes runs out of direct memory whenever there is some network congestion or some client-side issue.
> This is because of pending RPCs in the RPCServer$Connection.responseQueue: since all the responses in this queue hold a cellblock buffer from BoundedByteBufferPool, this could take up a lot of memory if the BoundedByteBufferPool's moving average settles towards a higher value.
> See the discussion here [HBASE-13819-comment|https://issues.apache.org/jira/browse/HBASE-13819?focusedCommentId=15207822=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15207822]
[jira] [Commented] (HBASE-15525) OutOfMemory could occur when using BoundedByteBufferPool during RPC bursts
[ https://issues.apache.org/jira/browse/HBASE-15525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15273688#comment-15273688 ] deepankar commented on HBASE-15525: --- We pulled in this patch and ran it on one of our production systems. Except for a couple of places where we missed returning the buffers, the patch solves this issue in an excellent way, and only allocates the optimal number of buffers needed (with a limit of 4096, it allocated only ~300). This is the diff we had to add on top of the patch to fix the minor leaks:
{code}
diff --git a/hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/IPCUtil.java b/hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/IPCUtil.java
index 1584a40..378459c 100644
--- a/hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/IPCUtil.java
+++ b/hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/IPCUtil.java
@@ -158,7 +158,13 @@ public class IPCUtil {
     assert pool != null;
     PoolAwareByteBuffersOutputStream pbbos = new PoolAwareByteBuffersOutputStream(pool);
     encodeCellsTo(pbbos, cellScanner, codec, compressor);
-    if (pbbos.size() == 0) return null;
+    if (pbbos.size() == 0) {
+      pbbos.closeAndPutbackBuffers();
+      return null;
+    }
     return pbbos;
   }
diff --git a/hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/RpcServer.java b/hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/RpcServer.java
index 517339a..93fac4e 100644
--- a/hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/RpcServer.java
+++ b/hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/RpcServer.java
@@ -1184,6 +1184,7 @@ public class RpcServer implements RpcServerInterface {
       if (error) {
         LOG.debug(getName() + call.toShortString() + ": output error -- closing");
         closeConnection(call.connection);
+        call.done();
       }
     }
{code}
Thanks [~anoop.hbase] for the excellent patch and help in debugging the above issue.
[jira] [Commented] (HBASE-15691) Port HBASE-10205 (ConcurrentModificationException in BucketAllocator) to branch-1
[ https://issues.apache.org/jira/browse/HBASE-15691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15262987#comment-15262987 ] deepankar commented on HBASE-15691: --- Oh OK, thanks for clarifying.
> Port HBASE-10205 (ConcurrentModificationException in BucketAllocator) to branch-1
> --
>
> Key: HBASE-15691
> URL: https://issues.apache.org/jira/browse/HBASE-15691
> Project: HBase
> Issue Type: Sub-task
> Affects Versions: 1.3.0
> Reporter: Andrew Purtell
> Assignee: Andrew Purtell
> Fix For: 1.3.0, 1.2.2
>
> Attachments: HBASE-15691-branch-1.patch
>
> HBASE-10205 was committed to trunk and 0.98 branches only. To preserve continuity we should commit it to branch-1. The change requires more than nontrivial fixups so I will attach a backport of the change from trunk to current branch-1 here.
[jira] [Commented] (HBASE-15691) Port HBASE-10205 (ConcurrentModificationException in BucketAllocator) to branch-1
[ https://issues.apache.org/jira/browse/HBASE-15691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15262800#comment-15262800 ] deepankar commented on HBASE-15691: --- Took a look at the patch. Is there a reason why we are changing back to linked lists? I am still seeing LinkedMaps on master; moving back to linked lists will probably undo the optimizations done in HBASE-14624.
[jira] [Commented] (HBASE-10205) ConcurrentModificationException in BucketAllocator
[ https://issues.apache.org/jira/browse/HBASE-10205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15245080#comment-15245080 ] deepankar commented on HBASE-10205: --- [~stack] and [~enis], it looks like this patch has not been committed on branch-1. Is there something I am missing, or was it missed by mistake? I checked using the command {{git log --pretty=format:"%ad %h %s %an" --date=short | grep "HBASE-10205"}}. Thanks
> ConcurrentModificationException in BucketAllocator
> --
>
> Key: HBASE-10205
> URL: https://issues.apache.org/jira/browse/HBASE-10205
> Project: HBase
> Issue Type: Bug
> Components: regionserver
> Affects Versions: 0.89-fb
> Reporter: Arjen Roodselaar
> Assignee: Arjen Roodselaar
> Priority: Minor
> Fix For: 0.89-fb, 0.99.0, 2.0.0, 0.98.6
>
> Attachments: hbase-10205-trunk.patch
>
> The BucketCache WriterThread calls BucketCache.freeSpace() upon draining the RAM queue containing entries to be cached. freeSpace() in turn calls BucketSizeInfo.statistics() through BucketAllocator.getIndexStatistics(), which iterates over 'bucketList'. At the same time another WriterThread might call BucketAllocator.allocateBlock(), which may call BucketSizeInfo.allocateBlock(), add a bucket to 'bucketList' and consequently cause a ConcurrentModificationException. Calls to BucketAllocator.allocateBlock() are synchronized, but calls to BucketAllocator.getIndexStatistics() are not, which allows this race to occur.
[jira] [Commented] (HBASE-15437) Response size calculated in RPCServer for warning tooLarge responses does NOT count CellScanner payload
[ https://issues.apache.org/jira/browse/HBASE-15437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15223789#comment-15223789 ] deepankar commented on HBASE-15437: --- Sorry for the mistake, fixed it now.
bq. Its pity that we dont pass Call to this method and so we need to do get it from ThreadLoal which will have slight perf overhead. And this method is marked non private.
Yeah, I tried to somehow get the Call into this method (I noticed that all the callers of this method actually build the params by taking them from the call object), but it felt like that would turn into a bigger refactoring and might also break some externally facing pluggable APIs, so I decided to take this route.
> Response size calculated in RPCServer for warning tooLarge responses does NOT count CellScanner payload
> --
>
> Key: HBASE-15437
> URL: https://issues.apache.org/jira/browse/HBASE-15437
> Project: HBase
> Issue Type: Bug
> Components: IPC/RPC
> Reporter: deepankar
> Assignee: deepankar
> Attachments: HBASE-15437-v1.patch, HBASE-15437.patch
>
> After HBASE-13158, where we respond to RPCs with cells in the payload, the protobuf response will just have the count of the cells to read from the payload. But there is a set of features where we log a warning in RPCServer whenever the response is tooLarge, and this size now does not consider the sizes of the cells in the PayloadCellScanner. Code from RPCServer:
> {code}
> long responseSize = result.getSerializedSize();
> // log any RPC responses that are slower than the configured warn
> // response time or larger than configured warning size
> boolean tooSlow = (processingTime > warnResponseTime && warnResponseTime > -1);
> boolean tooLarge = (responseSize > warnResponseSize && warnResponseSize > -1);
> if (tooSlow || tooLarge) {
>   // when tagging, we let TooLarge trump TooSmall to keep output simple
>   // note that large responses will often also be slow.
>   logResponse(new Object[]{param},
>       md.getName(), md.getName() + "(" + param.getClass().getName() + ")",
>       (tooLarge ? "TooLarge" : "TooSlow"),
>       status.getClient(), startTime, processingTime, qTime, responseSize);
> }
> {code}
> Should this feature not be supported any more, or should we add a method to CellScanner, or a new interface which returns the serialized size (though this might not include the compression codecs which might be used during the response)? Any other idea how this could be fixed?
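A rough sketch of the direction this takes: compare warnResponseSize against the protobuf size plus the serialized cellblock size rather than the protobuf size alone. The method and parameter names below are placeholders, not the actual RpcServer API.
{code}
public class TooLargeCheckSketch {
  /** Decide whether to log a response as TooLarge, counting the cellblock payload too. */
  static boolean isTooLarge(long protobufSize, long cellBlockSize, long warnResponseSize) {
    long totalResponseSize = protobufSize + cellBlockSize;
    return warnResponseSize > -1 && totalResponseSize > warnResponseSize;
  }

  public static void main(String[] args) {
    long protobufSize = 128;                 // the Result message only carries cell counts
    long cellBlockSize = 90L * 1024 * 1024;  // the actual cells travel in the payload
    System.out.println(isTooLarge(protobufSize, cellBlockSize, 100L * 1024 * 1024)); // false
    System.out.println(isTooLarge(protobufSize, cellBlockSize, 64L * 1024 * 1024));  // true
  }
}
{code}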
[jira] [Updated] (HBASE-15437) Response size calculated in RPCServer for warning tooLarge responses does NOT count CellScanner payload
[ https://issues.apache.org/jira/browse/HBASE-15437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] deepankar updated HBASE-15437: -- Attachment: HBASE-15437-v1.patch
[jira] [Updated] (HBASE-15437) Response size calculated in RPCServer for warning tooLarge responses does NOT count CellScanner payload
[ https://issues.apache.org/jira/browse/HBASE-15437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] deepankar updated HBASE-15437: -- Status: Patch Available (was: Open) > Response size calculated in RPCServer for warning tooLarge responses does NOT > count CellScanner payload > --- > > Key: HBASE-15437 > URL: https://issues.apache.org/jira/browse/HBASE-15437 > Project: HBase > Issue Type: Bug > Components: IPC/RPC >Reporter: deepankar >Assignee: deepankar > Attachments: HBASE-15437-v1.patch, HBASE-15437.patch > > > After HBASE-13158 where we respond back to RPCs with cells in the payload , > the protobuf response will just have the count the cells to read from > payload, but there are set of features where we log warn in RPCServer > whenever the response is tooLarge, but this size now is not considering the > sizes of the cells in the PayloadCellScanner. Code form RPCServer > {code} > long responseSize = result.getSerializedSize(); > // log any RPC responses that are slower than the configured warn > // response time or larger than configured warning size > boolean tooSlow = (processingTime > warnResponseTime && > warnResponseTime > -1); > boolean tooLarge = (responseSize > warnResponseSize && warnResponseSize > > -1); > if (tooSlow || tooLarge) { > // when tagging, we let TooLarge trump TooSmall to keep output simple > // note that large responses will often also be slow. > logResponse(new Object[]{param}, > md.getName(), md.getName() + "(" + param.getClass().getName() + > ")", > (tooLarge ? "TooLarge" : "TooSlow"), > status.getClient(), startTime, processingTime, qTime, > responseSize); > } > {code} > Should this feature be not supported any more or should we add a method to > CellScanner or a new interface which returns the serialized size (but this > might not include the compression codecs which might be used during response > ?) Any other Idea this could be fixed ? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-15437) Response size calculated in RPCServer for warning tooLarge responses does NOT count CellScanner payload
[ https://issues.apache.org/jira/browse/HBASE-15437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] deepankar updated HBASE-15437: -- Status: Open (was: Patch Available) > Response size calculated in RPCServer for warning tooLarge responses does NOT > count CellScanner payload > --- > > Key: HBASE-15437 > URL: https://issues.apache.org/jira/browse/HBASE-15437 > Project: HBase > Issue Type: Bug > Components: IPC/RPC >Reporter: deepankar >Assignee: deepankar > Attachments: HBASE-15437-v1.patch, HBASE-15437.patch > > > After HBASE-13158 where we respond back to RPCs with cells in the payload , > the protobuf response will just have the count the cells to read from > payload, but there are set of features where we log warn in RPCServer > whenever the response is tooLarge, but this size now is not considering the > sizes of the cells in the PayloadCellScanner. Code form RPCServer > {code} > long responseSize = result.getSerializedSize(); > // log any RPC responses that are slower than the configured warn > // response time or larger than configured warning size > boolean tooSlow = (processingTime > warnResponseTime && > warnResponseTime > -1); > boolean tooLarge = (responseSize > warnResponseSize && warnResponseSize > > -1); > if (tooSlow || tooLarge) { > // when tagging, we let TooLarge trump TooSmall to keep output simple > // note that large responses will often also be slow. > logResponse(new Object[]{param}, > md.getName(), md.getName() + "(" + param.getClass().getName() + > ")", > (tooLarge ? "TooLarge" : "TooSlow"), > status.getClient(), startTime, processingTime, qTime, > responseSize); > } > {code} > Should this feature be not supported any more or should we add a method to > CellScanner or a new interface which returns the serialized size (but this > might not include the compression codecs which might be used during response > ?) Any other Idea this could be fixed ? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-15437) Response size calculated in RPCServer for warning tooLarge responses does NOT count CellScanner payload
[ https://issues.apache.org/jira/browse/HBASE-15437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15223672#comment-15223672 ] deepankar commented on HBASE-15437: --- Attached patch following suggestions from Anoop and Enis > Response size calculated in RPCServer for warning tooLarge responses does NOT > count CellScanner payload > --- > > Key: HBASE-15437 > URL: https://issues.apache.org/jira/browse/HBASE-15437 > Project: HBase > Issue Type: Bug > Components: IPC/RPC >Reporter: deepankar >Assignee: deepankar > Attachments: HBASE-15437.patch > > > After HBASE-13158 where we respond back to RPCs with cells in the payload , > the protobuf response will just have the count the cells to read from > payload, but there are set of features where we log warn in RPCServer > whenever the response is tooLarge, but this size now is not considering the > sizes of the cells in the PayloadCellScanner. Code form RPCServer > {code} > long responseSize = result.getSerializedSize(); > // log any RPC responses that are slower than the configured warn > // response time or larger than configured warning size > boolean tooSlow = (processingTime > warnResponseTime && > warnResponseTime > -1); > boolean tooLarge = (responseSize > warnResponseSize && warnResponseSize > > -1); > if (tooSlow || tooLarge) { > // when tagging, we let TooLarge trump TooSmall to keep output simple > // note that large responses will often also be slow. > logResponse(new Object[]{param}, > md.getName(), md.getName() + "(" + param.getClass().getName() + > ")", > (tooLarge ? "TooLarge" : "TooSlow"), > status.getClient(), startTime, processingTime, qTime, > responseSize); > } > {code} > Should this feature be not supported any more or should we add a method to > CellScanner or a new interface which returns the serialized size (but this > might not include the compression codecs which might be used during response > ?) Any other Idea this could be fixed ? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-15437) Response size calculated in RPCServer for warning tooLarge responses does NOT count CellScanner payload
[ https://issues.apache.org/jira/browse/HBASE-15437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] deepankar updated HBASE-15437: -- Attachment: HBASE-15437.patch > Response size calculated in RPCServer for warning tooLarge responses does NOT > count CellScanner payload > --- > > Key: HBASE-15437 > URL: https://issues.apache.org/jira/browse/HBASE-15437 > Project: HBase > Issue Type: Bug > Components: IPC/RPC >Reporter: deepankar > Attachments: HBASE-15437.patch > > > After HBASE-13158 where we respond back to RPCs with cells in the payload , > the protobuf response will just have the count the cells to read from > payload, but there are set of features where we log warn in RPCServer > whenever the response is tooLarge, but this size now is not considering the > sizes of the cells in the PayloadCellScanner. Code form RPCServer > {code} > long responseSize = result.getSerializedSize(); > // log any RPC responses that are slower than the configured warn > // response time or larger than configured warning size > boolean tooSlow = (processingTime > warnResponseTime && > warnResponseTime > -1); > boolean tooLarge = (responseSize > warnResponseSize && warnResponseSize > > -1); > if (tooSlow || tooLarge) { > // when tagging, we let TooLarge trump TooSmall to keep output simple > // note that large responses will often also be slow. > logResponse(new Object[]{param}, > md.getName(), md.getName() + "(" + param.getClass().getName() + > ")", > (tooLarge ? "TooLarge" : "TooSlow"), > status.getClient(), startTime, processingTime, qTime, > responseSize); > } > {code} > Should this feature be not supported any more or should we add a method to > CellScanner or a new interface which returns the serialized size (but this > might not include the compression codecs which might be used during response > ?) Any other Idea this could be fixed ? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-15437) Response size calculated in RPCServer for warning tooLarge responses does NOT count CellScanner payload
[ https://issues.apache.org/jira/browse/HBASE-15437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] deepankar updated HBASE-15437: -- Assignee: deepankar Status: Patch Available (was: Open) > Response size calculated in RPCServer for warning tooLarge responses does NOT > count CellScanner payload > --- > > Key: HBASE-15437 > URL: https://issues.apache.org/jira/browse/HBASE-15437 > Project: HBase > Issue Type: Bug > Components: IPC/RPC >Reporter: deepankar >Assignee: deepankar > Attachments: HBASE-15437.patch > > > After HBASE-13158 where we respond back to RPCs with cells in the payload , > the protobuf response will just have the count the cells to read from > payload, but there are set of features where we log warn in RPCServer > whenever the response is tooLarge, but this size now is not considering the > sizes of the cells in the PayloadCellScanner. Code form RPCServer > {code} > long responseSize = result.getSerializedSize(); > // log any RPC responses that are slower than the configured warn > // response time or larger than configured warning size > boolean tooSlow = (processingTime > warnResponseTime && > warnResponseTime > -1); > boolean tooLarge = (responseSize > warnResponseSize && warnResponseSize > > -1); > if (tooSlow || tooLarge) { > // when tagging, we let TooLarge trump TooSmall to keep output simple > // note that large responses will often also be slow. > logResponse(new Object[]{param}, > md.getName(), md.getName() + "(" + param.getClass().getName() + > ")", > (tooLarge ? "TooLarge" : "TooSlow"), > status.getClient(), startTime, processingTime, qTime, > responseSize); > } > {code} > Should this feature be not supported any more or should we add a method to > CellScanner or a new interface which returns the serialized size (but this > might not include the compression codecs which might be used during response > ?) Any other Idea this could be fixed ? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-15525) OutOfMemory could occur when using BoundedByteBufferPool during RPC bursts
[ https://issues.apache.org/jira/browse/HBASE-15525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15218608#comment-15218608 ] deepankar commented on HBASE-15525: --- Sure happy to help > OutOfMemory could occur when using BoundedByteBufferPool during RPC bursts > -- > > Key: HBASE-15525 > URL: https://issues.apache.org/jira/browse/HBASE-15525 > Project: HBase > Issue Type: Sub-task > Components: IPC/RPC >Reporter: deepankar >Assignee: Anoop Sam John >Priority: Critical > Attachments: WIP.patch > > > After HBASE-13819 the system some times run out of direct memory whenever > there is some network congestion or some client side issues. > This was because of pending RPCs in the RPCServer$Connection.responseQueue > and since all the responses in this queue hold a buffer for cellblock from > BoundedByteBufferPool this could takeup a lot of memory if the > BoundedByteBufferPool's moving average settles down towards a higher value > See the discussion here > [HBASE-13819-comment|https://issues.apache.org/jira/browse/HBASE-13819?focusedCommentId=15207822=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15207822] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-15437) Response size calculated in RPCServer for warning tooLarge responses does NOT count CellScanner payload
[ https://issues.apache.org/jira/browse/HBASE-15437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15218605#comment-15218605 ] deepankar commented on HBASE-15437: --- This means that responseTime warning will not contain responseSize and the responseSize will not contain information about responseTimes right ? > Response size calculated in RPCServer for warning tooLarge responses does NOT > count CellScanner payload > --- > > Key: HBASE-15437 > URL: https://issues.apache.org/jira/browse/HBASE-15437 > Project: HBase > Issue Type: Bug > Components: IPC/RPC >Reporter: deepankar > > After HBASE-13158 where we respond back to RPCs with cells in the payload , > the protobuf response will just have the count the cells to read from > payload, but there are set of features where we log warn in RPCServer > whenever the response is tooLarge, but this size now is not considering the > sizes of the cells in the PayloadCellScanner. Code form RPCServer > {code} > long responseSize = result.getSerializedSize(); > // log any RPC responses that are slower than the configured warn > // response time or larger than configured warning size > boolean tooSlow = (processingTime > warnResponseTime && > warnResponseTime > -1); > boolean tooLarge = (responseSize > warnResponseSize && warnResponseSize > > -1); > if (tooSlow || tooLarge) { > // when tagging, we let TooLarge trump TooSmall to keep output simple > // note that large responses will often also be slow. > logResponse(new Object[]{param}, > md.getName(), md.getName() + "(" + param.getClass().getName() + > ")", > (tooLarge ? "TooLarge" : "TooSlow"), > status.getClient(), startTime, processingTime, qTime, > responseSize); > } > {code} > Should this feature be not supported any more or should we add a method to > CellScanner or a new interface which returns the serialized size (but this > might not include the compression codecs which might be used during response > ?) Any other Idea this could be fixed ? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-15525) OutOfMemory could occur when using BoundedByteBufferPool during RPC bursts
[ https://issues.apache.org/jira/browse/HBASE-15525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15217435#comment-15217435 ] deepankar commented on HBASE-15525: --- Oh ok, sorry for the confusion thanks > OutOfMemory could occur when using BoundedByteBufferPool during RPC bursts > -- > > Key: HBASE-15525 > URL: https://issues.apache.org/jira/browse/HBASE-15525 > Project: HBase > Issue Type: Sub-task > Components: IPC/RPC >Reporter: deepankar >Assignee: Anoop Sam John >Priority: Critical > Attachments: WIP.patch > > > After HBASE-13819 the system some times run out of direct memory whenever > there is some network congestion or some client side issues. > This was because of pending RPCs in the RPCServer$Connection.responseQueue > and since all the responses in this queue hold a buffer for cellblock from > BoundedByteBufferPool this could takeup a lot of memory if the > BoundedByteBufferPool's moving average settles down towards a higher value > See the discussion here > [HBASE-13819-comment|https://issues.apache.org/jira/browse/HBASE-13819?focusedCommentId=15207822=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15207822] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-15525) OutOfMemory could occur when using BoundedByteBufferPool during RPC bursts
[ https://issues.apache.org/jira/browse/HBASE-15525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15217417#comment-15217417 ] deepankar commented on HBASE-15525: --- bq. Sure I will correct that.. One issue is that when we have curBuf with remaining size 100 and there is a need for 110 bytes for a cell. So we will not use remaining 100 bytes from this buf but go to next BB. Will try to solve this also.. That will auto solve the one u found Won't that still have the same problem? The just-acquired buffer will stay empty, and so will the next one, since the buffers we are using are of fixed size. Side note: I also verified this against sample internal production data; there were some outliers that could trigger this (though we thought our data was clean enough). > OutOfMemory could occur when using BoundedByteBufferPool during RPC bursts > -- > > Key: HBASE-15525 > URL: https://issues.apache.org/jira/browse/HBASE-15525 > Project: HBase > Issue Type: Sub-task > Components: IPC/RPC >Reporter: deepankar >Assignee: Anoop Sam John >Priority: Critical > Attachments: WIP.patch > > > After HBASE-13819 the system some times run out of direct memory whenever > there is some network congestion or some client side issues. > This was because of pending RPCs in the RPCServer$Connection.responseQueue > and since all the responses in this queue hold a buffer for cellblock from > BoundedByteBufferPool this could takeup a lot of memory if the > BoundedByteBufferPool's moving average settles down towards a higher value > See the discussion here > [HBASE-13819-comment|https://issues.apache.org/jira/browse/HBASE-13819?focusedCommentId=15207822=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15207822] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-15525) OutOfMemory could occur when using BoundedByteBufferPool during RPC bursts
[ https://issues.apache.org/jira/browse/HBASE-15525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15217378#comment-15217378 ] deepankar commented on HBASE-15525: --- A minor comment on the patch: in PoolAwareByteBufferOutputStream, checkSizeAndGrow does not take the extra bytes needed into account. This could lead to an exception if we are copying a byte array that is bigger than the byte buffer we get from the pool (which is possible in the case of KVs, as there is no limit on their sizes). We should have a loop when copying byte arrays / buffers (the copy method) so that expansion can happen multiple times for larger copies; see the sketch after this message. > OutOfMemory could occur when using BoundedByteBufferPool during RPC bursts > -- > > Key: HBASE-15525 > URL: https://issues.apache.org/jira/browse/HBASE-15525 > Project: HBase > Issue Type: Sub-task > Components: IPC/RPC >Reporter: deepankar >Assignee: Anoop Sam John >Priority: Critical > Attachments: WIP.patch > > > After HBASE-13819 the system some times run out of direct memory whenever > there is some network congestion or some client side issues. > This was because of pending RPCs in the RPCServer$Connection.responseQueue > and since all the responses in this queue hold a buffer for cellblock from > BoundedByteBufferPool this could takeup a lot of memory if the > BoundedByteBufferPool's moving average settles down towards a higher value > See the discussion here > [HBASE-13819-comment|https://issues.apache.org/jira/browse/HBASE-13819?focusedCommentId=15207822=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15207822] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
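A rough sketch of the looping fix suggested above, assuming a pool-backed output stream that keeps a current buffer (curBuf) and can pull another fixed-size buffer from the pool (allocateNewBuffer); these names are placeholders, not the actual WIP.patch code.

{code}
// Sketch: copy a source array across as many fixed-size pooled buffers as needed,
// instead of assuming a single grow step is enough.
private void write(byte[] src, int off, int len) {
  while (len > 0) {
    if (!curBuf.hasRemaining()) {
      allocateNewBuffer();                       // assumed: take the next buffer from the pool
    }
    int toWrite = Math.min(len, curBuf.remaining());
    curBuf.put(src, off, toWrite);               // copy only what fits, then loop for the rest
    off += toWrite;
    len -= toWrite;
  }
}
{code}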
[jira] [Commented] (HBASE-15362) Compression Algorithm does not respect config params from hbase-site
[ https://issues.apache.org/jira/browse/HBASE-15362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15216605#comment-15216605 ] deepankar commented on HBASE-15362: --- Sure will create a jira > Compression Algorithm does not respect config params from hbase-site > > > Key: HBASE-15362 > URL: https://issues.apache.org/jira/browse/HBASE-15362 > Project: HBase > Issue Type: Sub-task >Reporter: deepankar >Assignee: deepankar >Priority: Trivial > Attachments: HBASE-15362.patch > > > Compression creates conf using new Configuration() and this leads to it not > respecting the confs set in hbase-site, fixing it is trivial using > HBaseConfiguration.create() -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-15545) org.apache.hadoop.io.compress.DecompressorStream allocates too much memory
[ https://issues.apache.org/jira/browse/HBASE-15545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15215126#comment-15215126 ] deepankar commented on HBASE-15545: --- We also had a similar issue; it allocates a lot of memory in the RPC path as well. We fixed it in a hacky way, so it would be good if we can come up with a better fix. Relevant comment: https://issues.apache.org/jira/browse/HBASE-15362?focusedCommentId=15174292=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15174292 > org.apache.hadoop.io.compress.DecompressorStream allocates too much memory > -- > > Key: HBASE-15545 > URL: https://issues.apache.org/jira/browse/HBASE-15545 > Project: HBase > Issue Type: Sub-task > Components: Compaction >Reporter: Vladimir Rodionov >Assignee: Vladimir Rodionov > Fix For: 2.0.0 > > > It accounts for ~ 11% of overall memory allocation during compaction when > compression (GZ) is enabled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-15362) Compression Algorithm does not respect config params from hbase-site
[ https://issues.apache.org/jira/browse/HBASE-15362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15215056#comment-15215056 ] deepankar commented on HBASE-15362: --- But this change is very specific to LZO and specialized (hacky); is it OK to include it? We hacked it in as a temporary solution. > Compression Algorithm does not respect config params from hbase-site > > > Key: HBASE-15362 > URL: https://issues.apache.org/jira/browse/HBASE-15362 > Project: HBase > Issue Type: Bug >Reporter: deepankar >Assignee: deepankar >Priority: Trivial > Attachments: HBASE-15362.patch > > > Compression creates conf using new Configuration() and this leads to it not > respecting the confs set in hbase-site, fixing it is trivial using > HBaseConfiguration.create() -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-15362) Compression Algorithm does not respect config params from hbase-site
[ https://issues.apache.org/jira/browse/HBASE-15362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15215047#comment-15215047 ] deepankar commented on HBASE-15362: --- It depends on the block size: we use a default block size of 16 KB while this allocates a 64 KB buffer, so per read we are saving 64 KB - 16 KB (about 48 KB) of buffer, I think. > Compression Algorithm does not respect config params from hbase-site > > > Key: HBASE-15362 > URL: https://issues.apache.org/jira/browse/HBASE-15362 > Project: HBase > Issue Type: Bug >Reporter: deepankar >Assignee: deepankar >Priority: Trivial > Attachments: HBASE-15362.patch > > > Compression creates conf using new Configuration() and this leads to it not > respecting the confs set in hbase-site, fixing it is trivial using > HBaseConfiguration.create() -- This message was sent by Atlassian JIRA (v6.3.4#6332)
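For reference, the trivial fix described in the issue amounts to building the codec's Configuration through HBaseConfiguration so hbase-site.xml overrides are honored. A minimal sketch; the surrounding field is illustrative rather than the exact line changed by the attached patch.

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

// Before: hbase-site.xml settings (such as buffer-size tuning) were ignored.
// private static final Configuration conf = new Configuration();

// After: hbase-site.xml overrides are picked up.
private static final Configuration conf = HBaseConfiguration.create();
{code}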
[jira] [Updated] (HBASE-15525) OutOfMemory could occur when using BoundedByteBufferPool during RPC bursts
[ https://issues.apache.org/jira/browse/HBASE-15525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] deepankar updated HBASE-15525: -- Summary: OutOfMemory could occur when using BoundedByteBufferPool during RPC bursts (was: Fix OutOfMemory that could occur when using BoundedByteBufferPool during RPC bursts) > OutOfMemory could occur when using BoundedByteBufferPool during RPC bursts > -- > > Key: HBASE-15525 > URL: https://issues.apache.org/jira/browse/HBASE-15525 > Project: HBase > Issue Type: Bug > Components: IPC/RPC >Reporter: deepankar >Assignee: Anoop Sam John >Priority: Critical > > After HBASE-13819 the system some times run out of direct memory whenever > there is some network congestion or some client side issues. > This was because of pending RPCs in the RPCServer$Connection.responseQueue > and since all the responses in this queue hold a buffer for cellblock from > BoundedByteBufferPool this could takeup a lot of memory if the > BoundedByteBufferPool's moving average settles down towards a higher value > See the discussion here > [HBASE-13819-comment|https://issues.apache.org/jira/browse/HBASE-13819?focusedCommentId=15207822=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15207822] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13819) Make RPC layer CellBlock buffer a DirectByteBuffer
[ https://issues.apache.org/jira/browse/HBASE-13819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15209749#comment-15209749 ] deepankar commented on HBASE-13819: --- I totally agree with your idea, but I think in practice it would be hard to exhaust the offheap space if the allocation were proportional to the expected response size. One reason why we did not decrease the buffer size from BoundedByteBufferPool is that these large requests are not that uncommon; they do occur occasionally and we did not want them to have to allocate a full-size buffer at that point, our initial assumption being that with BoundedByteBufferPool there might not be any further allocations at all. Maybe we should try, internally, restricting the max size from BBBP to a much lower value; maybe it is OK for large RPCs to have to create their own buffers. bq. And one more thing for GC is that the full GC only can clean the off heap area? I think young GC can also clear them; the problem is that the Bits class, when there is not enough offheap space, calls System.gc(), which I think triggers a full GC. But anyway, I agree this GC is wasteful (see the small demo after this message). bq. We need to make pool such that we will give a BB back if it is having a free one. When it is not having a free one and capacity is not reached, it makes a new DBB and return This would be nice. We had a hot patch internally that simply fails the request when it sees Bits is about to call System.gc(); this was only a temporary measure to stop the RegionServer from crashing. > Make RPC layer CellBlock buffer a DirectByteBuffer > -- > > Key: HBASE-13819 > URL: https://issues.apache.org/jira/browse/HBASE-13819 > Project: HBase > Issue Type: Sub-task > Components: Scanners >Reporter: Anoop Sam John >Assignee: Anoop Sam John > Fix For: 2.0.0, 1.3.0 > > Attachments: HBASE-13819.patch, HBASE-13819_branch-1.patch, > HBASE-13819_branch-1.patch, HBASE-13819_branch-1.patch, q.png > > > In RPC layer, when we make a cellBlock to put as RPC payload, we will make an > on heap byte buffer (via BoundedByteBufferPool). The pool will keep upto > certain number of buffers. This jira aims at testing possibility for making > this buffers off heap ones. (DBB) The advantages > 1. Unsafe based writes to off heap is faster than that to on heap. Now we are > not using unsafe based writes at all. Even if we add, DBB will be better > 2. When Cells are backed by off heap (HBASE-11425) off heap to off heap > writes will be better > 3. When checked the code in SocketChannel impl, if we pass a HeapByteBuffer > to the socket channel, it will create a temp DBB and copy data to there and > only DBBs will be moved to Sockets. If we make DBB 1st hand itself, we can > avoid this one more level of copying. > Will do different perf testing with changed and report back. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
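A small stand-alone demo of the failure mode being discussed: holding many direct buffers (as queued responses do) until the direct-memory limit is hit, at which point allocation falls back to System.gc() and finally fails with OutOfMemoryError: Direct buffer memory. The 512 KB buffer size and the 64 MB limit are illustrative numbers only.

{code}
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Run with e.g.: java -XX:MaxDirectMemorySize=64m DirectMemoryExhaustion
public class DirectMemoryExhaustion {
  public static void main(String[] args) {
    List<ByteBuffer> held = new ArrayList<>();             // stands in for responses stuck in responseQueue
    try {
      while (true) {
        held.add(ByteBuffer.allocateDirect(512 * 1024));   // 512 KB, like the settled pool buffer size
      }
    } catch (OutOfMemoryError e) {
      // java.nio.Bits already tried System.gc() before giving up
      System.out.println("Exhausted direct memory after " + held.size() + " buffers: " + e);
    }
  }
}
{code}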
[jira] [Commented] (HBASE-13819) Make RPC layer CellBlock buffer a DirectByteBuffer
[ https://issues.apache.org/jira/browse/HBASE-13819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15209739#comment-15209739 ] deepankar commented on HBASE-13819: --- I agree with you, but what I feel is that in the ideal scenario the increase in heap usage should be proportional to the number of incoming RPCs, right? When we initially allocated the heap size for BBBP we accounted for the anticipated burst, but the issue arose when the usage on the server was a couple of orders of magnitude more than that (the actual response data waiting to be sent was around 80 MB while the total heap usage was around 3.1 GB). What do you think? > Make RPC layer CellBlock buffer a DirectByteBuffer > -- > > Key: HBASE-13819 > URL: https://issues.apache.org/jira/browse/HBASE-13819 > Project: HBase > Issue Type: Sub-task > Components: Scanners >Reporter: Anoop Sam John >Assignee: Anoop Sam John > Fix For: 2.0.0, 1.3.0 > > Attachments: HBASE-13819.patch, HBASE-13819_branch-1.patch, > HBASE-13819_branch-1.patch, HBASE-13819_branch-1.patch, q.png > > > In RPC layer, when we make a cellBlock to put as RPC payload, we will make an > on heap byte buffer (via BoundedByteBufferPool). The pool will keep upto > certain number of buffers. This jira aims at testing possibility for making > this buffers off heap ones. (DBB) The advantages > 1. Unsafe based writes to off heap is faster than that to on heap. Now we are > not using unsafe based writes at all. Even if we add, DBB will be better > 2. When Cells are backed by off heap (HBASE-11425) off heap to off heap > writes will be better > 3. When checked the code in SocketChannel impl, if we pass a HeapByteBuffer > to the socket channel, it will create a temp DBB and copy data to there and > only DBBs will be moved to Sockets. If we make DBB 1st hand itself, we can > avoid this one more level of copying. > Will do different perf testing with changed and report back. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13819) Make RPC layer CellBlock buffer a DirectByteBuffer
[ https://issues.apache.org/jira/browse/HBASE-13819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15209736#comment-15209736 ] deepankar commented on HBASE-13819: --- bq. What you think of Anoop's idea of the BB being allocated onheap rather than offheap if we can't get it from the pool? The allocation would be faster... I feel both onheap / offheap allocation could suffer from the same problem as long as we are not tight in the allocation (with less wastage compared to the anticipated response), as an example we somewhat made this error rare by increasing the MaxDirectMemory to a higher value, but in the onheap case also if a user allocated his memory by tightly accounting stuff he could may as well face the issue of unnecessary GCs I think. bq. deepankar Would you mind opening a new issue describing how you would like this to work? created a jira HBASE-15525, we would definitely help in whatever way we can on this. > Make RPC layer CellBlock buffer a DirectByteBuffer > -- > > Key: HBASE-13819 > URL: https://issues.apache.org/jira/browse/HBASE-13819 > Project: HBase > Issue Type: Sub-task > Components: Scanners >Reporter: Anoop Sam John >Assignee: Anoop Sam John > Fix For: 2.0.0, 1.3.0 > > Attachments: HBASE-13819.patch, HBASE-13819_branch-1.patch, > HBASE-13819_branch-1.patch, HBASE-13819_branch-1.patch, q.png > > > In RPC layer, when we make a cellBlock to put as RPC payload, we will make an > on heap byte buffer (via BoundedByteBufferPool). The pool will keep upto > certain number of buffers. This jira aims at testing possibility for making > this buffers off heap ones. (DBB) The advantages > 1. Unsafe based writes to off heap is faster than that to on heap. Now we are > not using unsafe based writes at all. Even if we add, DBB will be better > 2. When Cells are backed by off heap (HBASE-11425) off heap to off heap > writes will be better > 3. When checked the code in SocketChannel impl, if we pass a HeapByteBuffer > to the socket channel, it will create a temp DBB and copy data to there and > only DBBs will be moved to Sockets. If we make DBB 1st hand itself, we can > avoid this one more level of copying. > Will do different perf testing with changed and report back. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-15525) Fix OutOfMemory that could occur when using BoundedByteBufferPool during RPC bursts
[ https://issues.apache.org/jira/browse/HBASE-15525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] deepankar updated HBASE-15525: -- Description: After HBASE-13819 the system some times run out of direct memory whenever there is some network congestion or some client side issues. This was because of pending RPCs in the RPCServer$Connection.responseQueue and since all the responses in this queue hold a buffer for cellblock from BoundedByteBufferPool this could takeup a lot of memory if the BoundedByteBufferPool's moving average settles down towards a higher value See the discussion here [HBASE-13819-comment|https://issues.apache.org/jira/browse/HBASE-13819?focusedCommentId=15207822=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15207822] was: After HBASE-13819 the system some times run out of direct memory whenever there is some network congestion or some client side issues. This was because of pending RPCs in the RPCServer$Connection.responseQueue and since all the responses in this queue hold a buffer for cellblock from BoundedByteBufferPool this could takeup a lot of memory if the BoundedByteBufferPool's moving average settles down towards a higher value See the discussion here https://issues.apache.org/jira/browse/HBASE-13819?focusedCommentId=15207822=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15207822 > Fix OutOfMemory that could occur when using BoundedByteBufferPool during RPC > bursts > --- > > Key: HBASE-15525 > URL: https://issues.apache.org/jira/browse/HBASE-15525 > Project: HBase > Issue Type: Bug > Components: IPC/RPC >Reporter: deepankar > > After HBASE-13819 the system some times run out of direct memory whenever > there is some network congestion or some client side issues. > This was because of pending RPCs in the RPCServer$Connection.responseQueue > and since all the responses in this queue hold a buffer for cellblock from > BoundedByteBufferPool this could takeup a lot of memory if the > BoundedByteBufferPool's moving average settles down towards a higher value > See the discussion here > [HBASE-13819-comment|https://issues.apache.org/jira/browse/HBASE-13819?focusedCommentId=15207822=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15207822] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-15525) Fix OutOfMemory that could occur when using BoundedByteBufferPool during RPC bursts
deepankar created HBASE-15525: - Summary: Fix OutOfMemory that could occur when using BoundedByteBufferPool during RPC bursts Key: HBASE-15525 URL: https://issues.apache.org/jira/browse/HBASE-15525 Project: HBase Issue Type: Bug Components: IPC/RPC Reporter: deepankar After HBASE-13819 the system some times run out of direct memory whenever there is some network congestion or some client side issues. This was because of pending RPCs in the RPCServer$Connection.responseQueue and since all the responses in this queue hold a buffer for cellblock from BoundedByteBufferPool this could takeup a lot of memory if the BoundedByteBufferPool's moving average settles down towards a higher value See the discussion here https://issues.apache.org/jira/browse/HBASE-13819?focusedCommentId=15207822=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15207822 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13819) Make RPC layer CellBlock buffer a DirectByteBuffer
[ https://issues.apache.org/jira/browse/HBASE-13819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15209425#comment-15209425 ] deepankar commented on HBASE-13819: --- Attached image of the metric over time. we are running bucket cache of size around 69427MB and the parameters of BBBP are 2048 max entries and max size is 1MB, trace showed current moving avg is 512 kb !q.png! > Make RPC layer CellBlock buffer a DirectByteBuffer > -- > > Key: HBASE-13819 > URL: https://issues.apache.org/jira/browse/HBASE-13819 > Project: HBase > Issue Type: Sub-task > Components: Scanners >Reporter: Anoop Sam John >Assignee: Anoop Sam John > Fix For: 2.0.0, 1.3.0 > > Attachments: HBASE-13819.patch, HBASE-13819_branch-1.patch, > HBASE-13819_branch-1.patch, HBASE-13819_branch-1.patch, q.png > > > In RPC layer, when we make a cellBlock to put as RPC payload, we will make an > on heap byte buffer (via BoundedByteBufferPool). The pool will keep upto > certain number of buffers. This jira aims at testing possibility for making > this buffers off heap ones. (DBB) The advantages > 1. Unsafe based writes to off heap is faster than that to on heap. Now we are > not using unsafe based writes at all. Even if we add, DBB will be better > 2. When Cells are backed by off heap (HBASE-11425) off heap to off heap > writes will be better > 3. When checked the code in SocketChannel impl, if we pass a HeapByteBuffer > to the socket channel, it will create a temp DBB and copy data to there and > only DBBs will be moved to Sockets. If we make DBB 1st hand itself, we can > avoid this one more level of copying. > Will do different perf testing with changed and report back. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13819) Make RPC layer CellBlock buffer a DirectByteBuffer
[ https://issues.apache.org/jira/browse/HBASE-13819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] deepankar updated HBASE-13819: -- Attachment: q.png > Make RPC layer CellBlock buffer a DirectByteBuffer > -- > > Key: HBASE-13819 > URL: https://issues.apache.org/jira/browse/HBASE-13819 > Project: HBase > Issue Type: Sub-task > Components: Scanners >Reporter: Anoop Sam John >Assignee: Anoop Sam John > Fix For: 2.0.0, 1.3.0 > > Attachments: HBASE-13819.patch, HBASE-13819_branch-1.patch, > HBASE-13819_branch-1.patch, HBASE-13819_branch-1.patch, q.png > > > In RPC layer, when we make a cellBlock to put as RPC payload, we will make an > on heap byte buffer (via BoundedByteBufferPool). The pool will keep upto > certain number of buffers. This jira aims at testing possibility for making > this buffers off heap ones. (DBB) The advantages > 1. Unsafe based writes to off heap is faster than that to on heap. Now we are > not using unsafe based writes at all. Even if we add, DBB will be better > 2. When Cells are backed by off heap (HBASE-11425) off heap to off heap > writes will be better > 3. When checked the code in SocketChannel impl, if we pass a HeapByteBuffer > to the socket channel, it will create a temp DBB and copy data to there and > only DBBs will be moved to Sockets. If we make DBB 1st hand itself, we can > avoid this one more level of copying. > Will do different perf testing with changed and report back. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13819) Make RPC layer CellBlock buffer a DirectByteBuffer
[ https://issues.apache.org/jira/browse/HBASE-13819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15209414#comment-15209414 ] deepankar commented on HBASE-13819: --- bq. In description above though, talk is of Responder queue backed up which seems to say we are not clearing the server of finished responses fast enough Is it possible that, because of issues on the client side, we are not able to push the responses out, rather than the Responder being slow? We do try sending the responses from the handler itself whenever possible, which somewhat suggests the issue could be on the client side. > Make RPC layer CellBlock buffer a DirectByteBuffer > -- > > Key: HBASE-13819 > URL: https://issues.apache.org/jira/browse/HBASE-13819 > Project: HBase > Issue Type: Sub-task > Components: Scanners >Reporter: Anoop Sam John >Assignee: Anoop Sam John > Fix For: 2.0.0, 1.3.0 > > Attachments: HBASE-13819.patch, HBASE-13819_branch-1.patch, > HBASE-13819_branch-1.patch, HBASE-13819_branch-1.patch > > > In RPC layer, when we make a cellBlock to put as RPC payload, we will make an > on heap byte buffer (via BoundedByteBufferPool). The pool will keep upto > certain number of buffers. This jira aims at testing possibility for making > this buffers off heap ones. (DBB) The advantages > 1. Unsafe based writes to off heap is faster than that to on heap. Now we are > not using unsafe based writes at all. Even if we add, DBB will be better > 2. When Cells are backed by off heap (HBASE-11425) off heap to off heap > writes will be better > 3. When checked the code in SocketChannel impl, if we pass a HeapByteBuffer > to the socket channel, it will create a temp DBB and copy data to there and > only DBBs will be moved to Sockets. If we make DBB 1st hand itself, we can > avoid this one more level of copying. > Will do different perf testing with changed and report back. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13819) Make RPC layer CellBlock buffer a DirectByteBuffer
[ https://issues.apache.org/jira/browse/HBASE-13819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15209409#comment-15209409 ] deepankar commented on HBASE-13819: --- bq. Which Q we talking? The is config to put bounds on the call queue where we will reject calls... The one that rejects calls only accounts for incoming request sizes, and there is a default 2GB limit I think; the response sizes are not accounted for in it. The queue is RpcServer$Connection.responseQueue. bq. Yeah, there is heuristic and we grow till we hit an average. Are we saying we grew to 512k and then afterward, all calls were 16k only? Is this a problem? In our use case the worst-case response size is 512 KB (hard capped from the client side) and our average response size is around 12 KB. What we observe is that after 3-4 days of running, the moving average is almost always at 512 KB, and in the heap dump all the response buffers are of size 512 KB; a small simulation of this drift follows after this message. > Make RPC layer CellBlock buffer a DirectByteBuffer > -- > > Key: HBASE-13819 > URL: https://issues.apache.org/jira/browse/HBASE-13819 > Project: HBase > Issue Type: Sub-task > Components: Scanners >Reporter: Anoop Sam John >Assignee: Anoop Sam John > Fix For: 2.0.0, 1.3.0 > > Attachments: HBASE-13819.patch, HBASE-13819_branch-1.patch, > HBASE-13819_branch-1.patch, HBASE-13819_branch-1.patch > > > In RPC layer, when we make a cellBlock to put as RPC payload, we will make an > on heap byte buffer (via BoundedByteBufferPool). The pool will keep upto > certain number of buffers. This jira aims at testing possibility for making > this buffers off heap ones. (DBB) The advantages > 1. Unsafe based writes to off heap is faster than that to on heap. Now we are > not using unsafe based writes at all. Even if we add, DBB will be better > 2. When Cells are backed by off heap (HBASE-11425) off heap to off heap > writes will be better > 3. When checked the code in SocketChannel impl, if we pass a HeapByteBuffer > to the socket channel, it will create a temp DBB and copy data to there and > only DBBs will be moved to Sockets. If we make DBB 1st hand itself, we can > avoid this one more level of copying. > Will do different perf testing with changed and report back. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
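An illustrative model of the drift described above, not the actual BoundedByteBufferPool code: a fixed population of pooled buffers, each grown to the largest response it has ever served and never shrunk. Even with rare 512 KB responses, every buffer eventually serves one, so the pool's average capacity ratchets up to the worst case. The counts and probabilities are assumptions loosely matching the numbers quoted in this thread.

{code}
import java.util.Arrays;
import java.util.Random;

public class PoolDrift {
  public static void main(String[] args) {
    int[] capacities = new int[2048];                 // ~reservoir.initial.max buffers
    Arrays.fill(capacities, 16 * 1024);               // start at the 16 KB initial size
    Random rnd = new Random(42);
    for (long i = 0; i < 20_000_000L; i++) {
      int b = rnd.nextInt(capacities.length);         // some pooled buffer serves this response
      int response = rnd.nextInt(1000) == 0 ? 512 * 1024 : 12 * 1024;  // 0.1% large responses
      capacities[b] = Math.max(capacities[b], response);               // grow to fit, never shrink
    }
    long total = 0;
    for (int c : capacities) total += c;
    System.out.println("average pooled buffer ~= " + (total / capacities.length / 1024) + " KB");
  }
}
{code}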
[jira] [Commented] (HBASE-13819) Make RPC layer CellBlock buffer a DirectByteBuffer
[ https://issues.apache.org/jira/browse/HBASE-13819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15209401#comment-15209401 ] deepankar commented on HBASE-13819: --- bq. Because we are not returning BB to the pool? The pool is growing w/o bound? I think there are no BB leaks, from the analysis of the heap dump; all the objects were accounted for. bq. We should add these if not present at TRACE level. Sorry, my earlier comment was misleading: by debug statements I meant enabling TRACE (those loggings were the most useful thing in many debugging scenarios). bq. So 2G instead of 1G? But the pool is bounded? bq. The responder is not keeping up? It is not moving stuff out of the server fast enough? bq. I am interested in leaks; how they are happening. There is no concrete evidence that the responder is not able to keep up, but the bound on the pool does not help this case because we create a new BB when one is not available in the pool, and occasionally (we observe it once in 2-3 days) there is a log spew when the number of buffers returned to the pool grows above the configured threshold. From the analysis we did, there are no leaks. bq. Where is pendingCallsQueue? The per-connection queue RpcServer$Connection.responseQueue. bq. Did you observe the offheap size used growing? There s a metric IIRC. Yes, we saw this in the metric (hbase.regionserver.direct.MemoryUsed). bq. Where would the fixed size be? In BBBP they eventually reach fixed size? Yes, they eventually reach a fixed size, but that size is the larger response size rather than the median or some smaller number. > Make RPC layer CellBlock buffer a DirectByteBuffer > -- > > Key: HBASE-13819 > URL: https://issues.apache.org/jira/browse/HBASE-13819 > Project: HBase > Issue Type: Sub-task > Components: Scanners >Reporter: Anoop Sam John >Assignee: Anoop Sam John > Fix For: 2.0.0, 1.3.0 > > Attachments: HBASE-13819.patch, HBASE-13819_branch-1.patch, > HBASE-13819_branch-1.patch, HBASE-13819_branch-1.patch > > > In RPC layer, when we make a cellBlock to put as RPC payload, we will make an > on heap byte buffer (via BoundedByteBufferPool). The pool will keep upto > certain number of buffers. This jira aims at testing possibility for making > this buffers off heap ones. (DBB) The advantages > 1. Unsafe based writes to off heap is faster than that to on heap. Now we are > not using unsafe based writes at all. Even if we add, DBB will be better > 2. When Cells are backed by off heap (HBASE-11425) off heap to off heap > writes will be better > 3. When checked the code in SocketChannel impl, if we pass a HeapByteBuffer > to the socket channel, it will create a temp DBB and copy data to there and > only DBBs will be moved to Sockets. If we make DBB 1st hand itself, we can > avoid this one more level of copying. > Will do different perf testing with changed and report back. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13819) Make RPC layer CellBlock buffer a DirectByteBuffer
[ https://issues.apache.org/jira/browse/HBASE-13819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15208009#comment-15208009 ] deepankar commented on HBASE-13819: --- What [~anoop.hbase] said is correct: in our case average response sizes are around 10-16 KB but the max size is 512 KB, and if you run for a significantly long time the running average drifts toward 512 KB. Another point: it is not as if there are a lot of requests. If you have around 200 clients and, say, around 5% of them are GCing, then 10 clients with around 400 pending requests each lead to 4000 pending requests, and this exhausts the direct memory we allocated (4000 buffers of 512 KB is about 2 GB held), even though the overall pending response data for these 4000 requests is only about 82 MB; that is at least 1-2 orders of magnitude of extra space being occupied. I think fixed-size BBs are an excellent idea that will definitely be useful to get out of this issue. > Make RPC layer CellBlock buffer a DirectByteBuffer > -- > > Key: HBASE-13819 > URL: https://issues.apache.org/jira/browse/HBASE-13819 > Project: HBase > Issue Type: Sub-task > Components: Scanners >Reporter: Anoop Sam John >Assignee: Anoop Sam John > Fix For: 2.0.0, 1.3.0 > > Attachments: HBASE-13819.patch, HBASE-13819_branch-1.patch, > HBASE-13819_branch-1.patch, HBASE-13819_branch-1.patch > > > In RPC layer, when we make a cellBlock to put as RPC payload, we will make an > on heap byte buffer (via BoundedByteBufferPool). The pool will keep upto > certain number of buffers. This jira aims at testing possibility for making > this buffers off heap ones. (DBB) The advantages > 1. Unsafe based writes to off heap is faster than that to on heap. Now we are > not using unsafe based writes at all. Even if we add, DBB will be better > 2. When Cells are backed by off heap (HBASE-11425) off heap to off heap > writes will be better > 3. When checked the code in SocketChannel impl, if we pass a HeapByteBuffer > to the socket channel, it will create a temp DBB and copy data to there and > only DBBs will be moved to Sockets. If we make DBB 1st hand itself, we can > avoid this one more level of copying. > Will do different perf testing with changed and report back. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13819) Make RPC layer CellBlock buffer a DirectByteBuffer
[ https://issues.apache.org/jira/browse/HBASE-13819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15207822#comment-15207822 ] deepankar commented on HBASE-13819: --- Hi, we recently pulled this patch in internally and are seeing some significant side effects of BoundedByteBufferPool: the system runs out of direct memory whenever there is some network congestion or some client-side issue. System props used: RPC handlers = 300; number of clients ~200, with ~20 threads in each client (we use asynchbase, so all requests are multiplexed on single connections); initial buffer size is 16 KB; buffer size settles at 512 KB over time (from debug statements we put in); max number to cache (reservoir.initial.max) is around 2048 (also tried 4096); direct memory was sized accordingly (2048 * 512 KB + 1 GB). We took a heap dump and analysed its contents using VisualVM OQL, and found that the number of RPCs queued in the responder was around ~4000, which leads to exhaustion of the direct buffer space. Digging a little deeper, the response buffers in the pendingCallsQueue per connection accounted for 3181117440 bytes, even though the real response data (buffers are allocated at 512 KB even when the response is small) accounted for only 84451734 bytes. [~anoop.hbase] suggested that, since we are using a buffer chain to create a CellBlock anyway, it would be better to create a new ByteBufferOutputStream which acquires buffers from the pool instead of allocating a new one at the very high moving average, removing the moving average altogether and having fixed-size buffers instead (a rough sketch of that idea follows after this message). Here is the VisualVM query used {code} select sum(map(heap.objects('java.nio.DirectByteBuffer'), function (a) { var x = 0; var callx = null; forEachReferrer(function (y) { if (classof(y).name == 'org.apache.hadoop.hbase.ipc.RpcServer$Call') { x = -1; forEachReferrer(function (px) { if (classof(px).name == 'java.util.concurrent.ConcurrentLinkedDeque$Node') { callx = y; x = 1; } }, y); } }, a); if (a.att == null && x == 1 && callx.response.bufferOffset == 0 && callx.response.remaining != 0) { // return callx.response.remaining return a.capacity } else { return 0 } })) {code} > Make RPC layer CellBlock buffer a DirectByteBuffer > -- > > Key: HBASE-13819 > URL: https://issues.apache.org/jira/browse/HBASE-13819 > Project: HBase > Issue Type: Sub-task > Components: Scanners >Reporter: Anoop Sam John >Assignee: Anoop Sam John > Fix For: 2.0.0, 1.3.0 > > Attachments: HBASE-13819.patch, HBASE-13819_branch-1.patch, > HBASE-13819_branch-1.patch, HBASE-13819_branch-1.patch > > > In RPC layer, when we make a cellBlock to put as RPC payload, we will make an > on heap byte buffer (via BoundedByteBufferPool). The pool will keep upto > certain number of buffers. This jira aims at testing possibility for making > this buffers off heap ones. (DBB) The advantages > 1. Unsafe based writes to off heap is faster than that to on heap. Now we are > not using unsafe based writes at all. Even if we add, DBB will be better > 2. When Cells are backed by off heap (HBASE-11425) off heap to off heap > writes will be better > 3. When checked the code in SocketChannel impl, if we pass a HeapByteBuffer > to the socket channel, it will create a temp DBB and copy data to there and > only DBBs will be moved to Sockets. If we make DBB 1st hand itself, we can > avoid this one more level of copying. > Will do different perf testing with changed and report back. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
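A rough, self-contained sketch of the idea suggested above: an output stream backed by a chain of fixed-size ByteBuffers taken from a shared pool, so no single large buffer is allocated at the pool's moving-average size. The class and method names are made up for illustration and are not an existing HBase API.

{code}
import java.io.IOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

public class PoolBackedByteBufferOutputStream extends OutputStream {
  private static final int BUFFER_SIZE = 64 * 1024;   // fixed size, no moving average
  private final Queue<ByteBuffer> pool;               // shared pool, bounded elsewhere
  private final List<ByteBuffer> chain = new ArrayList<>();
  private ByteBuffer cur;

  public PoolBackedByteBufferOutputStream(Queue<ByteBuffer> pool) {
    this.pool = pool;
    this.cur = take();
    this.chain.add(cur);
  }

  private ByteBuffer take() {
    ByteBuffer bb = pool.poll();                       // reuse a pooled buffer if one is free
    return bb != null ? bb : ByteBuffer.allocateDirect(BUFFER_SIZE);
  }

  @Override
  public void write(int b) throws IOException {
    if (!cur.hasRemaining()) {
      cur = take();                                    // chain another fixed-size buffer
      chain.add(cur);
    }
    cur.put((byte) b);
  }

  /** Hand every buffer in the chain back to the pool once the response has been written out. */
  public void release() {
    for (ByteBuffer bb : chain) {
      bb.clear();
      pool.offer(bb);
    }
    chain.clear();
  }
}
{code}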
[jira] [Commented] (HBASE-15064) BufferUnderflowException after last Cell fetched from an HFile Block served from L2 offheap cache
[ https://issues.apache.org/jira/browse/HBASE-15064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15207082#comment-15207082 ] deepankar commented on HBASE-15064: --- Yeah I think this will fix this issue. > BufferUnderflowException after last Cell fetched from an HFile Block served > from L2 offheap cache > - > > Key: HBASE-15064 > URL: https://issues.apache.org/jira/browse/HBASE-15064 > Project: HBase > Issue Type: Bug > Components: io >Affects Versions: 2.0.0 >Reporter: deepankar >Assignee: Anoop Sam John >Priority: Critical > Fix For: 2.0.0 > > Attachments: HBASE-15064.patch, MBB_hasRemaining.patch > > > While running the newer patches on our production system, I saw this error > come couple of times > {noformat} > ipc.RpcServer: Unexpected throwable object > 2016-01-01 16:42:56,090 ERROR > [B.defaultRpcServer.handler=20,queue=20,port=60020] ipc.RpcServer: Unexpected > throwable object > java.nio.BufferUnderflowException > at java.nio.Buffer.nextGetIndex(Buffer.java:500) > at java.nio.DirectByteBuffer.get(DirectByteBuffer.java:249) > at org.apache.hadoop.hbase.nio.MultiByteBuff.get(MultiByteBuff.java:494) > at > org.apache.hadoop.hbase.io.encoding.FastDiffDeltaEncoder$1.decode(FastDiffDeltaEncoder.java:402) > > at > org.apache.hadoop.hbase.io.encoding.FastDiffDeltaEncoder$1.decodeNext(FastDiffDeltaEncoder.java:517) > > at > org.apache.hadoop.hbase.io.encoding.BufferedDataBlockEncoder$BufferedEncodedSeeker.next(BufferedDataBlockEncoder.java:815) > at > org.apache.hadoop.hbase.regionserver.StoreFileScanner.next(StoreFileScanner.java:138) > {noformat} > Looking at the get code > {code} > if (this.curItem.remaining() == 0) { > if (items.length - 1 == this.curItemIndex) { > // means cur item is the last one and we wont be able to read a long. > Throw exception > throw new BufferUnderflowException(); > } > this.curItemIndex++; > this.curItem = this.items[this.curItemIndex]; > } > return this.curItem.get(); > {code} > Can the new currentItem have zero elements (position == limit), does it make > sense to change the {{if}} to {{while}} ? {{while (this.curItem.remaining() > == 0)}}. This logic is repeated may make sense abstract to a new function if > we plan to change to {{if}} to {{while}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-15064) BufferUnderflowException after last Cell fetched from an HFile Block served from L2 offheap cache
[ https://issues.apache.org/jira/browse/HBASE-15064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15198794#comment-15198794 ] deepankar commented on HBASE-15064: --- Pressed Add too early; I mean like this: {code} public MultiByteBuff limit(int limit) { this.limit = limit; // Normally the limit will try to limit within the last BB item int limitedIndexBegin = this.itemBeginPos[this.limitedItemIndex]; if (limit > limitedIndexBegin && limit < this.itemBeginPos[this.limitedItemIndex + 1]) { this.items[this.limitedItemIndex].limit(limit - limitedIndexBegin); return this; } int itemIndex = getItemIndex(limit - 1); {code} > BufferUnderflowException after last Cell fetched from an HFile Block served > from L2 offheap cache > - > > Key: HBASE-15064 > URL: https://issues.apache.org/jira/browse/HBASE-15064 > Project: HBase > Issue Type: Bug > Components: io >Affects Versions: 2.0.0 >Reporter: deepankar >Assignee: Anoop Sam John >Priority: Critical > Fix For: 2.0.0 > > Attachments: HBASE-15064.patch > > > While running the newer patches on our production system, I saw this error > come couple of times > {noformat} > ipc.RpcServer: Unexpected throwable object > 2016-01-01 16:42:56,090 ERROR > [B.defaultRpcServer.handler=20,queue=20,port=60020] ipc.RpcServer: Unexpected > throwable object > java.nio.BufferUnderflowException > at java.nio.Buffer.nextGetIndex(Buffer.java:500) > at java.nio.DirectByteBuffer.get(DirectByteBuffer.java:249) > at org.apache.hadoop.hbase.nio.MultiByteBuff.get(MultiByteBuff.java:494) > at > org.apache.hadoop.hbase.io.encoding.FastDiffDeltaEncoder$1.decode(FastDiffDeltaEncoder.java:402) > > at > org.apache.hadoop.hbase.io.encoding.FastDiffDeltaEncoder$1.decodeNext(FastDiffDeltaEncoder.java:517) > > at > org.apache.hadoop.hbase.io.encoding.BufferedDataBlockEncoder$BufferedEncodedSeeker.next(BufferedDataBlockEncoder.java:815) > at > org.apache.hadoop.hbase.regionserver.StoreFileScanner.next(StoreFileScanner.java:138) > {noformat} > Looking at the get code > {code} > if (this.curItem.remaining() == 0) { > if (items.length - 1 == this.curItemIndex) { > // means cur item is the last one and we wont be able to read a long. > Throw exception > throw new BufferUnderflowException(); > } > this.curItemIndex++; > this.curItem = this.items[this.curItemIndex]; > } > return this.curItem.get(); > {code} > Can the new currentItem have zero elements (position == limit), does it make > sense to change the {{if}} to {{while}} ? {{while (this.curItem.remaining() > == 0)}}. This logic is repeated may make sense abstract to a new function if > we plan to change to {{if}} to {{while}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-15064) BufferUnderflowException after last Cell fetched from an HFile Block served from L2 offheap cache
[ https://issues.apache.org/jira/browse/HBASE-15064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15198793#comment-15198793 ] deepankar commented on HBASE-15064: --- What if we remove the equality in the if clause in the limit and in the limit method modify the new limitIndex to search for limit - 1 will that not work seamlessly ? > BufferUnderflowException after last Cell fetched from an HFile Block served > from L2 offheap cache > - > > Key: HBASE-15064 > URL: https://issues.apache.org/jira/browse/HBASE-15064 > Project: HBase > Issue Type: Bug > Components: io >Affects Versions: 2.0.0 >Reporter: deepankar >Assignee: Anoop Sam John >Priority: Critical > Fix For: 2.0.0 > > Attachments: HBASE-15064.patch > > > While running the newer patches on our production system, I saw this error > come couple of times > {noformat} > ipc.RpcServer: Unexpected throwable object > 2016-01-01 16:42:56,090 ERROR > [B.defaultRpcServer.handler=20,queue=20,port=60020] ipc.RpcServer: Unexpected > throwable object > java.nio.BufferUnderflowException > at java.nio.Buffer.nextGetIndex(Buffer.java:500) > at java.nio.DirectByteBuffer.get(DirectByteBuffer.java:249) > at org.apache.hadoop.hbase.nio.MultiByteBuff.get(MultiByteBuff.java:494) > at > org.apache.hadoop.hbase.io.encoding.FastDiffDeltaEncoder$1.decode(FastDiffDeltaEncoder.java:402) > > at > org.apache.hadoop.hbase.io.encoding.FastDiffDeltaEncoder$1.decodeNext(FastDiffDeltaEncoder.java:517) > > at > org.apache.hadoop.hbase.io.encoding.BufferedDataBlockEncoder$BufferedEncodedSeeker.next(BufferedDataBlockEncoder.java:815) > at > org.apache.hadoop.hbase.regionserver.StoreFileScanner.next(StoreFileScanner.java:138) > {noformat} > Looking at the get code > {code} > if (this.curItem.remaining() == 0) { > if (items.length - 1 == this.curItemIndex) { > // means cur item is the last one and we wont be able to read a long. > Throw exception > throw new BufferUnderflowException(); > } > this.curItemIndex++; > this.curItem = this.items[this.curItemIndex]; > } > return this.curItem.get(); > {code} > Can the new currentItem have zero elements (position == limit), does it make > sense to change the {{if}} to {{while}} ? {{while (this.curItem.remaining() > == 0)}}. This logic is repeated may make sense abstract to a new function if > we plan to change to {{if}} to {{while}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Reopened] (HBASE-15064) BufferUnderflowException after last Cell fetched from an HFile Block served from L2 offheap cache
[ https://issues.apache.org/jira/browse/HBASE-15064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] deepankar reopened HBASE-15064: --- > BufferUnderflowException after last Cell fetched from an HFile Block served > from L2 offheap cache > - > > Key: HBASE-15064 > URL: https://issues.apache.org/jira/browse/HBASE-15064 > Project: HBase > Issue Type: Bug > Components: io >Affects Versions: 2.0.0 >Reporter: deepankar >Assignee: Anoop Sam John >Priority: Critical > Fix For: 2.0.0 > > Attachments: HBASE-15064.patch > > > While running the newer patches on our production system, I saw this error > come couple of times > {noformat} > ipc.RpcServer: Unexpected throwable object > 2016-01-01 16:42:56,090 ERROR > [B.defaultRpcServer.handler=20,queue=20,port=60020] ipc.RpcServer: Unexpected > throwable object > java.nio.BufferUnderflowException > at java.nio.Buffer.nextGetIndex(Buffer.java:500) > at java.nio.DirectByteBuffer.get(DirectByteBuffer.java:249) > at org.apache.hadoop.hbase.nio.MultiByteBuff.get(MultiByteBuff.java:494) > at > org.apache.hadoop.hbase.io.encoding.FastDiffDeltaEncoder$1.decode(FastDiffDeltaEncoder.java:402) > > at > org.apache.hadoop.hbase.io.encoding.FastDiffDeltaEncoder$1.decodeNext(FastDiffDeltaEncoder.java:517) > > at > org.apache.hadoop.hbase.io.encoding.BufferedDataBlockEncoder$BufferedEncodedSeeker.next(BufferedDataBlockEncoder.java:815) > at > org.apache.hadoop.hbase.regionserver.StoreFileScanner.next(StoreFileScanner.java:138) > {noformat} > Looking at the get code > {code} > if (this.curItem.remaining() == 0) { > if (items.length - 1 == this.curItemIndex) { > // means cur item is the last one and we wont be able to read a long. > Throw exception > throw new BufferUnderflowException(); > } > this.curItemIndex++; > this.curItem = this.items[this.curItemIndex]; > } > return this.curItem.get(); > {code} > Can the new currentItem have zero elements (position == limit), does it make > sense to change the {{if}} to {{while}} ? {{while (this.curItem.remaining() > == 0)}}. This logic is repeated may make sense abstract to a new function if > we plan to change to {{if}} to {{while}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-15064) BufferUnderflowException after last Cell fetched from an HFile Block served from L2 offheap cache
[ https://issues.apache.org/jira/browse/HBASE-15064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15198752#comment-15198752 ] deepankar commented on HBASE-15064: --- Is there something wrong with my reasoning there ? or am i missing something ? > BufferUnderflowException after last Cell fetched from an HFile Block served > from L2 offheap cache > - > > Key: HBASE-15064 > URL: https://issues.apache.org/jira/browse/HBASE-15064 > Project: HBase > Issue Type: Bug > Components: io >Affects Versions: 2.0.0 >Reporter: deepankar >Assignee: Anoop Sam John >Priority: Critical > Fix For: 2.0.0 > > Attachments: HBASE-15064.patch > > > While running the newer patches on our production system, I saw this error > come couple of times > {noformat} > ipc.RpcServer: Unexpected throwable object > 2016-01-01 16:42:56,090 ERROR > [B.defaultRpcServer.handler=20,queue=20,port=60020] ipc.RpcServer: Unexpected > throwable object > java.nio.BufferUnderflowException > at java.nio.Buffer.nextGetIndex(Buffer.java:500) > at java.nio.DirectByteBuffer.get(DirectByteBuffer.java:249) > at org.apache.hadoop.hbase.nio.MultiByteBuff.get(MultiByteBuff.java:494) > at > org.apache.hadoop.hbase.io.encoding.FastDiffDeltaEncoder$1.decode(FastDiffDeltaEncoder.java:402) > > at > org.apache.hadoop.hbase.io.encoding.FastDiffDeltaEncoder$1.decodeNext(FastDiffDeltaEncoder.java:517) > > at > org.apache.hadoop.hbase.io.encoding.BufferedDataBlockEncoder$BufferedEncodedSeeker.next(BufferedDataBlockEncoder.java:815) > at > org.apache.hadoop.hbase.regionserver.StoreFileScanner.next(StoreFileScanner.java:138) > {noformat} > Looking at the get code > {code} > if (this.curItem.remaining() == 0) { > if (items.length - 1 == this.curItemIndex) { > // means cur item is the last one and we wont be able to read a long. > Throw exception > throw new BufferUnderflowException(); > } > this.curItemIndex++; > this.curItem = this.items[this.curItemIndex]; > } > return this.curItem.get(); > {code} > Can the new currentItem have zero elements (position == limit), does it make > sense to change the {{if}} to {{while}} ? {{while (this.curItem.remaining() > == 0)}}. This logic is repeated may make sense abstract to a new function if > we plan to change to {{if}} to {{while}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-15064) BufferUnderflowException after last Cell fetched from an HFile Block served from L2 offheap cache
[ https://issues.apache.org/jira/browse/HBASE-15064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15199100#comment-15199100 ] deepankar commented on HBASE-15064: --- Yeah something like that which was what [~anoop.hbase] was suggesting, just the limitedItemIndex computation was wrong in the first place. > BufferUnderflowException after last Cell fetched from an HFile Block served > from L2 offheap cache > - > > Key: HBASE-15064 > URL: https://issues.apache.org/jira/browse/HBASE-15064 > Project: HBase > Issue Type: Bug > Components: io >Affects Versions: 2.0.0 >Reporter: deepankar >Assignee: Anoop Sam John >Priority: Critical > Fix For: 2.0.0 > > Attachments: HBASE-15064.patch, MBB_hasRemaining.patch > > > While running the newer patches on our production system, I saw this error > come couple of times > {noformat} > ipc.RpcServer: Unexpected throwable object > 2016-01-01 16:42:56,090 ERROR > [B.defaultRpcServer.handler=20,queue=20,port=60020] ipc.RpcServer: Unexpected > throwable object > java.nio.BufferUnderflowException > at java.nio.Buffer.nextGetIndex(Buffer.java:500) > at java.nio.DirectByteBuffer.get(DirectByteBuffer.java:249) > at org.apache.hadoop.hbase.nio.MultiByteBuff.get(MultiByteBuff.java:494) > at > org.apache.hadoop.hbase.io.encoding.FastDiffDeltaEncoder$1.decode(FastDiffDeltaEncoder.java:402) > > at > org.apache.hadoop.hbase.io.encoding.FastDiffDeltaEncoder$1.decodeNext(FastDiffDeltaEncoder.java:517) > > at > org.apache.hadoop.hbase.io.encoding.BufferedDataBlockEncoder$BufferedEncodedSeeker.next(BufferedDataBlockEncoder.java:815) > at > org.apache.hadoop.hbase.regionserver.StoreFileScanner.next(StoreFileScanner.java:138) > {noformat} > Looking at the get code > {code} > if (this.curItem.remaining() == 0) { > if (items.length - 1 == this.curItemIndex) { > // means cur item is the last one and we wont be able to read a long. > Throw exception > throw new BufferUnderflowException(); > } > this.curItemIndex++; > this.curItem = this.items[this.curItemIndex]; > } > return this.curItem.get(); > {code} > Can the new currentItem have zero elements (position == limit), does it make > sense to change the {{if}} to {{while}} ? {{while (this.curItem.remaining() > == 0)}}. This logic is repeated may make sense abstract to a new function if > we plan to change to {{if}} to {{while}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-15064) BufferUnderflowException after last Cell fetched from an HFile Block served from L2 offheap cache
[ https://issues.apache.org/jira/browse/HBASE-15064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15198858#comment-15198858 ] deepankar commented on HBASE-15064: --- No I was saying, when you have done mbb1.get() for 4 times and then you check hasRemaining it will return false; so I think the following test will fail {code} bb1 = ByteBuffer.wrap(b); bb2 = ByteBuffer.wrap(b1); bb3 = ByteBuffer.allocate(4); mbb1 = new MultiByteBuff(bb1, bb2, bb3); mbb1.limit(12); for(int i = 0; i < 12; i++) { assertTrue(mbb1.hasRemaining()); mbb1.get(); } assertFalse(mbb1.hasRemaining()); {code} > BufferUnderflowException after last Cell fetched from an HFile Block served > from L2 offheap cache > - > > Key: HBASE-15064 > URL: https://issues.apache.org/jira/browse/HBASE-15064 > Project: HBase > Issue Type: Bug > Components: io >Affects Versions: 2.0.0 >Reporter: deepankar >Assignee: Anoop Sam John >Priority: Critical > Fix For: 2.0.0 > > Attachments: HBASE-15064.patch, MBB_hasRemaining.patch > > > While running the newer patches on our production system, I saw this error > come couple of times > {noformat} > ipc.RpcServer: Unexpected throwable object > 2016-01-01 16:42:56,090 ERROR > [B.defaultRpcServer.handler=20,queue=20,port=60020] ipc.RpcServer: Unexpected > throwable object > java.nio.BufferUnderflowException > at java.nio.Buffer.nextGetIndex(Buffer.java:500) > at java.nio.DirectByteBuffer.get(DirectByteBuffer.java:249) > at org.apache.hadoop.hbase.nio.MultiByteBuff.get(MultiByteBuff.java:494) > at > org.apache.hadoop.hbase.io.encoding.FastDiffDeltaEncoder$1.decode(FastDiffDeltaEncoder.java:402) > > at > org.apache.hadoop.hbase.io.encoding.FastDiffDeltaEncoder$1.decodeNext(FastDiffDeltaEncoder.java:517) > > at > org.apache.hadoop.hbase.io.encoding.BufferedDataBlockEncoder$BufferedEncodedSeeker.next(BufferedDataBlockEncoder.java:815) > at > org.apache.hadoop.hbase.regionserver.StoreFileScanner.next(StoreFileScanner.java:138) > {noformat} > Looking at the get code > {code} > if (this.curItem.remaining() == 0) { > if (items.length - 1 == this.curItemIndex) { > // means cur item is the last one and we wont be able to read a long. > Throw exception > throw new BufferUnderflowException(); > } > this.curItemIndex++; > this.curItem = this.items[this.curItemIndex]; > } > return this.curItem.get(); > {code} > Can the new currentItem have zero elements (position == limit), does it make > sense to change the {{if}} to {{while}} ? {{while (this.curItem.remaining() > == 0)}}. This logic is repeated may make sense abstract to a new function if > we plan to change to {{if}} to {{while}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-15064) BufferUnderflowException after last Cell fetched from an HFile Block served from L2 offheap cache
[ https://issues.apache.org/jira/browse/HBASE-15064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15198821#comment-15198821 ] deepankar commented on HBASE-15064: --- I think the test in your patch should fail at {{Mbb.limit(12)}}, because once you cross the 4th byte, hasRemaining will start returning false, as you are checking only the limit index. I think [~anoop.hbase]'s suggestion alone will not work; coupled with a modification of the way we calculate limitedItemIndex, it should work, I think. > BufferUnderflowException after last Cell fetched from an HFile Block served > from L2 offheap cache > - > > Key: HBASE-15064 > URL: https://issues.apache.org/jira/browse/HBASE-15064 > Project: HBase > Issue Type: Bug > Components: io >Affects Versions: 2.0.0 >Reporter: deepankar >Assignee: Anoop Sam John >Priority: Critical > Fix For: 2.0.0 > > Attachments: HBASE-15064.patch, MBB_hasRemaining.patch > > > While running the newer patches on our production system, I saw this error > come couple of times > {noformat} > ipc.RpcServer: Unexpected throwable object > 2016-01-01 16:42:56,090 ERROR > [B.defaultRpcServer.handler=20,queue=20,port=60020] ipc.RpcServer: Unexpected > throwable object > java.nio.BufferUnderflowException > at java.nio.Buffer.nextGetIndex(Buffer.java:500) > at java.nio.DirectByteBuffer.get(DirectByteBuffer.java:249) > at org.apache.hadoop.hbase.nio.MultiByteBuff.get(MultiByteBuff.java:494) > at > org.apache.hadoop.hbase.io.encoding.FastDiffDeltaEncoder$1.decode(FastDiffDeltaEncoder.java:402) > > at > org.apache.hadoop.hbase.io.encoding.FastDiffDeltaEncoder$1.decodeNext(FastDiffDeltaEncoder.java:517) > > at > org.apache.hadoop.hbase.io.encoding.BufferedDataBlockEncoder$BufferedEncodedSeeker.next(BufferedDataBlockEncoder.java:815) > at > org.apache.hadoop.hbase.regionserver.StoreFileScanner.next(StoreFileScanner.java:138) > {noformat} > Looking at the get code > {code} > if (this.curItem.remaining() == 0) { > if (items.length - 1 == this.curItemIndex) { > // means cur item is the last one and we wont be able to read a long. > Throw exception > throw new BufferUnderflowException(); > } > this.curItemIndex++; > this.curItem = this.items[this.curItemIndex]; > } > return this.curItem.get(); > {code} > Can the new currentItem have zero elements (position == limit), does it make > sense to change the {{if}} to {{while}} ? {{while (this.curItem.remaining() > == 0)}}. This logic is repeated may make sense abstract to a new function if > we plan to change to {{if}} to {{while}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
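For reference, one possible shape of a limit-aware {{hasRemaining()}} along these lines. It is a sketch under two assumptions: {{limitedItemIndex}} is computed against {{limit - 1}} as proposed above, and no intermediate item has zero capacity; it is not necessarily what the final patch does.
{code}
// Sketch only: honour the marked limit instead of the physically last item.
// Assumes limitedItemIndex was computed from getItemIndex(limit - 1), so the
// item at limitedItemIndex always keeps a non-zero local limit, and that no
// intermediate item has zero capacity.
@Override
public final boolean hasRemaining() {
  return this.curItem.hasRemaining() || this.curItemIndex < this.limitedItemIndex;
}
{code}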
[jira] [Commented] (HBASE-15064) BufferUnderflowException after last Cell fetched from an HFile Block served from L2 offheap cache
[ https://issues.apache.org/jira/browse/HBASE-15064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15198751#comment-15198751 ] deepankar commented on HBASE-15064: --- yeah I think that will work, but what about the other bug in calculating limitedItemIndex > BufferUnderflowException after last Cell fetched from an HFile Block served > from L2 offheap cache > - > > Key: HBASE-15064 > URL: https://issues.apache.org/jira/browse/HBASE-15064 > Project: HBase > Issue Type: Bug > Components: io >Affects Versions: 2.0.0 >Reporter: deepankar >Assignee: Anoop Sam John >Priority: Critical > Fix For: 2.0.0 > > Attachments: HBASE-15064.patch > > > While running the newer patches on our production system, I saw this error > come couple of times > {noformat} > ipc.RpcServer: Unexpected throwable object > 2016-01-01 16:42:56,090 ERROR > [B.defaultRpcServer.handler=20,queue=20,port=60020] ipc.RpcServer: Unexpected > throwable object > java.nio.BufferUnderflowException > at java.nio.Buffer.nextGetIndex(Buffer.java:500) > at java.nio.DirectByteBuffer.get(DirectByteBuffer.java:249) > at org.apache.hadoop.hbase.nio.MultiByteBuff.get(MultiByteBuff.java:494) > at > org.apache.hadoop.hbase.io.encoding.FastDiffDeltaEncoder$1.decode(FastDiffDeltaEncoder.java:402) > > at > org.apache.hadoop.hbase.io.encoding.FastDiffDeltaEncoder$1.decodeNext(FastDiffDeltaEncoder.java:517) > > at > org.apache.hadoop.hbase.io.encoding.BufferedDataBlockEncoder$BufferedEncodedSeeker.next(BufferedDataBlockEncoder.java:815) > at > org.apache.hadoop.hbase.regionserver.StoreFileScanner.next(StoreFileScanner.java:138) > {noformat} > Looking at the get code > {code} > if (this.curItem.remaining() == 0) { > if (items.length - 1 == this.curItemIndex) { > // means cur item is the last one and we wont be able to read a long. > Throw exception > throw new BufferUnderflowException(); > } > this.curItemIndex++; > this.curItem = this.items[this.curItemIndex]; > } > return this.curItem.get(); > {code} > Can the new currentItem have zero elements (position == limit), does it make > sense to change the {{if}} to {{while}} ? {{while (this.curItem.remaining() > == 0)}}. This logic is repeated may make sense abstract to a new function if > we plan to change to {{if}} to {{while}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-15064) BufferUnderflowException after last Cell fetched from an HFile Block served from L2 offheap cache
[ https://issues.apache.org/jira/browse/HBASE-15064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15198796#comment-15198796 ] deepankar commented on HBASE-15064: --- Also the hasRemaining method [~anoop.hbase] suggested should also work seamlessly similar to the generic API right ? > BufferUnderflowException after last Cell fetched from an HFile Block served > from L2 offheap cache > - > > Key: HBASE-15064 > URL: https://issues.apache.org/jira/browse/HBASE-15064 > Project: HBase > Issue Type: Bug > Components: io >Affects Versions: 2.0.0 >Reporter: deepankar >Assignee: Anoop Sam John >Priority: Critical > Fix For: 2.0.0 > > Attachments: HBASE-15064.patch > > > While running the newer patches on our production system, I saw this error > come couple of times > {noformat} > ipc.RpcServer: Unexpected throwable object > 2016-01-01 16:42:56,090 ERROR > [B.defaultRpcServer.handler=20,queue=20,port=60020] ipc.RpcServer: Unexpected > throwable object > java.nio.BufferUnderflowException > at java.nio.Buffer.nextGetIndex(Buffer.java:500) > at java.nio.DirectByteBuffer.get(DirectByteBuffer.java:249) > at org.apache.hadoop.hbase.nio.MultiByteBuff.get(MultiByteBuff.java:494) > at > org.apache.hadoop.hbase.io.encoding.FastDiffDeltaEncoder$1.decode(FastDiffDeltaEncoder.java:402) > > at > org.apache.hadoop.hbase.io.encoding.FastDiffDeltaEncoder$1.decodeNext(FastDiffDeltaEncoder.java:517) > > at > org.apache.hadoop.hbase.io.encoding.BufferedDataBlockEncoder$BufferedEncodedSeeker.next(BufferedDataBlockEncoder.java:815) > at > org.apache.hadoop.hbase.regionserver.StoreFileScanner.next(StoreFileScanner.java:138) > {noformat} > Looking at the get code > {code} > if (this.curItem.remaining() == 0) { > if (items.length - 1 == this.curItemIndex) { > // means cur item is the last one and we wont be able to read a long. > Throw exception > throw new BufferUnderflowException(); > } > this.curItemIndex++; > this.curItem = this.items[this.curItemIndex]; > } > return this.curItem.get(); > {code} > Can the new currentItem have zero elements (position == limit), does it make > sense to change the {{if}} to {{while}} ? {{while (this.curItem.remaining() > == 0)}}. This logic is repeated may make sense abstract to a new function if > we plan to change to {{if}} to {{while}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-15064) BufferUnderflowException after last Cell fetched from an HFile Block served from L2 offheap cache
[ https://issues.apache.org/jira/browse/HBASE-15064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15198473#comment-15198473 ] deepankar commented on HBASE-15064: --- I am still seeing this exception on our servers, I think I found something, what I observe is that a couple of things, in the normal byte buffers (java.nio) the hasRemaining function uses the current position and limit {code} /** * Tells whether there are any elements between the current position and * the limit. * * @return true if, and only if, there is at least one element * remaining in this buffer */ public final boolean hasRemaining() { return position < limit; } {code} But in the MultiByteBuff we have the hasRemaining is not taking care of limit {code} /** * Returns true if there are elements between the current position and the limit * @return true if there are elements, false otherwise */ @Override public final boolean hasRemaining() { return this.curItem.hasRemaining() || this.curItemIndex < this.items.length - 1; } {code} Also the items array is not changed in the limit(int) function, this means there could be a scenario where the user has asked to limit at the end of first buffer, but the hasRemaining() will still return true, Is there any flaw in my logic here ? Also in the limit(int) function in the MultiByteBuff function we are doing {code} // Normally the limit will try to limit within the last BB item int limitedIndexBegin = this.itemBeginPos[this.limitedItemIndex]; if (limit >= limitedIndexBegin && limit < this.itemBeginPos[this.limitedItemIndex + 1]) { this.items[this.limitedItemIndex].limit(limit - limitedIndexBegin); return this; } {code} here I think in the if statement isn't the logic be just {noformat} if (limit > limitedIndexBegin && limit < this.itemBeginPos[this.limitedItemIndex + 1]) {noformat} because if somebody is trying to limit at the place which is exactly at the boundary of the limitIndexBuffer then we are also including the last item which does not have any data as you are limiting at 0 (as limit == limitedIndexBegin, which is at the boundary), But then once you have read everything in the previous buffer if the client consults hasRemaining function this will return again true (as curIterm < no_of_items in array) but when you actually try to read anything we will throw BufferUnderFlowException because again the last element has no data. 
There is a similar issue with the {{getItemIndex}} function when, again, the {{elemIndex}} matches the boundary. > BufferUnderflowException after last Cell fetched from an HFile Block served > from L2 offheap cache > - > > Key: HBASE-15064 > URL: https://issues.apache.org/jira/browse/HBASE-15064 > Project: HBase > Issue Type: Bug > Components: io >Affects Versions: 2.0.0 >Reporter: deepankar >Assignee: Anoop Sam John >Priority: Critical > Fix For: 2.0.0 > > Attachments: HBASE-15064.patch > > > While running the newer patches on our production system, I saw this error > come couple of times > {noformat} > ipc.RpcServer: Unexpected throwable object > 2016-01-01 16:42:56,090 ERROR > [B.defaultRpcServer.handler=20,queue=20,port=60020] ipc.RpcServer: Unexpected > throwable object > java.nio.BufferUnderflowException > at java.nio.Buffer.nextGetIndex(Buffer.java:500) > at java.nio.DirectByteBuffer.get(DirectByteBuffer.java:249) > at org.apache.hadoop.hbase.nio.MultiByteBuff.get(MultiByteBuff.java:494) > at > org.apache.hadoop.hbase.io.encoding.FastDiffDeltaEncoder$1.decode(FastDiffDeltaEncoder.java:402) > > at > org.apache.hadoop.hbase.io.encoding.FastDiffDeltaEncoder$1.decodeNext(FastDiffDeltaEncoder.java:517) > > at > org.apache.hadoop.hbase.io.encoding.BufferedDataBlockEncoder$BufferedEncodedSeeker.next(BufferedDataBlockEncoder.java:815) > at > org.apache.hadoop.hbase.regionserver.StoreFileScanner.next(StoreFileScanner.java:138) > {noformat} > Looking at the get code > {code} > if (this.curItem.remaining() == 0) { > if (items.length - 1 == this.curItemIndex) { > // means cur item is the last one and we wont be able to read a long. > Throw exception > throw new BufferUnderflowException(); > } > this.curItemIndex++; > this.curItem = this.items[this.curItemIndex]; > } > return this.curItem.get(); > {code} > Can the new currentItem have zero elements (position == limit), does it make > sense to change the {{if}} to {{while}} ? {{while (this.curItem.remaining() > == 0)}}. This logic is repeated may make sense abstract to a new function if > we plan to change to {{if}} to {{while}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
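To make the boundary case described above concrete, here is a small fragment (for example inside a test method) that walks through the reported pre-fix behaviour; the buffer sizes are made up and the expected outcomes are exactly those claimed in the comment.
{code}
// Fragment: two 4-byte items, with the limit landing on the item boundary.
ByteBuffer bb1 = ByteBuffer.allocate(4);
ByteBuffer bb2 = ByteBuffer.allocate(4);
MultiByteBuff mbb = new MultiByteBuff(bb1, bb2);
mbb.limit(4);                       // pre-fix limit(): sets items[1].limit(0)

for (int i = 0; i < 4; i++) {
  mbb.get();                        // consumes all of bb1
}

// Pre-fix hasRemaining() only checks curItemIndex < items.length - 1, so it
// still answers true even though nothing is left under the marked limit.
boolean more = mbb.hasRemaining();  // true, incorrectly

// The next get() steps into bb2, which limit(4) trimmed to zero elements,
// and throws BufferUnderflowException, matching the stack trace above.
mbb.get();
{code}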
[jira] [Commented] (HBASE-15437) Response size calculated in RPCServer for warning tooLarge responses does not count CellScanner payload
[ https://issues.apache.org/jira/browse/HBASE-15437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15188500#comment-15188500 ] deepankar commented on HBASE-15437: --- In that case should the values of queueTime, processingTime be stored inside the call ? or should the calculation of those be also moved to setResponse() ? > Response size calculated in RPCServer for warning tooLarge responses does > count CellScanner payload > --- > > Key: HBASE-15437 > URL: https://issues.apache.org/jira/browse/HBASE-15437 > Project: HBase > Issue Type: Bug > Components: IPC/RPC >Reporter: deepankar > > After HBASE-13158 where we respond back to RPCs with cells in the payload , > the protobuf response will just have the count the cells to read from > payload, but there are set of features where we log warn in RPCServer > whenever the response is tooLarge, but this size now is not considering the > sizes of the cells in the PayloadCellScanner. Code form RPCServer > {code} > long responseSize = result.getSerializedSize(); > // log any RPC responses that are slower than the configured warn > // response time or larger than configured warning size > boolean tooSlow = (processingTime > warnResponseTime && > warnResponseTime > -1); > boolean tooLarge = (responseSize > warnResponseSize && warnResponseSize > > -1); > if (tooSlow || tooLarge) { > // when tagging, we let TooLarge trump TooSmall to keep output simple > // note that large responses will often also be slow. > logResponse(new Object[]{param}, > md.getName(), md.getName() + "(" + param.getClass().getName() + > ")", > (tooLarge ? "TooLarge" : "TooSlow"), > status.getClient(), startTime, processingTime, qTime, > responseSize); > } > {code} > Should this feature be not supported any more or should we add a method to > CellScanner or a new interface which returns the serialized size (but this > might not include the compression codecs which might be used during response > ?) Any other Idea this could be fixed ? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
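A sketch of the first alternative raised here, with purely hypothetical member names (nothing below is taken from the real {{RpcServer.Call}} class): keep the timings on the call so the warn check can run where the full response is known.
{code}
// Hypothetical sketch; these are not the real RpcServer.Call members. The
// idea is to store the timings on the call so the tooSlow/tooLarge check can
// run in setResponse(), where the full response (including cells) is known.
class Call {
  long queueTime;        // set when a handler dequeues the call
  long processingTime;   // set when the handler finishes the method call

  void setResponse(long totalResponseSize, long warnResponseSize) {
    boolean tooLarge = warnResponseSize > -1 && totalResponseSize > warnResponseSize;
    if (tooLarge) {
      // logResponse(...) using the stored queueTime / processingTime
    }
  }
}
{code}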
[jira] [Commented] (HBASE-15437) Response size calculated in RPCServer for warning tooLarge responses does not count CellScanner payload
[ https://issues.apache.org/jira/browse/HBASE-15437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15188288#comment-15188288 ] deepankar commented on HBASE-15437: --- ping [~stack] > Response size calculated in RPCServer for warning tooLarge responses does > count CellScanner payload > --- > > Key: HBASE-15437 > URL: https://issues.apache.org/jira/browse/HBASE-15437 > Project: HBase > Issue Type: Bug > Components: IPC/RPC >Reporter: deepankar > > After HBASE-13158 where we respond back to RPCs with cells in the payload , > the protobuf response will just have the count the cells to read from > payload, but there are set of features where we log warn in RPCServer > whenever the response is tooLarge, but this size now is not considering the > sizes of the cells in the PayloadCellScanner. Code form RPCServer > {code} > long responseSize = result.getSerializedSize(); > // log any RPC responses that are slower than the configured warn > // response time or larger than configured warning size > boolean tooSlow = (processingTime > warnResponseTime && > warnResponseTime > -1); > boolean tooLarge = (responseSize > warnResponseSize && warnResponseSize > > -1); > if (tooSlow || tooLarge) { > // when tagging, we let TooLarge trump TooSmall to keep output simple > // note that large responses will often also be slow. > logResponse(new Object[]{param}, > md.getName(), md.getName() + "(" + param.getClass().getName() + > ")", > (tooLarge ? "TooLarge" : "TooSlow"), > status.getClient(), startTime, processingTime, qTime, > responseSize); > } > {code} > Should this feature be not supported any more or should we add a method to > CellScanner or a new interface which returns the serialized size (but this > might not include the compression codecs which might be used during response > ?) Any other Idea this could be fixed ? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-15437) Response size calculated in RPCServer for warning tooLarge responses does not count CellScanner payload
[ https://issues.apache.org/jira/browse/HBASE-15437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15188285#comment-15188285 ] deepankar commented on HBASE-15437: --- ping [~saint@gmail.com] [~anoopsamjohn] [~ram_krish] > Response size calculated in RPCServer for warning tooLarge responses does > count CellScanner payload > --- > > Key: HBASE-15437 > URL: https://issues.apache.org/jira/browse/HBASE-15437 > Project: HBase > Issue Type: Bug > Components: IPC/RPC >Reporter: deepankar > > After HBASE-13158 where we respond back to RPCs with cells in the payload , > the protobuf response will just have the count the cells to read from > payload, but there are set of features where we log warn in RPCServer > whenever the response is tooLarge, but this size now is not considering the > sizes of the cells in the PayloadCellScanner. Code form RPCServer > {code} > long responseSize = result.getSerializedSize(); > // log any RPC responses that are slower than the configured warn > // response time or larger than configured warning size > boolean tooSlow = (processingTime > warnResponseTime && > warnResponseTime > -1); > boolean tooLarge = (responseSize > warnResponseSize && warnResponseSize > > -1); > if (tooSlow || tooLarge) { > // when tagging, we let TooLarge trump TooSmall to keep output simple > // note that large responses will often also be slow. > logResponse(new Object[]{param}, > md.getName(), md.getName() + "(" + param.getClass().getName() + > ")", > (tooLarge ? "TooLarge" : "TooSlow"), > status.getClient(), startTime, processingTime, qTime, > responseSize); > } > {code} > Should this feature be not supported any more or should we add a method to > CellScanner or a new interface which returns the serialized size (but this > might not include the compression codecs which might be used during response > ?) Any other Idea this could be fixed ? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-15437) Response size calculated in RPCServer for warning tooLarge responses does not count CellScanner payload
deepankar created HBASE-15437: - Summary: Response size calculated in RPCServer for warning tooLarge responses does count CellScanner payload Key: HBASE-15437 URL: https://issues.apache.org/jira/browse/HBASE-15437 Project: HBase Issue Type: Bug Components: IPC/RPC Reporter: deepankar After HBASE-13158 where we respond back to RPCs with cells in the payload , the protobuf response will just have the count the cells to read from payload, but there are set of features where we log warn in RPCServer whenever the response is tooLarge, but this size now is not considering the sizes of the cells in the PayloadCellScanner. Code form RPCServer {code} long responseSize = result.getSerializedSize(); // log any RPC responses that are slower than the configured warn // response time or larger than configured warning size boolean tooSlow = (processingTime > warnResponseTime && warnResponseTime > -1); boolean tooLarge = (responseSize > warnResponseSize && warnResponseSize > -1); if (tooSlow || tooLarge) { // when tagging, we let TooLarge trump TooSmall to keep output simple // note that large responses will often also be slow. logResponse(new Object[]{param}, md.getName(), md.getName() + "(" + param.getClass().getName() + ")", (tooLarge ? "TooLarge" : "TooSlow"), status.getClient(), startTime, processingTime, qTime, responseSize); } {code} Should this feature be not supported any more or should we add a method to CellScanner or a new interface which returns the serialized size (but this might not include the compression codecs which might be used during response ?) Any other Idea this could be fixed ? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
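One hypothetical way the cell payload could be folded into the warning check. {{CellScanner}} exposes no size method in this version (that is the open question of this issue), so the helper and the {{cellScanner}} variable below are assumptions, not existing API.
{code}
// Hypothetical helper, not existing API: walks the cells to sum their
// estimated serialized sizes. Note that advancing consumes the scanner, so
// real code would have to size the cells while encoding them instead.
private static long estimatePayloadSize(CellScanner cells) throws IOException {
  long size = 0;
  if (cells != null) {
    while (cells.advance()) {
      size += CellUtil.estimatedSerializedSizeOf(cells.current());
    }
  }
  return size;
}

// The warn check would then look at protobuf size plus payload size
// (cellScanner here is assumed to hold the response's cell payload):
long responseSize = result.getSerializedSize() + estimatePayloadSize(cellScanner);
boolean tooLarge = (responseSize > warnResponseSize && warnResponseSize > -1);
{code}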
[jira] [Commented] (HBASE-15362) Compression Algorithm does not respect config params from hbase-site
[ https://issues.apache.org/jira/browse/HBASE-15362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15174292#comment-15174292 ] deepankar commented on HBASE-15362: --- I was trying to set {{io.compression.codec.lzo.buffersize}}, the reason for setting this is to decrease the buffer size allocated in BlockDecompressorStream, it defaults to 64 kb and if you are missing cache frequently then this was causing some significant amount of garbage. Since you brought this up does it make sense that we should recognize that some (almost all in the current case) of the codecs does not need to create the above stream and directly send the calls to codec or the Decompressor class exposed by the codec, it will avoid this 64 kb allocations across the board and the decompressors are recycled so the buffers inside them are not a problem. Emulating the code inside the BlockDecompressorStream, the only problem is it cannot work with byte arrays, we special cased this internal and the code turned out be very small, pasting this part here {code} // Code similar to the code in BlockDecompressionStream#decompress Decompressor decompressor = null; try { decompressor = compressAlgo.getDecompressor(); int curInputOffset = inputOffset, curOutputOffset = destOffset; int originalBlockSize = Bytes.toInt(inputArray, curInputOffset, 4); int noUncompressedBytes = 0; curInputOffset += 4; // Iterate until you finish the input while (curInputOffset < (inputOffset + compressedSize) && noUncompressedBytes < uncompressedSize) { if (decompressor.needsInput()) { int blockSize = Bytes.toInt(inputArray, curInputOffset, 4); curInputOffset += 4; decompressor.setInput(inputArray, curInputOffset, blockSize); curInputOffset += blockSize; } int n = decompressor.decompress(dest, curOutputOffset, uncompressedSize - noUncompressedBytes); noUncompressedBytes += n; curOutputOffset += n; if (n == 0) { if (decompressor.finished() || decompressor.needsDictionary()) { if (noUncompressedBytes >= originalBlockSize) { throw new IOException("uncompressed bytes >= orginal size"); } } } } if (noUncompressedBytes != uncompressedSize) { throw new IOException("Premature EOF from the input bytes"); } } finally { if (decompressor != null) { compressAlgo.returnDecompressor(decompressor); } } {code} > Compression Algorithm does not respect config params from hbase-site > > > Key: HBASE-15362 > URL: https://issues.apache.org/jira/browse/HBASE-15362 > Project: HBase > Issue Type: Bug >Reporter: deepankar >Assignee: deepankar >Priority: Trivial > Attachments: HBASE-15362.patch > > > Compression creates conf using new Configuration() and this leads to it not > respecting the confs set in hbase-site, fixing it is trivial using > HBaseConfiguration.create() -- This message was sent by Atlassian JIRA (v6.3.4#6332)
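The fix this issue describes is a one-liner; here is a sketch of its assumed shape (the attached HBASE-15362.patch is the authoritative change).
{code}
// Assumed shape of the change; see the attached HBASE-15362.patch for the
// real fix. HBaseConfiguration.create() layers hbase-site.xml on top of the
// Hadoop defaults, so settings like io.compression.codec.lzo.buffersize
// actually reach the codec.
// Before: Configuration conf = new Configuration();   // ignores hbase-site.xml
Configuration conf = HBaseConfiguration.create();      // respects hbase-site.xml
{code}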
[jira] [Updated] (HBASE-15362) Compression Algorithm does not respect config params from hbase-site
[ https://issues.apache.org/jira/browse/HBASE-15362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] deepankar updated HBASE-15362: -- Attachment: HBASE-15362.patch > Compression Algorithm does not respect config params from hbase-site > > > Key: HBASE-15362 > URL: https://issues.apache.org/jira/browse/HBASE-15362 > Project: HBase > Issue Type: Bug >Reporter: deepankar >Assignee: deepankar >Priority: Trivial > Attachments: HBASE-15362.patch > > > Compression creates conf using new Configuration() and this leads to it not > respecting the confs set in hbase-site, fixing it is trivial using > HBaseConfiguration.create() -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-15362) Compression Algorithm does not respect config params from hbase-site
[ https://issues.apache.org/jira/browse/HBASE-15362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] deepankar updated HBASE-15362: -- Status: Patch Available (was: In Progress) > Compression Algorithm does not respect config params from hbase-site > > > Key: HBASE-15362 > URL: https://issues.apache.org/jira/browse/HBASE-15362 > Project: HBase > Issue Type: Bug >Reporter: deepankar >Assignee: deepankar >Priority: Trivial > Attachments: HBASE-15362.patch > > > Compression creates conf using new Configuration() and this leads to it not > respecting the confs set in hbase-site, fixing it is trivial using > HBaseConfiguration.create() -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-15362) Compression Algorithm does not respect config params from hbase-site
deepankar created HBASE-15362: - Summary: Compression Algorithm does not respect config params from hbase-site Key: HBASE-15362 URL: https://issues.apache.org/jira/browse/HBASE-15362 Project: HBase Issue Type: Bug Reporter: deepankar Assignee: deepankar Priority: Trivial Compression creates conf using new Configuration() and this leads to it not respecting the confs set in hbase-site, fixing it is trivial using HBaseConfiguration.create() -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Work started] (HBASE-15362) Compression Algorithm does not respect config params from hbase-site
[ https://issues.apache.org/jira/browse/HBASE-15362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HBASE-15362 started by deepankar. - > Compression Algorithm does not respect config params from hbase-site > > > Key: HBASE-15362 > URL: https://issues.apache.org/jira/browse/HBASE-15362 > Project: HBase > Issue Type: Bug >Reporter: deepankar >Assignee: deepankar >Priority: Trivial > > Compression creates conf using new Configuration() and this leads to it not > respecting the confs set in hbase-site, fixing it is trivial using > HBaseConfiguration.create() -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-15361) Remove unnecessary or Document constraints on BucketCache possible bucket sizes
deepankar created HBASE-15361: - Summary: Remove unnecessary or Document constraints on BucketCache possible bucket sizes Key: HBASE-15361 URL: https://issues.apache.org/jira/browse/HBASE-15361 Project: HBase Issue Type: Sub-task Components: BucketCache Reporter: deepankar Priority: Minor When we were trying to tune the bucket sizes {{hbase.bucketcache.bucket.sizes}} according to our workload, we encountered an issue due to the way the offset is stored in the bucket entry. We divide the offset into an integer base and a byte value, and it assumes that all bucket offsets will be a multiple of 256 (left shifting by 8). See the code below {code} long offset() { // Java has no unsigned numbers long o = ((long) offsetBase) & 0xFFFFFFFFL; o += (((long) (offset1)) & 0xFF) << 32; return o << 8; } private void setOffset(long value) { assert (value & 0xFF) == 0; value >>= 8; offsetBase = (int) value; offset1 = (byte) (value >> 32); } {code} This was there to save 3 bytes per BucketEntry instead of using a long, back when there were no other fields in the BucketEntry, but now there are a lot of fields in the bucket entry. This is not documented, so we could either document the constraint that it should be a strict multiple of 256 bytes or just do away with this constraint. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
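A standalone illustration of the constraint described above, reusing the packing logic from the quoted snippet: an offset round-trips only when it is a multiple of 256, because the low 8 bits are dropped by the shift packing. The class and printed values are for demonstration only.
{code}
public class BucketOffsetPacking {
  static int offsetBase;  // lower 32 bits of (offset >> 8)
  static byte offset1;    // next 8 bits of (offset >> 8)

  static void setOffset(long value) {
    // BucketEntry asserts (value & 0xFF) == 0; left out here so the
    // silent truncation below is visible when assertions are disabled.
    value >>= 8;
    offsetBase = (int) value;
    offset1 = (byte) (value >> 32);
  }

  static long offset() { // Java has no unsigned numbers
    long o = ((long) offsetBase) & 0xFFFFFFFFL;
    o += (((long) offset1) & 0xFF) << 32;
    return o << 8;
  }

  public static void main(String[] args) {
    setOffset(5 * 256);             // multiple of 256: round-trips exactly
    System.out.println(offset());   // prints 1280
    setOffset(1300);                // not a multiple of 256: low bits are lost
    System.out.println(offset());   // prints 1280, not 1300
  }
}
{code}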
[jira] [Commented] (HBASE-15361) Remove unnecessary or Document constraints on BucketCache possible bucket sizes
[ https://issues.apache.org/jira/browse/HBASE-15361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15172708#comment-15172708 ] deepankar commented on HBASE-15361: --- I can put up a patch based on the direction the community suggests. > Remove unnecessary or Document constraints on BucketCache possible bucket > sizes > > > Key: HBASE-15361 > URL: https://issues.apache.org/jira/browse/HBASE-15361 > Project: HBase > Issue Type: Sub-task > Components: BucketCache >Reporter: deepankar >Priority: Minor > > When we were trying to tune the bucket sizes > {{hbase.bucketcache.bucket.sizes}} according to our workload, we encountered > an issue due to the way the offset is stored in the bucket entry. We divide the > offset into an integer base and a byte value, and it assumes that all bucket > offsets will be a multiple of 256 (left shifting by 8). See the code below > {code} > long offset() { // Java has no unsigned numbers > long o = ((long) offsetBase) & 0xFFFFFFFFL; > o += (((long) (offset1)) & 0xFF) << 32; > return o << 8; > } > private void setOffset(long value) { > assert (value & 0xFF) == 0; > value >>= 8; > offsetBase = (int) value; > offset1 = (byte) (value >> 32); > } > {code} > This was there to save 3 bytes per BucketEntry instead of using a long, back when > there were no other fields in the BucketEntry, but now there are a lot of > fields in the bucket entry. This is not documented, so we could either document > the constraint that it should be a strict multiple of 256 bytes or just do away > with this constraint. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-15101) Leaked References to StoreFile.Reader after HBASE-13082
[ https://issues.apache.org/jira/browse/HBASE-15101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15110224#comment-15110224 ] deepankar commented on HBASE-15101: --- It is on branch-1 only. (But I have ported other patches also, so when I pulled this in there were no significant merge conflicts.) > Leaked References to StoreFile.Reader after HBASE-13082 > --- > > Key: HBASE-15101 > URL: https://issues.apache.org/jira/browse/HBASE-15101 > Project: HBase > Issue Type: Bug > Components: HFile, io >Affects Versions: 2.0.0 >Reporter: deepankar >Assignee: deepankar >Priority: Critical > Fix For: 2.0.0 > > Attachments: HBASE-15101-v1.patch, HBASE-15101-v2.patch, > HBASE-15101-v3.patch, HBASE-15101-v4.patch, HBASE-15101.patch > > > We observed this production that after a region server dies there are huge > number of hfiles in that region for the region server running the version > with HBASE-13082, In the doc it is given that it is expected to happen, but > we found a one place where scanners are not being closed. If the scanners are > not closed their references are not decremented and that is leading to the > issue of huge number of store files not being finalized > All I was able to find is in the selectScannersFrom, where we discard some of > the scanners and we are not closing them. I am attaching a patch for that. > Also to avoid these issues should the files that are done be logged and > finalized (moved to archive) as a part of region close operation. This will > solve any leaks that can happen and does not cause any dire consequences? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-15101) Leaked References to StoreFile.Reader after HBASE-13082
[ https://issues.apache.org/jira/browse/HBASE-15101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15106413#comment-15106413 ] deepankar commented on HBASE-15101: --- I thought that before HBASE-13082, when a compaction starts and before it completes, the files are present in the .tmp directory (of the region folder) and finalized once it completes, giving only a very small window (after moving in the files from .tmp and before moving out the old files) where both sets of files are present. This is not the case after HBASE-13082, because both sets of files are present in the folder for a longer period of time, and if there is any leak in the reference counting then all the files co-exist and it can lead to a region size explosion. This is exactly what happened with us: without this patch we were running one regionserver with HBASE-13082, and almost all the regions on that server had kept all the files since the start of that regionserver or the movement of the region to that server (movement rarely happens). The worst part is that we force major compact regions daily, and that led to the region data getting repeated over 7 times; in a panic, when we shut down (gracefully) this server, it led to the other regionservers that hosted these regions compacting for the whole next day (as each of them contained 5-7x the data of a normal region). Then, when we applied this patch and hosted only two regions on this experimental regionserver for 2 days, the same thing repeated, and when we again shut down (gracefully) the regionserver, all the files did remain in the directory and it did lead to a longer compaction next time. If we can come up with a patch for the remaining leak, maybe I could take a stab at testing again; I will also go through the close() path to see if I am missing anything. Thanks > Leaked References to StoreFile.Reader after HBASE-13082 > --- > > Key: HBASE-15101 > URL: https://issues.apache.org/jira/browse/HBASE-15101 > Project: HBase > Issue Type: Bug > Components: HFile, io >Affects Versions: 2.0.0 >Reporter: deepankar >Assignee: deepankar > Attachments: HBASE-15101-v1.patch, HBASE-15101-v2.patch, > HBASE-15101-v3.patch, HBASE-15101.patch > > > We observed this production that after a region server dies there are huge > number of hfiles in that region for the region server running the version > with HBASE-13082, In the doc it is given that it is expected to happen, but > we found a one place where scanners are not being closed. If the scanners are > not closed their references are not decremented and that is leading to the > issue of huge number of store files not being finalized > All I was able to find is in the selectScannersFrom, where we discard some of > the scanners and we are not closing them. I am attaching a patch for that. > Also to avoid these issues should the files that are done be logged and > finalized (moved to archive) as a part of region close operation. This will > solve any leaks that can happen and does not cause any dire consequences? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
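For context, a hedged sketch of the kind of change this issue describes for {{selectScannersFrom}}: scanners that are filtered out must be closed so their reference counts on the underlying StoreFile.Reader are released. The {{isScannerUseful}} predicate below is a hypothetical stand-in for the real selection checks; the attached patches are the authoritative fix.
{code}
// isScannerUseful() is a hypothetical stand-in for the TTL / bloom / key
// range checks that selectScannersFrom() really performs; the point is the
// close() on every scanner that gets filtered out, which releases its
// reference on the underlying StoreFile.Reader.
List<KeyValueScanner> selectScannersFrom(List<? extends KeyValueScanner> allScanners) {
  List<KeyValueScanner> selected = new ArrayList<>(allScanners.size());
  for (KeyValueScanner scanner : allScanners) {
    if (isScannerUseful(scanner)) {
      selected.add(scanner);
    } else {
      scanner.close();   // the previously missing call: without it the
                         // discarded scanner keeps its reader referenced
    }
  }
  return selected;
}
{code}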
[jira] [Updated] (HBASE-15101) Leaked References to StoreFile.Reader after HBASE-13082
[ https://issues.apache.org/jira/browse/HBASE-15101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] deepankar updated HBASE-15101: -- Attachment: HBASE-15101-v4.patch > Leaked References to StoreFile.Reader after HBASE-13082 > --- > > Key: HBASE-15101 > URL: https://issues.apache.org/jira/browse/HBASE-15101 > Project: HBase > Issue Type: Bug > Components: HFile, io >Affects Versions: 2.0.0 >Reporter: deepankar >Assignee: deepankar >Priority: Critical > Attachments: HBASE-15101-v1.patch, HBASE-15101-v2.patch, > HBASE-15101-v3.patch, HBASE-15101-v4.patch, HBASE-15101.patch > > > We observed this production that after a region server dies there are huge > number of hfiles in that region for the region server running the version > with HBASE-13082, In the doc it is given that it is expected to happen, but > we found a one place where scanners are not being closed. If the scanners are > not closed their references are not decremented and that is leading to the > issue of huge number of store files not being finalized > All I was able to find is in the selectScannersFrom, where we discard some of > the scanners and we are not closing them. I am attaching a patch for that. > Also to avoid these issues should the files that are done be logged and > finalized (moved to archive) as a part of region close operation. This will > solve any leaks that can happen and does not cause any dire consequences? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-15101) Leaked References to StoreFile.Reader after HBASE-13082
[ https://issues.apache.org/jira/browse/HBASE-15101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15108017#comment-15108017 ] deepankar commented on HBASE-15101: --- Attached patch with close calls also. > Leaked References to StoreFile.Reader after HBASE-13082 > --- > > Key: HBASE-15101 > URL: https://issues.apache.org/jira/browse/HBASE-15101 > Project: HBase > Issue Type: Bug > Components: HFile, io >Affects Versions: 2.0.0 >Reporter: deepankar >Assignee: deepankar >Priority: Critical > Attachments: HBASE-15101-v1.patch, HBASE-15101-v2.patch, > HBASE-15101-v3.patch, HBASE-15101-v4.patch, HBASE-15101.patch > > > We observed this production that after a region server dies there are huge > number of hfiles in that region for the region server running the version > with HBASE-13082, In the doc it is given that it is expected to happen, but > we found a one place where scanners are not being closed. If the scanners are > not closed their references are not decremented and that is leading to the > issue of huge number of store files not being finalized > All I was able to find is in the selectScannersFrom, where we discard some of > the scanners and we are not closing them. I am attaching a patch for that. > Also to avoid these issues should the files that are done be logged and > finalized (moved to archive) as a part of region close operation. This will > solve any leaks that can happen and does not cause any dire consequences? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-15101) Leaked References to StoreFile.Reader after HBASE-13082
[ https://issues.apache.org/jira/browse/HBASE-15101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15108013#comment-15108013 ] deepankar commented on HBASE-15101: --- Should I add the close calls before return statement as I mentioned above ? > Leaked References to StoreFile.Reader after HBASE-13082 > --- > > Key: HBASE-15101 > URL: https://issues.apache.org/jira/browse/HBASE-15101 > Project: HBase > Issue Type: Bug > Components: HFile, io >Affects Versions: 2.0.0 >Reporter: deepankar >Assignee: deepankar >Priority: Critical > Attachments: HBASE-15101-v1.patch, HBASE-15101-v2.patch, > HBASE-15101-v3.patch, HBASE-15101.patch > > > We observed this production that after a region server dies there are huge > number of hfiles in that region for the region server running the version > with HBASE-13082, In the doc it is given that it is expected to happen, but > we found a one place where scanners are not being closed. If the scanners are > not closed their references are not decremented and that is leading to the > issue of huge number of store files not being finalized > All I was able to find is in the selectScannersFrom, where we discard some of > the scanners and we are not closing them. I am attaching a patch for that. > Also to avoid these issues should the files that are done be logged and > finalized (moved to archive) as a part of region close operation. This will > solve any leaks that can happen and does not cause any dire consequences? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-15101) Leaked References to StoreFile.Reader after HBASE-13082
[ https://issues.apache.org/jira/browse/HBASE-15101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15108036#comment-15108036 ] deepankar commented on HBASE-15101: --- Adding the close did not help before, but I thought it should follow convention. I added it to all places where we are returning the NO_MORE_VALUES, is there any other place I am missing ? > Leaked References to StoreFile.Reader after HBASE-13082 > --- > > Key: HBASE-15101 > URL: https://issues.apache.org/jira/browse/HBASE-15101 > Project: HBase > Issue Type: Bug > Components: HFile, io >Affects Versions: 2.0.0 >Reporter: deepankar >Assignee: deepankar >Priority: Critical > Attachments: HBASE-15101-v1.patch, HBASE-15101-v2.patch, > HBASE-15101-v3.patch, HBASE-15101-v4.patch, HBASE-15101.patch > > > We observed this production that after a region server dies there are huge > number of hfiles in that region for the region server running the version > with HBASE-13082, In the doc it is given that it is expected to happen, but > we found a one place where scanners are not being closed. If the scanners are > not closed their references are not decremented and that is leading to the > issue of huge number of store files not being finalized > All I was able to find is in the selectScannersFrom, where we discard some of > the scanners and we are not closing them. I am attaching a patch for that. > Also to avoid these issues should the files that are done be logged and > finalized (moved to archive) as a part of region close operation. This will > solve any leaks that can happen and does not cause any dire consequences? -- This message was sent by Atlassian JIRA (v6.3.4#6332)