[jira] [Issue Comment Deleted] (HBASE-8547) Fix java.lang.RuntimeException: Cached an already cached block
[ https://issues.apache.org/jira/browse/HBASE-8547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

sylvanas updated HBASE-8547:
----------------------------
    Comment: was deleted (was: Hbase-1.4.9 problem appears again)

> Fix java.lang.RuntimeException: Cached an already cached block
> ---------------------------------------------------------------
>
>                 Key: HBASE-8547
>                 URL: https://issues.apache.org/jira/browse/HBASE-8547
>             Project: HBase
>          Issue Type: Bug
>          Components: io, regionserver
>            Reporter: Enis Soztutar
>            Assignee: Enis Soztutar
>            Priority: Major
>             Fix For: 0.98.0, 0.94.8, 0.95.1
>
>         Attachments: hbase-8547_v1-0.94.patch, hbase-8547_v1-0.94.patch,
>                       hbase-8547_v1.patch, hbase-8547_v2-0.94-reduced.patch,
>                       hbase-8547_v2-addendum2+3-0.94.patch, hbase-8547_v2-addendum2.patch,
>                       hbase-8547_v2-addendum2.patch, hbase-8547_v2-addendum3.patch,
>                       hbase-8547_v2-trunk.patch
>
>
> In one test, one of the region servers received the following on 0.94.
> Note HalfStoreFileReader in the stack trace. I think the root cause is that after
> the region is split, the mid point can fall in the middle of a block (for store
> files from which the mid point is not chosen). Each half store file reader then
> tries to load that half block and put it in the block cache. Since IdLock is
> instantiated per store file reader, the two half readers do not share the same
> IdLock instance and therefore do not effectively lock against each other.
> {code}
> 2013-05-12 01:30:37,733 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer:
> java.lang.RuntimeException: Cached an already cached block
>     at org.apache.hadoop.hbase.io.hfile.LruBlockCache.cacheBlock(LruBlockCache.java:279)
>     at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:353)
>     at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:254)
>     at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:480)
>     at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:501)
>     at org.apache.hadoop.hbase.io.HalfStoreFileReader$1.seekTo(HalfStoreFileReader.java:237)
>     at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seekAtOrAfter(StoreFileScanner.java:226)
>     at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seek(StoreFileScanner.java:145)
>     at org.apache.hadoop.hbase.regionserver.StoreFileScanner.enforceSeek(StoreFileScanner.java:351)
>     at org.apache.hadoop.hbase.regionserver.KeyValueHeap.pollRealKV(KeyValueHeap.java:354)
>     at org.apache.hadoop.hbase.regionserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:312)
>     at org.apache.hadoop.hbase.regionserver.KeyValueHeap.requestSeek(KeyValueHeap.java:277)
>     at org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:543)
>     at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:411)
>     at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:143)
>     at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:3829)
>     at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3896)
>     at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3778)
>     at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3770)
>     at org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:2643)
>     at sun.reflect.GeneratedMethodAccessor25.invoke(Unknown Source)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.hbase.ipc.SecureRpcEngine$Server.call(SecureRpcEngine.java:308)
>     at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1426)
> {code}
> I can see two possible fixes:
> # Allow this kind of rare case in LruBlockCache by not throwing an exception.
> # Move the lock instances to an upper layer (possibly CacheConfig), and let the half hfile readers share the same IdLock instance.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
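For illustration, here is a minimal, self-contained sketch of the second option: a per-offset lock pool that both half-file readers would share, so only the first of them loads and caches a given block. The class SharedBlockLoadLocks and its plain ReentrantLock pool are inventions of this sketch; in HBase the shared object would be the existing IdLock, held in a layer common to both HalfStoreFileReader instances (for example CacheConfig), and this is not necessarily what the attached patches implement.

{code:java}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.locks.ReentrantLock;

/**
 * Toy model of the race: a block cache that must not receive the same block
 * twice, plus a per-offset lock pool that has to be SHARED by both half
 * readers for the locking to have any effect.
 */
public class SharedBlockLoadLocks {

  /** One lock per block offset; created lazily, shared by every reader of the file. */
  private final ConcurrentMap<Long, ReentrantLock> offsetLocks = new ConcurrentHashMap<>();

  /** Stand-in for LruBlockCache keyed by block offset. */
  private final ConcurrentMap<Long, byte[]> blockCache = new ConcurrentHashMap<>();

  public byte[] readBlock(long offset) {
    byte[] cached = blockCache.get(offset);
    if (cached != null) {
      return cached;
    }
    ReentrantLock lock = offsetLocks.computeIfAbsent(offset, k -> new ReentrantLock());
    lock.lock();
    try {
      // Re-check inside the lock: the other half reader may have loaded the
      // block while we were waiting. Without a shared lock pool, both readers
      // reach the cache insert and the second one triggers
      // "Cached an already cached block".
      cached = blockCache.get(offset);
      if (cached == null) {
        cached = loadBlockFromDisk(offset);
        blockCache.put(offset, cached);
      }
      return cached;
    } finally {
      lock.unlock();
    }
  }

  private byte[] loadBlockFromDisk(long offset) {
    return new byte[0]; // placeholder for the actual HFile block read
  }
}
{code}

The essential point is that the lock map is created once and handed to both readers; two private lock pools, like two private IdLock instances, provide no mutual exclusion against each other.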
[jira] [Commented] (HBASE-8547) Fix java.lang.RuntimeException: Cached an already cached block
[ https://issues.apache.org/jira/browse/HBASE-8547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16844585#comment-16844585 ]

sylvanas commented on HBASE-8547:
---------------------------------

Hbase-1.4.9 problem appears again

> Fix java.lang.RuntimeException: Cached an already cached block
> ---------------------------------------------------------------
>
>                 Key: HBASE-8547
>                 URL: https://issues.apache.org/jira/browse/HBASE-8547

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Updated] (HBASE-22439) After restarting HBase, certain nodes sometimes end up with Num.Regions=0
[ https://issues.apache.org/jira/browse/HBASE-22439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

sylvanas updated HBASE-22439:
-----------------------------
    Labels: hbase  (was: regionserver)

> After restarting HBase, certain nodes sometimes end up with Num.Regions=0
> --------------------------------------------------------------------------
>
>                 Key: HBASE-22439
>                 URL: https://issues.apache.org/jira/browse/HBASE-22439
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 1.4.9
>         Environment: centos 6.9
> zk 4.6.11
> hadoop2.7.7
> hbase 1.4.9
>            Reporter: sylvanas
>            Priority: Major
>              Labels: hbase
>
> Every time HBase is started, one particular node (its total disk capacity is 50% of the other
> nodes', with usage staying at roughly 25%) may come up with Num.Regions=0; this happens more
> than 50% of the time, and its regions are spread across the other nodes instead. Throughout
> the whole process, neither the master nor the region server logs show anything useful.
> The final log output on the machine whose region count is 0 is as follows:
> {code:java}
> ...
> 2019-05-17 10:38:29,894 INFO [regionserver/gladslave3/10.86.10.103:16020] regionserver.ReplicationSourceManager: Current list of replicators: [gladslave4,16020,1558060698511, gladslave2,16020,1558060697919, gladslave3,16020,1558060700753, gladslave1,16020,1558060697694] other RSs: [gladslave4,16020,1558060698511, gladslave2,16020,1558060697919, gladslave3,16020,1558060700753, gladslave1,16020,1558060697694]
> 2019-05-17 10:38:29,936 INFO [SplitLogWorker-gladslave3:16020] regionserver.SplitLogWorker: SplitLogWorker gladslave3,16020,1558060700753 starting
> 2019-05-17 10:38:29,936 INFO [regionserver/gladslave3/10.86.10.103:16020] regionserver.HeapMemoryManager: Starting HeapMemoryTuner chore.
> 2019-05-17 10:38:29,939 INFO [regionserver/gladslave3/10.86.10.103:16020] regionserver.HRegionServer: Serving as gladslave3,16020,1558060700753, RpcServer on gladslave3/10.86.10.103:16020, sessionid=0x401e4be60870092
> 2019-05-17 10:38:30,467 INFO [regionserver/gladslave3/10.86.10.103:16020] quotas.RegionServerQuotaManager: Quota support disabled
> 2019-05-17 10:38:35,699 INFO [regionserver/gladslave3/10.86.10.103:16020] wal.FSHLog: WAL configuration: blocksize=128 MB, rollsize=121.60 MB, prefix=gladslave3%2C16020%2C1558060700753, suffix=, logDir=hdfs://haservice/hbase/WALs/gladslave3,16020,1558060700753, archiveDir=hdfs://haservice/hbase/oldWALs
> 2019-05-17 10:38:37,122 INFO [regionserver/gladslave3/10.86.10.103:16020] wal.FSHLog: Slow sync cost: 369 ms, current pipeline: []
> 2019-05-17 10:38:37,123 INFO [regionserver/gladslave3/10.86.10.103:16020] wal.FSHLog: New WAL /hbase/WALs/gladslave3,16020,1558060700753/gladslave3%2C16020%2C1558060700753.1558060715708
> {code}
> However, after the log level was changed to DEBUG, the Num.Regions=0 situation never occurred
> again (the restart test was repeated 20 times without reproducing the problem). Even so, on
> every HBase restart the number of regions on each region server changes and does not match
> the counts from before the previous shutdown.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
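Since the symptom is a region server that comes up but holds zero regions, one way to confirm it is to read the per-server region counts from the master instead of relying on the web UI. Below is a minimal sketch against the HBase 1.4 client API; the ZooKeeper quorum value and the zero-region warning are assumptions made for this illustration, not part of the original report.

{code:java}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.ClusterStatus;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.ServerName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class RegionCountCheck {
  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    // Assumed quorum for illustration; use the cluster's actual ZooKeeper ensemble.
    conf.set("hbase.zookeeper.quorum", "gladslave1,gladslave2,gladslave3");

    try (Connection connection = ConnectionFactory.createConnection(conf);
         Admin admin = connection.getAdmin()) {
      ClusterStatus status = admin.getClusterStatus();
      for (ServerName server : status.getServers()) {
        // Number of regions currently hosted by this region server.
        int regions = status.getLoad(server).getNumberOfRegions();
        System.out.println(server.getServerName() + " -> " + regions + " regions");
        if (regions == 0) {
          // The condition described in the report: a live region server that
          // was assigned no regions after the restart.
          System.out.println("WARNING: " + server.getServerName() + " holds no regions");
        }
      }
    }
  }
}
{code}

Running something like this after each restart would show whether the affected server is really empty or merely slow to report its load.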
[jira] [Updated] (HBASE-22439) After restarting HBase, certain nodes sometimes end up with Num.Regions=0
[ https://issues.apache.org/jira/browse/HBASE-22439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

sylvanas updated HBASE-22439:
-----------------------------
    Labels: regionserver  (was: )

> After restarting HBase, certain nodes sometimes end up with Num.Regions=0
> --------------------------------------------------------------------------
>
>                 Key: HBASE-22439
>                 URL: https://issues.apache.org/jira/browse/HBASE-22439

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Created] (HBASE-22439) After restarting HBase, certain nodes sometimes end up with Num.Regions=0
sylvanas created HBASE-22439:
---------------------------------

             Summary: After restarting HBase, certain nodes sometimes end up with Num.Regions=0
                 Key: HBASE-22439
                 URL: https://issues.apache.org/jira/browse/HBASE-22439
             Project: HBase
          Issue Type: Bug
          Components: regionserver
    Affects Versions: 1.4.9
         Environment: centos 6.9
zk 4.6.11
hadoop2.7.7
hbase 1.4.9
            Reporter: sylvanas


Every time HBase is started, one particular node (its total disk capacity is 50% of the other
nodes', with usage staying at roughly 25%) may come up with Num.Regions=0; this happens more
than 50% of the time, and its regions are spread across the other nodes instead. Throughout
the whole process, neither the master nor the region server logs show anything useful.

The final log output on the machine whose region count is 0 is as follows:

{code:java}
...
2019-05-17 10:38:29,894 INFO [regionserver/gladslave3/10.86.10.103:16020] regionserver.ReplicationSourceManager: Current list of replicators: [gladslave4,16020,1558060698511, gladslave2,16020,1558060697919, gladslave3,16020,1558060700753, gladslave1,16020,1558060697694] other RSs: [gladslave4,16020,1558060698511, gladslave2,16020,1558060697919, gladslave3,16020,1558060700753, gladslave1,16020,1558060697694]
2019-05-17 10:38:29,936 INFO [SplitLogWorker-gladslave3:16020] regionserver.SplitLogWorker: SplitLogWorker gladslave3,16020,1558060700753 starting
2019-05-17 10:38:29,936 INFO [regionserver/gladslave3/10.86.10.103:16020] regionserver.HeapMemoryManager: Starting HeapMemoryTuner chore.
2019-05-17 10:38:29,939 INFO [regionserver/gladslave3/10.86.10.103:16020] regionserver.HRegionServer: Serving as gladslave3,16020,1558060700753, RpcServer on gladslave3/10.86.10.103:16020, sessionid=0x401e4be60870092
2019-05-17 10:38:30,467 INFO [regionserver/gladslave3/10.86.10.103:16020] quotas.RegionServerQuotaManager: Quota support disabled
2019-05-17 10:38:35,699 INFO [regionserver/gladslave3/10.86.10.103:16020] wal.FSHLog: WAL configuration: blocksize=128 MB, rollsize=121.60 MB, prefix=gladslave3%2C16020%2C1558060700753, suffix=, logDir=hdfs://haservice/hbase/WALs/gladslave3,16020,1558060700753, archiveDir=hdfs://haservice/hbase/oldWALs
2019-05-17 10:38:37,122 INFO [regionserver/gladslave3/10.86.10.103:16020] wal.FSHLog: Slow sync cost: 369 ms, current pipeline: []
2019-05-17 10:38:37,123 INFO [regionserver/gladslave3/10.86.10.103:16020] wal.FSHLog: New WAL /hbase/WALs/gladslave3,16020,1558060700753/gladslave3%2C16020%2C1558060700753.1558060715708
{code}

However, after the log level was changed to DEBUG, the Num.Regions=0 situation never occurred
again (the restart test was repeated 20 times without reproducing the problem). Even so, on
every HBase restart the number of regions on each region server changes and does not match
the counts from before the previous shutdown.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
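The reporter's note that the problem stops reproducing once logging is switched to DEBUG suggests something timing-sensitive, so reproducing it tends to involve toggling the log level back and forth. On an HBase 1.x cluster that change is normally made in conf/log4j.properties (a line such as log4j.logger.org.apache.hadoop.hbase=DEBUG) followed by a restart; as a small hedged illustration, the equivalent change could also be applied inside a single JVM with the log4j 1.2 API that HBase 1.4 bundles. The class name below is invented for this sketch.

{code:java}
import org.apache.log4j.Level;
import org.apache.log4j.Logger;

public class ToggleHBaseLogLevel {
  public static void main(String[] args) {
    // Raise (or lower) the level for every logger under org.apache.hadoop.hbase,
    // which covers the region server startup path shown in the log excerpt above.
    // Note this only affects the JVM it runs in, e.g. a test harness.
    Logger.getLogger("org.apache.hadoop.hbase").setLevel(Level.DEBUG);
  }
}
{code}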