[jira] [Issue Comment Deleted] (HBASE-8547) Fix java.lang.RuntimeException: Cached an already cached block

2019-05-21 Thread sylvanas (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-8547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sylvanas updated HBASE-8547:

Comment: was deleted

(was: The problem appears again in HBase 1.4.9.)

> Fix java.lang.RuntimeException: Cached an already cached block
> --
>
> Key: HBASE-8547
> URL: https://issues.apache.org/jira/browse/HBASE-8547
> Project: HBase
>  Issue Type: Bug
>  Components: io, regionserver
>Reporter: Enis Soztutar
>Assignee: Enis Soztutar
>Priority: Major
> Fix For: 0.98.0, 0.94.8, 0.95.1
>
> Attachments: hbase-8547_v1-0.94.patch, hbase-8547_v1-0.94.patch, 
> hbase-8547_v1.patch, hbase-8547_v2-0.94-reduced.patch, 
> hbase-8547_v2-addendum2+3-0.94.patch, hbase-8547_v2-addendum2.patch, 
> hbase-8547_v2-addendum2.patch, hbase-8547_v2-addendum3.patch, 
> hbase-8547_v2-trunk.patch
>
>
> In one test, one of the region servers hit the following exception on 0.94. 
> Note HalfStoreFileReader in the stack trace. I think the root cause is that 
> after the region is split, the midpoint can fall in the middle of a block 
> (for store files from which the midpoint was not chosen). Each half store 
> file reader tries to load that half block and put it in the block cache. 
> Since IdLock is instantiated per store file reader, the two readers do not 
> share the same IdLock instance and thus do not effectively lock against 
> each other. 
> {code}
> 2013-05-12 01:30:37,733 ERROR 
> org.apache.hadoop.hbase.regionserver.HRegionServer:
> java.lang.RuntimeException: Cached an already cached block
>   at 
> org.apache.hadoop.hbase.io.hfile.LruBlockCache.cacheBlock(LruBlockCache.java:279)
>   at 
> org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:353)
>   at 
> org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:254)
>   at 
> org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:480)
>   at 
> org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:501)
>   at 
> org.apache.hadoop.hbase.io.HalfStoreFileReader$1.seekTo(HalfStoreFileReader.java:237)
>   at 
> org.apache.hadoop.hbase.regionserver.StoreFileScanner.seekAtOrAfter(StoreFileScanner.java:226)
>   at 
> org.apache.hadoop.hbase.regionserver.StoreFileScanner.seek(StoreFileScanner.java:145)
>   at 
> org.apache.hadoop.hbase.regionserver.StoreFileScanner.enforceSeek(StoreFileScanner.java:351)
>   at 
> org.apache.hadoop.hbase.regionserver.KeyValueHeap.pollRealKV(KeyValueHeap.java:354)
>   at 
> org.apache.hadoop.hbase.regionserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:312)
>   at 
> org.apache.hadoop.hbase.regionserver.KeyValueHeap.requestSeek(KeyValueHeap.java:277)
>   at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:543)
>   at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:411)
>   at 
> org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:143)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:3829)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3896)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3778)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3770)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:2643)
>   at sun.reflect.GeneratedMethodAccessor25.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at 
> org.apache.hadoop.hbase.ipc.SecureRpcEngine$Server.call(SecureRpcEngine.java:308)
>   at 
> org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1426)
> {code}
> I can see two possible fixes: 
>  # Allow this kind of rare case in LruBlockCache by not throwing an 
> exception. 
>  # Move the lock instances to an upper layer (possibly CacheConfig), and let 
> the half hfile readers share the same IdLock instance (see the sketch below). 
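
The sketch below is not HBase code; it is a minimal, hypothetical illustration of option 2, assuming a single per-offset lock registry (here called SharedOffsetLock, standing in for an IdLock owned by CacheConfig) that both half-file readers share. With one lock per block offset, the "check the cache, otherwise read and cache" step is serialized, so only one reader can insert a given block.

{code:java}
// Hypothetical sketch, not taken from HBase: SharedOffsetLock stands in for a
// shared IdLock, loadBlockOnce() for the read-and-cache path in readBlock().
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

public class SharedOffsetLock {
  // One lock object per block offset, shared by every reader of the same file.
  private final Map<Long, ReentrantLock> locks = new ConcurrentHashMap<>();

  private ReentrantLock lockFor(long offset) {
    return locks.computeIfAbsent(offset, k -> new ReentrantLock());
  }

  /** Serializes "check cache, else read and cache" for one block offset. */
  public <V> V loadBlockOnce(long offset, Map<Long, V> blockCache,
                             Supplier<V> readFromDisk) {
    ReentrantLock lock = lockFor(offset);
    lock.lock();
    try {
      V cached = blockCache.get(offset);
      if (cached != null) {
        return cached;              // another reader already cached this block
      }
      V block = readFromDisk.get();
      blockCache.put(offset, block); // safe: no other reader can race us here
      return block;
    } finally {
      lock.unlock();
    }
  }
}
{code}

If each half reader instead keeps its own lock registry, as with the per-reader IdLock described in the report, both readers can pass the cache check at the same time and both call put(), which is exactly the "Cached an already cached block" condition that LruBlockCache rejects.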



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-8547) Fix java.lang.RuntimeException: Cached an already cached block

2019-05-21 Thread sylvanas (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-8547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16844585#comment-16844585
 ] 

sylvanas commented on HBASE-8547:
-

The problem appears again in HBase 1.4.9.

> Fix java.lang.RuntimeException: Cached an already cached block
> --
>
> Key: HBASE-8547
> URL: https://issues.apache.org/jira/browse/HBASE-8547
> Project: HBase
>  Issue Type: Bug
>  Components: io, regionserver
>Reporter: Enis Soztutar
>Assignee: Enis Soztutar
>Priority: Major
> Fix For: 0.98.0, 0.94.8, 0.95.1
>
> Attachments: hbase-8547_v1-0.94.patch, hbase-8547_v1-0.94.patch, 
> hbase-8547_v1.patch, hbase-8547_v2-0.94-reduced.patch, 
> hbase-8547_v2-addendum2+3-0.94.patch, hbase-8547_v2-addendum2.patch, 
> hbase-8547_v2-addendum2.patch, hbase-8547_v2-addendum3.patch, 
> hbase-8547_v2-trunk.patch
>
>
> In one test, one of the region servers hit the following exception on 0.94. 
> Note HalfStoreFileReader in the stack trace. I think the root cause is that 
> after the region is split, the midpoint can fall in the middle of a block 
> (for store files from which the midpoint was not chosen). Each half store 
> file reader tries to load that half block and put it in the block cache. 
> Since IdLock is instantiated per store file reader, the two readers do not 
> share the same IdLock instance and thus do not effectively lock against 
> each other. 
> {code}
> 2013-05-12 01:30:37,733 ERROR 
> org.apache.hadoop.hbase.regionserver.HRegionServer:
> java.lang.RuntimeException: Cached an already cached block
>   at 
> org.apache.hadoop.hbase.io.hfile.LruBlockCache.cacheBlock(LruBlockCache.java:279)
>   at 
> org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:353)
>   at 
> org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:254)
>   at 
> org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:480)
>   at 
> org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:501)
>   at 
> org.apache.hadoop.hbase.io.HalfStoreFileReader$1.seekTo(HalfStoreFileReader.java:237)
>   at 
> org.apache.hadoop.hbase.regionserver.StoreFileScanner.seekAtOrAfter(StoreFileScanner.java:226)
>   at 
> org.apache.hadoop.hbase.regionserver.StoreFileScanner.seek(StoreFileScanner.java:145)
>   at 
> org.apache.hadoop.hbase.regionserver.StoreFileScanner.enforceSeek(StoreFileScanner.java:351)
>   at 
> org.apache.hadoop.hbase.regionserver.KeyValueHeap.pollRealKV(KeyValueHeap.java:354)
>   at 
> org.apache.hadoop.hbase.regionserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:312)
>   at 
> org.apache.hadoop.hbase.regionserver.KeyValueHeap.requestSeek(KeyValueHeap.java:277)
>   at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:543)
>   at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:411)
>   at 
> org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:143)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:3829)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3896)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3778)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3770)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:2643)
>   at sun.reflect.GeneratedMethodAccessor25.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at 
> org.apache.hadoop.hbase.ipc.SecureRpcEngine$Server.call(SecureRpcEngine.java:308)
>   at 
> org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1426)
> {code}
> I can see two possible fixes: 
>  # Allow this kind of rare case in LruBlockCache by not throwing an 
> exception. 
>  # Move the lock instances to an upper layer (possibly CacheConfig), and let 
> the half hfile readers share the same IdLock instance. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-22439) After restarting HBase, a particular node sometimes ends up with Num.Regions=0

2019-05-17 Thread sylvanas (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-22439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sylvanas updated HBASE-22439:
-
Labels: hbase  (was: regionserver)

> After restarting HBase, a particular node sometimes ends up with Num.Regions=0
> -
>
> Key: HBASE-22439
> URL: https://issues.apache.org/jira/browse/HBASE-22439
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 1.4.9
> Environment: centos 6.9  
> zk 4.6.11
> hadoop2.7.7
> hbase 1.4.9
>Reporter: sylvanas
>Priority: Major
>  Labels: hbase
>
> Every time HBase starts, one particular node (its total disk is 50% of the 
> other nodes', but its usage sits at roughly 25%) ends up with Num.Regions=0 
> more than 50% of the time; its regions are spread across the other nodes, 
> and throughout the whole process neither the master nor the regionserver 
> logs show anything useful.
> The log output on the machine that ends up with region=0 is as follows:
> {code:java}
> // code placeholder
> ...
> 2019-05-17 10:38:29,894 INFO [regionserver/gladslave3/10.86.10.103:16020] 
> regionserver.ReplicationSourceManager: Current list of replicators: 
> [gladslave4,16020,1558060698511, gladslave2,16020,1558060697919, 
> gladslave3,16020,1558060700753, gladslave1,16020,1558060697694] other RSs: 
> [gladslave4,16020,1558060698511, gladslave2,16020,1558060697919, 
> gladslave3,16020,1558060700753, gladslave1,16020,1558060697694]
> 2019-05-17 10:38:29,936 INFO [SplitLogWorker-gladslave3:16020] 
> regionserver.SplitLogWorker: SplitLogWorker gladslave3,16020,1558060700753 
> starting
> 2019-05-17 10:38:29,936 INFO [regionserver/gladslave3/10.86.10.103:16020] 
> regionserver.HeapMemoryManager: Starting HeapMemoryTuner chore.
> 2019-05-17 10:38:29,939 INFO [regionserver/gladslave3/10.86.10.103:16020] 
> regionserver.HRegionServer: Serving as gladslave3,16020,1558060700753, 
> RpcServer on gladslave3/10.86.10.103:16020, sessionid=0x401e4be60870092
> 2019-05-17 10:38:30,467 INFO [regionserver/gladslave3/10.86.10.103:16020] 
> quotas.RegionServerQuotaManager: Quota support disabled
> 2019-05-17 10:38:35,699 INFO [regionserver/gladslave3/10.86.10.103:16020] 
> wal.FSHLog: WAL configuration: blocksize=128 MB, rollsize=121.60 MB, 
> prefix=gladslave3%2C16020%2C1558060700753, suffix=, 
> logDir=hdfs://haservice/hbase/WALs/gladslave3,16020,1558060700753, 
> archiveDir=hdfs://haservice/hbase/oldWALs
> 2019-05-17 10:38:37,122 INFO [regionserver/gladslave3/10.86.10.103:16020] 
> wal.FSHLog: Slow sync cost: 369 ms, current pipeline: []
> 2019-05-17 10:38:37,123 INFO [regionserver/gladslave3/10.86.10.103:16020] 
> wal.FSHLog: New WAL 
> /hbase/WALs/gladslave3,16020,1558060700753/gladslave3%2C16020%2C1558060700753.1558060715708
> {code}
>  
> However, after changing the log level to DEBUG, Num.Regions=0 never occurred 
> again (20 repeated restart tests did not reproduce the problem). Still, every 
> time HBase is restarted the number of regions on each regionserver changes; 
> it is not the count from the previous shutdown.
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-22439) After restarting HBase, a particular node sometimes ends up with Num.Regions=0

2019-05-17 Thread sylvanas (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-22439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sylvanas updated HBASE-22439:
-
Labels: regionserver  (was: )

> After restarting HBase, a particular node sometimes ends up with Num.Regions=0
> -
>
> Key: HBASE-22439
> URL: https://issues.apache.org/jira/browse/HBASE-22439
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 1.4.9
> Environment: centos 6.9  
> zk 4.6.11
> hadoop2.7.7
> hbase 1.4.9
>Reporter: sylvanas
>Priority: Major
>  Labels: regionserver
>
> Every time HBase starts, one particular node (its total disk is 50% of the 
> other nodes', but its usage sits at roughly 25%) ends up with Num.Regions=0 
> more than 50% of the time; its regions are spread across the other nodes, 
> and throughout the whole process neither the master nor the regionserver 
> logs show anything useful.
> The log output on the machine that ends up with region=0 is as follows:
> {code:java}
> // code placeholder
> ...
> 2019-05-17 10:38:29,894 INFO [regionserver/gladslave3/10.86.10.103:16020] 
> regionserver.ReplicationSourceManager: Current list of replicators: 
> [gladslave4,16020,1558060698511, gladslave2,16020,1558060697919, 
> gladslave3,16020,1558060700753, gladslave1,16020,1558060697694] other RSs: 
> [gladslave4,16020,1558060698511, gladslave2,16020,1558060697919, 
> gladslave3,16020,1558060700753, gladslave1,16020,1558060697694]
> 2019-05-17 10:38:29,936 INFO [SplitLogWorker-gladslave3:16020] 
> regionserver.SplitLogWorker: SplitLogWorker gladslave3,16020,1558060700753 
> starting
> 2019-05-17 10:38:29,936 INFO [regionserver/gladslave3/10.86.10.103:16020] 
> regionserver.HeapMemoryManager: Starting HeapMemoryTuner chore.
> 2019-05-17 10:38:29,939 INFO [regionserver/gladslave3/10.86.10.103:16020] 
> regionserver.HRegionServer: Serving as gladslave3,16020,1558060700753, 
> RpcServer on gladslave3/10.86.10.103:16020, sessionid=0x401e4be60870092
> 2019-05-17 10:38:30,467 INFO [regionserver/gladslave3/10.86.10.103:16020] 
> quotas.RegionServerQuotaManager: Quota support disabled
> 2019-05-17 10:38:35,699 INFO [regionserver/gladslave3/10.86.10.103:16020] 
> wal.FSHLog: WAL configuration: blocksize=128 MB, rollsize=121.60 MB, 
> prefix=gladslave3%2C16020%2C1558060700753, suffix=, 
> logDir=hdfs://haservice/hbase/WALs/gladslave3,16020,1558060700753, 
> archiveDir=hdfs://haservice/hbase/oldWALs
> 2019-05-17 10:38:37,122 INFO [regionserver/gladslave3/10.86.10.103:16020] 
> wal.FSHLog: Slow sync cost: 369 ms, current pipeline: []
> 2019-05-17 10:38:37,123 INFO [regionserver/gladslave3/10.86.10.103:16020] 
> wal.FSHLog: New WAL 
> /hbase/WALs/gladslave3,16020,1558060700753/gladslave3%2C16020%2C1558060700753.1558060715708
> {code}
>  
> However, after changing the log level to DEBUG, Num.Regions=0 never occurred 
> again (20 repeated restart tests did not reproduce the problem). Still, every 
> time HBase is restarted the number of regions on each regionserver changes; 
> it is not the count from the previous shutdown.
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HBASE-22439) After restarting HBase, a particular node sometimes ends up with Num.Regions=0

2019-05-17 Thread sylvanas (JIRA)
sylvanas created HBASE-22439:


 Summary: After restarting HBase, a particular node sometimes ends up with Num.Regions=0
 Key: HBASE-22439
 URL: https://issues.apache.org/jira/browse/HBASE-22439
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 1.4.9
 Environment: centos 6.9  

zk 4.6.11

hadoop2.7.7

hbase 1.4.9
Reporter: sylvanas


Every time HBase starts, one particular node (its total disk is 50% of the 
other nodes', but its usage sits at roughly 25%) ends up with Num.Regions=0 
more than 50% of the time; its regions are spread across the other nodes, and 
throughout the whole process neither the master nor the regionserver logs show 
anything useful.

The log output on the machine that ends up with region=0 is as follows:
{code:java}
// code placeholder

...
2019-05-17 10:38:29,894 INFO [regionserver/gladslave3/10.86.10.103:16020] 
regionserver.ReplicationSourceManager: Current list of replicators: 
[gladslave4,16020,1558060698511, gladslave2,16020,1558060697919, 
gladslave3,16020,1558060700753, gladslave1,16020,1558060697694] other RSs: 
[gladslave4,16020,1558060698511, gladslave2,16020,1558060697919, 
gladslave3,16020,1558060700753, gladslave1,16020,1558060697694]
2019-05-17 10:38:29,936 INFO [SplitLogWorker-gladslave3:16020] 
regionserver.SplitLogWorker: SplitLogWorker gladslave3,16020,1558060700753 
starting
2019-05-17 10:38:29,936 INFO [regionserver/gladslave3/10.86.10.103:16020] 
regionserver.HeapMemoryManager: Starting HeapMemoryTuner chore.
2019-05-17 10:38:29,939 INFO [regionserver/gladslave3/10.86.10.103:16020] 
regionserver.HRegionServer: Serving as gladslave3,16020,1558060700753, 
RpcServer on gladslave3/10.86.10.103:16020, sessionid=0x401e4be60870092
2019-05-17 10:38:30,467 INFO [regionserver/gladslave3/10.86.10.103:16020] 
quotas.RegionServerQuotaManager: Quota support disabled
2019-05-17 10:38:35,699 INFO [regionserver/gladslave3/10.86.10.103:16020] 
wal.FSHLog: WAL configuration: blocksize=128 MB, rollsize=121.60 MB, 
prefix=gladslave3%2C16020%2C1558060700753, suffix=, 
logDir=hdfs://haservice/hbase/WALs/gladslave3,16020,1558060700753, 
archiveDir=hdfs://haservice/hbase/oldWALs
2019-05-17 10:38:37,122 INFO [regionserver/gladslave3/10.86.10.103:16020] 
wal.FSHLog: Slow sync cost: 369 ms, current pipeline: []
2019-05-17 10:38:37,123 INFO [regionserver/gladslave3/10.86.10.103:16020] 
wal.FSHLog: New WAL 
/hbase/WALs/gladslave3,16020,1558060700753/gladslave3%2C16020%2C1558060700753.1558060715708
{code}
 

However, after changing the log level to DEBUG, Num.Regions=0 never occurred 
again (20 repeated restart tests did not reproduce the problem). Still, every 
time HBase is restarted the number of regions on each regionserver changes; it 
is not the count from the previous shutdown.
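
For reference, the DEBUG level mentioned above is usually enabled in HBase 1.4.x by editing conf/log4j.properties (or changed temporarily through the region server's /logLevel web page). A minimal sketch, assuming the stock log4j 1.2 configuration shipped with HBase:

{code}
# conf/log4j.properties -- raise HBase's own loggers to DEBUG
# (takes effect on restart; the /logLevel servlet can change it at runtime)
log4j.logger.org.apache.hadoop.hbase=DEBUG
{code}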

 

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)