[jira] [Commented] (HBASE-5623) Race condition when rolling the HLog and hlogFlush
[ https://issues.apache.org/jira/browse/HBASE-5623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13240889#comment-13240889 ] Hudson commented on HBASE-5623: --- Integrated in HBase-TRUNK #2697 (See [https://builds.apache.org/job/HBase-TRUNK/2697/]) HBASE-5623 Race condition when rolling the HLog and hlogFlush (Enis Soztutar and LarsH) (Revision 1305556) Result = FAILURE larsh : Files : * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLogRollingNoCluster.java Race condition when rolling the HLog and hlogFlush -- Key: HBASE-5623 URL: https://issues.apache.org/jira/browse/HBASE-5623 Project: HBase Issue Type: Bug Components: wal Affects Versions: 0.94.0 Reporter: Enis Soztutar Assignee: Enis Soztutar Priority: Critical Fix For: 0.94.0 Attachments: 5623-suggestion.txt, 5623-v7.txt, 5623-v8.txt, 5623.txt, 5623v2.txt, HBASE-5623_v0.patch, HBASE-5623_v4.patch, HBASE-5623_v5.patch, HBASE-5623_v6-alt.patch, HBASE-5623_v6-alt.patch When doing a ycsb test with a large number of handlers (regionserver.handler.count=60), I get the following exceptions: {code} Caused by: org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.getLength(SequenceFile.java:1099) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.getLength(SequenceFileLogWriter.java:314) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1291) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1388) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351) at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:920) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:152) at $Proxy1.multi(Unknown Source) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1691) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1689) at org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:214) {code} and {code} java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.checkAndWriteSync(SequenceFile.java:1026) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1068) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1035) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.append(SequenceFileLogWriter.java:279) at org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.hlogFlush(HLog.java:1237) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1271) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1391) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) at sun.reflect.GeneratedMethodAccessor33.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351) {code} It seems the root cause of the issue is that we open a new log writer and close the old one at HLog#rollWriter() holding the updateLock, but the other threads doing syncer() calls {code}
[jira] [Commented] (HBASE-5623) Race condition when rolling the HLog and hlogFlush
[ https://issues.apache.org/jira/browse/HBASE-5623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13239213#comment-13239213 ] Hudson commented on HBASE-5623: --- Integrated in HBase-TRUNK-security #151 (See [https://builds.apache.org/job/HBase-TRUNK-security/151/]) HBASE-5623 Race condition when rolling the HLog and hlogFlush (Enis Soztutar and LarsH) (Revision 1305556) Result = FAILURE larsh : Files : * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLogRollingNoCluster.java Race condition when rolling the HLog and hlogFlush -- Key: HBASE-5623 URL: https://issues.apache.org/jira/browse/HBASE-5623 Project: HBase Issue Type: Bug Components: wal Affects Versions: 0.94.0 Reporter: Enis Soztutar Assignee: Enis Soztutar Priority: Critical Fix For: 0.94.0 Attachments: 5623-suggestion.txt, 5623-v7.txt, 5623-v8.txt, 5623.txt, 5623v2.txt, HBASE-5623_v0.patch, HBASE-5623_v4.patch, HBASE-5623_v5.patch, HBASE-5623_v6-alt.patch, HBASE-5623_v6-alt.patch When doing a ycsb test with a large number of handlers (regionserver.handler.count=60), I get the following exceptions: {code} Caused by: org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.getLength(SequenceFile.java:1099) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.getLength(SequenceFileLogWriter.java:314) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1291) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1388) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351) at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:920) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:152) at $Proxy1.multi(Unknown Source) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1691) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1689) at org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:214) {code} and {code} java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.checkAndWriteSync(SequenceFile.java:1026) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1068) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1035) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.append(SequenceFileLogWriter.java:279) at org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.hlogFlush(HLog.java:1237) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1271) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1391) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) at sun.reflect.GeneratedMethodAccessor33.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351) {code} It seems the root cause of the issue is that we open a new log writer and close the old one at HLog#rollWriter() holding the updateLock, but the other threads doing syncer() calls {code}
[jira] [Commented] (HBASE-5623) Race condition when rolling the HLog and hlogFlush
[ https://issues.apache.org/jira/browse/HBASE-5623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238534#comment-13238534 ] Lars Hofhansl commented on HBASE-5623: -- @Enis: Please have a look when you get a chance. I would like to cut an RC of 0.94 today. Race condition when rolling the HLog and hlogFlush -- Key: HBASE-5623 URL: https://issues.apache.org/jira/browse/HBASE-5623 Project: HBase Issue Type: Bug Components: wal Affects Versions: 0.94.0 Reporter: Enis Soztutar Assignee: Enis Soztutar Priority: Critical Fix For: 0.94.0 Attachments: 5623-suggestion.txt, 5623-v7.txt, 5623-v8.txt, 5623.txt, 5623v2.txt, HBASE-5623_v0.patch, HBASE-5623_v4.patch, HBASE-5623_v5.patch, HBASE-5623_v6-alt.patch, HBASE-5623_v6-alt.patch When doing a ycsb test with a large number of handlers (regionserver.handler.count=60), I get the following exceptions: {code} Caused by: org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.getLength(SequenceFile.java:1099) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.getLength(SequenceFileLogWriter.java:314) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1291) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1388) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351) at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:920) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:152) at $Proxy1.multi(Unknown Source) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1691) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1689) at org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:214) {code} and {code} java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.checkAndWriteSync(SequenceFile.java:1026) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1068) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1035) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.append(SequenceFileLogWriter.java:279) at org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.hlogFlush(HLog.java:1237) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1271) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1391) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) at sun.reflect.GeneratedMethodAccessor33.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351) {code} It seems the root cause of the issue is that we open a new log writer and close the old one at HLog#rollWriter() holding the updateLock, but the other threads doing syncer() calls {code} logSyncerThread.hlogFlush(this.writer); {code} without holding the updateLock. LogSyncer only synchronizes against concurrent appends and flush(), but not on the passed writer, which can be closed already by rollWriter(). In this case, since SequenceFile#Writer.close() sets it's out field as null, we get the NPE. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA
[jira] [Commented] (HBASE-5623) Race condition when rolling the HLog and hlogFlush
[ https://issues.apache.org/jira/browse/HBASE-5623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238664#comment-13238664 ] Lars Hofhansl commented on HBASE-5623: -- I do not want to wait with the RC any longer. @Stack or @Ted: Please have a look at the last patch (will add LOG.debug message as indicated in the last comment). Race condition when rolling the HLog and hlogFlush -- Key: HBASE-5623 URL: https://issues.apache.org/jira/browse/HBASE-5623 Project: HBase Issue Type: Bug Components: wal Affects Versions: 0.94.0 Reporter: Enis Soztutar Assignee: Enis Soztutar Priority: Critical Fix For: 0.94.0 Attachments: 5623-suggestion.txt, 5623-v7.txt, 5623-v8.txt, 5623.txt, 5623v2.txt, HBASE-5623_v0.patch, HBASE-5623_v4.patch, HBASE-5623_v5.patch, HBASE-5623_v6-alt.patch, HBASE-5623_v6-alt.patch When doing a ycsb test with a large number of handlers (regionserver.handler.count=60), I get the following exceptions: {code} Caused by: org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.getLength(SequenceFile.java:1099) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.getLength(SequenceFileLogWriter.java:314) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1291) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1388) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351) at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:920) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:152) at $Proxy1.multi(Unknown Source) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1691) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1689) at org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:214) {code} and {code} java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.checkAndWriteSync(SequenceFile.java:1026) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1068) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1035) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.append(SequenceFileLogWriter.java:279) at org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.hlogFlush(HLog.java:1237) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1271) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1391) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) at sun.reflect.GeneratedMethodAccessor33.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351) {code} It seems the root cause of the issue is that we open a new log writer and close the old one at HLog#rollWriter() holding the updateLock, but the other threads doing syncer() calls {code} logSyncerThread.hlogFlush(this.writer); {code} without holding the updateLock. LogSyncer only synchronizes against concurrent appends and flush(), but not on the passed writer, which can be closed already by rollWriter(). In this case, since SequenceFile#Writer.close() sets it's out field as null, we get the NPE. -- This message is automatically generated by
[jira] [Commented] (HBASE-5623) Race condition when rolling the HLog and hlogFlush
[ https://issues.apache.org/jira/browse/HBASE-5623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238675#comment-13238675 ] Ted Yu commented on HBASE-5623: --- Last patch should be good to go. Race condition when rolling the HLog and hlogFlush -- Key: HBASE-5623 URL: https://issues.apache.org/jira/browse/HBASE-5623 Project: HBase Issue Type: Bug Components: wal Affects Versions: 0.94.0 Reporter: Enis Soztutar Assignee: Enis Soztutar Priority: Critical Fix For: 0.94.0 Attachments: 5623-suggestion.txt, 5623-v7.txt, 5623-v8.txt, 5623.txt, 5623v2.txt, HBASE-5623_v0.patch, HBASE-5623_v4.patch, HBASE-5623_v5.patch, HBASE-5623_v6-alt.patch, HBASE-5623_v6-alt.patch When doing a ycsb test with a large number of handlers (regionserver.handler.count=60), I get the following exceptions: {code} Caused by: org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.getLength(SequenceFile.java:1099) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.getLength(SequenceFileLogWriter.java:314) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1291) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1388) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351) at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:920) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:152) at $Proxy1.multi(Unknown Source) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1691) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1689) at org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:214) {code} and {code} java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.checkAndWriteSync(SequenceFile.java:1026) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1068) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1035) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.append(SequenceFileLogWriter.java:279) at org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.hlogFlush(HLog.java:1237) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1271) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1391) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) at sun.reflect.GeneratedMethodAccessor33.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351) {code} It seems the root cause of the issue is that we open a new log writer and close the old one at HLog#rollWriter() holding the updateLock, but the other threads doing syncer() calls {code} logSyncerThread.hlogFlush(this.writer); {code} without holding the updateLock. LogSyncer only synchronizes against concurrent appends and flush(), but not on the passed writer, which can be closed already by rollWriter(). In this case, since SequenceFile#Writer.close() sets it's out field as null, we get the NPE. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators:
[jira] [Commented] (HBASE-5623) Race condition when rolling the HLog and hlogFlush
[ https://issues.apache.org/jira/browse/HBASE-5623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238680#comment-13238680 ] stack commented on HBASE-5623: -- Is this right? {code} +Writer tempWriter; synchronized (this.updateLock) { if (this.closed) return; + tempWriter = this.writer; // guaranteed non-null } {code} When we go to use tempWriter, it may be closed. We're fine w/ that? We'll get an IOE about it being closed Yeah, add a log here instead: {code} +} catch (IOException x) { + // Ignore. + // Writer might have been closed. + // In any case, we either don't have to do anything, + // or the log will be rolled the next time. } {code} I'm +1 on commit. Its way to easy reproducing the NPE. The changes in SequenceFileLogWriter are an improvement. I'm down w/ the retry on IOE (long as the retry is logged). Good stuff. Race condition when rolling the HLog and hlogFlush -- Key: HBASE-5623 URL: https://issues.apache.org/jira/browse/HBASE-5623 Project: HBase Issue Type: Bug Components: wal Affects Versions: 0.94.0 Reporter: Enis Soztutar Assignee: Enis Soztutar Priority: Critical Fix For: 0.94.0 Attachments: 5623-suggestion.txt, 5623-v7.txt, 5623-v8.txt, 5623.txt, 5623v2.txt, HBASE-5623_v0.patch, HBASE-5623_v4.patch, HBASE-5623_v5.patch, HBASE-5623_v6-alt.patch, HBASE-5623_v6-alt.patch When doing a ycsb test with a large number of handlers (regionserver.handler.count=60), I get the following exceptions: {code} Caused by: org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.getLength(SequenceFile.java:1099) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.getLength(SequenceFileLogWriter.java:314) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1291) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1388) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351) at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:920) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:152) at $Proxy1.multi(Unknown Source) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1691) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1689) at org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:214) {code} and {code} java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.checkAndWriteSync(SequenceFile.java:1026) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1068) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1035) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.append(SequenceFileLogWriter.java:279) at org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.hlogFlush(HLog.java:1237) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1271) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1391) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) at sun.reflect.GeneratedMethodAccessor33.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366) at
[jira] [Commented] (HBASE-5623) Race condition when rolling the HLog and hlogFlush
[ https://issues.apache.org/jira/browse/HBASE-5623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238689#comment-13238689 ] Lars Hofhansl commented on HBASE-5623: -- Re: tempWriter. Yep, the idea is the *only* error-condition for tempWriter is that it could have been closed. I'll add LOG.debug(Log roll failed and will be retried. (This is not an error)) Race condition when rolling the HLog and hlogFlush -- Key: HBASE-5623 URL: https://issues.apache.org/jira/browse/HBASE-5623 Project: HBase Issue Type: Bug Components: wal Affects Versions: 0.94.0 Reporter: Enis Soztutar Assignee: Enis Soztutar Priority: Critical Fix For: 0.94.0 Attachments: 5623-suggestion.txt, 5623-v7.txt, 5623-v8.txt, 5623.txt, 5623v2.txt, HBASE-5623_v0.patch, HBASE-5623_v4.patch, HBASE-5623_v5.patch, HBASE-5623_v6-alt.patch, HBASE-5623_v6-alt.patch When doing a ycsb test with a large number of handlers (regionserver.handler.count=60), I get the following exceptions: {code} Caused by: org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.getLength(SequenceFile.java:1099) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.getLength(SequenceFileLogWriter.java:314) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1291) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1388) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351) at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:920) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:152) at $Proxy1.multi(Unknown Source) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1691) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1689) at org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:214) {code} and {code} java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.checkAndWriteSync(SequenceFile.java:1026) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1068) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1035) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.append(SequenceFileLogWriter.java:279) at org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.hlogFlush(HLog.java:1237) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1271) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1391) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) at sun.reflect.GeneratedMethodAccessor33.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351) {code} It seems the root cause of the issue is that we open a new log writer and close the old one at HLog#rollWriter() holding the updateLock, but the other threads doing syncer() calls {code} logSyncerThread.hlogFlush(this.writer); {code} without holding the updateLock. LogSyncer only synchronizes against concurrent appends and flush(), but not on the passed writer, which can be closed already by rollWriter(). In this case, since SequenceFile#Writer.close() sets it's out field as null, we get the NPE. -- This message
[jira] [Commented] (HBASE-5623) Race condition when rolling the HLog and hlogFlush
[ https://issues.apache.org/jira/browse/HBASE-5623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238800#comment-13238800 ] Lars Hofhansl commented on HBASE-5623: -- Thanks for the patch Enis! And thanks for the reviews Stack and Ted. Race condition when rolling the HLog and hlogFlush -- Key: HBASE-5623 URL: https://issues.apache.org/jira/browse/HBASE-5623 Project: HBase Issue Type: Bug Components: wal Affects Versions: 0.94.0 Reporter: Enis Soztutar Assignee: Enis Soztutar Priority: Critical Fix For: 0.94.0 Attachments: 5623-suggestion.txt, 5623-v7.txt, 5623-v8.txt, 5623.txt, 5623v2.txt, HBASE-5623_v0.patch, HBASE-5623_v4.patch, HBASE-5623_v5.patch, HBASE-5623_v6-alt.patch, HBASE-5623_v6-alt.patch When doing a ycsb test with a large number of handlers (regionserver.handler.count=60), I get the following exceptions: {code} Caused by: org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.getLength(SequenceFile.java:1099) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.getLength(SequenceFileLogWriter.java:314) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1291) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1388) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351) at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:920) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:152) at $Proxy1.multi(Unknown Source) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1691) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1689) at org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:214) {code} and {code} java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.checkAndWriteSync(SequenceFile.java:1026) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1068) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1035) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.append(SequenceFileLogWriter.java:279) at org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.hlogFlush(HLog.java:1237) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1271) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1391) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) at sun.reflect.GeneratedMethodAccessor33.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351) {code} It seems the root cause of the issue is that we open a new log writer and close the old one at HLog#rollWriter() holding the updateLock, but the other threads doing syncer() calls {code} logSyncerThread.hlogFlush(this.writer); {code} without holding the updateLock. LogSyncer only synchronizes against concurrent appends and flush(), but not on the passed writer, which can be closed already by rollWriter(). In this case, since SequenceFile#Writer.close() sets it's out field as null, we get the NPE. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators:
[jira] [Commented] (HBASE-5623) Race condition when rolling the HLog and hlogFlush
[ https://issues.apache.org/jira/browse/HBASE-5623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238802#comment-13238802 ] Jean-Daniel Cryans commented on HBASE-5623: --- Funny, I just saw that NPE for the first time in my testing. Race condition when rolling the HLog and hlogFlush -- Key: HBASE-5623 URL: https://issues.apache.org/jira/browse/HBASE-5623 Project: HBase Issue Type: Bug Components: wal Affects Versions: 0.94.0 Reporter: Enis Soztutar Assignee: Enis Soztutar Priority: Critical Fix For: 0.94.0 Attachments: 5623-suggestion.txt, 5623-v7.txt, 5623-v8.txt, 5623.txt, 5623v2.txt, HBASE-5623_v0.patch, HBASE-5623_v4.patch, HBASE-5623_v5.patch, HBASE-5623_v6-alt.patch, HBASE-5623_v6-alt.patch When doing a ycsb test with a large number of handlers (regionserver.handler.count=60), I get the following exceptions: {code} Caused by: org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.getLength(SequenceFile.java:1099) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.getLength(SequenceFileLogWriter.java:314) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1291) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1388) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351) at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:920) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:152) at $Proxy1.multi(Unknown Source) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1691) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1689) at org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:214) {code} and {code} java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.checkAndWriteSync(SequenceFile.java:1026) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1068) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1035) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.append(SequenceFileLogWriter.java:279) at org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.hlogFlush(HLog.java:1237) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1271) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1391) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) at sun.reflect.GeneratedMethodAccessor33.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351) {code} It seems the root cause of the issue is that we open a new log writer and close the old one at HLog#rollWriter() holding the updateLock, but the other threads doing syncer() calls {code} logSyncerThread.hlogFlush(this.writer); {code} without holding the updateLock. LogSyncer only synchronizes against concurrent appends and flush(), but not on the passed writer, which can be closed already by rollWriter(). In this case, since SequenceFile#Writer.close() sets it's out field as null, we get the NPE. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators:
[jira] [Commented] (HBASE-5623) Race condition when rolling the HLog and hlogFlush
[ https://issues.apache.org/jira/browse/HBASE-5623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238910#comment-13238910 ] Hudson commented on HBASE-5623: --- Integrated in HBase-0.94 #57 (See [https://builds.apache.org/job/HBase-0.94/57/]) HBASE-5623 Race condition when rolling the HLog and hlogFlush (Enis Soztutar and LarsH) (Revision 1305549) Result = SUCCESS larsh : Files : * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java * /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLogRollingNoCluster.java Race condition when rolling the HLog and hlogFlush -- Key: HBASE-5623 URL: https://issues.apache.org/jira/browse/HBASE-5623 Project: HBase Issue Type: Bug Components: wal Affects Versions: 0.94.0 Reporter: Enis Soztutar Assignee: Enis Soztutar Priority: Critical Fix For: 0.94.0 Attachments: 5623-suggestion.txt, 5623-v7.txt, 5623-v8.txt, 5623.txt, 5623v2.txt, HBASE-5623_v0.patch, HBASE-5623_v4.patch, HBASE-5623_v5.patch, HBASE-5623_v6-alt.patch, HBASE-5623_v6-alt.patch When doing a ycsb test with a large number of handlers (regionserver.handler.count=60), I get the following exceptions: {code} Caused by: org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.getLength(SequenceFile.java:1099) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.getLength(SequenceFileLogWriter.java:314) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1291) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1388) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351) at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:920) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:152) at $Proxy1.multi(Unknown Source) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1691) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1689) at org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:214) {code} and {code} java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.checkAndWriteSync(SequenceFile.java:1026) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1068) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1035) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.append(SequenceFileLogWriter.java:279) at org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.hlogFlush(HLog.java:1237) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1271) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1391) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) at sun.reflect.GeneratedMethodAccessor33.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351) {code} It seems the root cause of the issue is that we open a new log writer and close the old one at HLog#rollWriter() holding the updateLock, but the other threads doing syncer() calls {code}
[jira] [Commented] (HBASE-5623) Race condition when rolling the HLog and hlogFlush
[ https://issues.apache.org/jira/browse/HBASE-5623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238938#comment-13238938 ] Enis Soztutar commented on HBASE-5623: -- Thanks Lars for pushing this. Just as a note, I just tested the final version of the patch on a 4 node cluster with ycsb -threads 100. No problems. Race condition when rolling the HLog and hlogFlush -- Key: HBASE-5623 URL: https://issues.apache.org/jira/browse/HBASE-5623 Project: HBase Issue Type: Bug Components: wal Affects Versions: 0.94.0 Reporter: Enis Soztutar Assignee: Enis Soztutar Priority: Critical Fix For: 0.94.0 Attachments: 5623-suggestion.txt, 5623-v7.txt, 5623-v8.txt, 5623.txt, 5623v2.txt, HBASE-5623_v0.patch, HBASE-5623_v4.patch, HBASE-5623_v5.patch, HBASE-5623_v6-alt.patch, HBASE-5623_v6-alt.patch When doing a ycsb test with a large number of handlers (regionserver.handler.count=60), I get the following exceptions: {code} Caused by: org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.getLength(SequenceFile.java:1099) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.getLength(SequenceFileLogWriter.java:314) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1291) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1388) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351) at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:920) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:152) at $Proxy1.multi(Unknown Source) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1691) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1689) at org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:214) {code} and {code} java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.checkAndWriteSync(SequenceFile.java:1026) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1068) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1035) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.append(SequenceFileLogWriter.java:279) at org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.hlogFlush(HLog.java:1237) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1271) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1391) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) at sun.reflect.GeneratedMethodAccessor33.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351) {code} It seems the root cause of the issue is that we open a new log writer and close the old one at HLog#rollWriter() holding the updateLock, but the other threads doing syncer() calls {code} logSyncerThread.hlogFlush(this.writer); {code} without holding the updateLock. LogSyncer only synchronizes against concurrent appends and flush(), but not on the passed writer, which can be closed already by rollWriter(). In this case, since SequenceFile#Writer.close() sets it's out field as null, we get the NPE. -- This message is automatically generated by JIRA. If you
[jira] [Commented] (HBASE-5623) Race condition when rolling the HLog and hlogFlush
[ https://issues.apache.org/jira/browse/HBASE-5623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238963#comment-13238963 ] Lars Hofhansl commented on HBASE-5623: -- Good to know :) Race condition when rolling the HLog and hlogFlush -- Key: HBASE-5623 URL: https://issues.apache.org/jira/browse/HBASE-5623 Project: HBase Issue Type: Bug Components: wal Affects Versions: 0.94.0 Reporter: Enis Soztutar Assignee: Enis Soztutar Priority: Critical Fix For: 0.94.0 Attachments: 5623-suggestion.txt, 5623-v7.txt, 5623-v8.txt, 5623.txt, 5623v2.txt, HBASE-5623_v0.patch, HBASE-5623_v4.patch, HBASE-5623_v5.patch, HBASE-5623_v6-alt.patch, HBASE-5623_v6-alt.patch When doing a ycsb test with a large number of handlers (regionserver.handler.count=60), I get the following exceptions: {code} Caused by: org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.getLength(SequenceFile.java:1099) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.getLength(SequenceFileLogWriter.java:314) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1291) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1388) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351) at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:920) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:152) at $Proxy1.multi(Unknown Source) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1691) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1689) at org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:214) {code} and {code} java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.checkAndWriteSync(SequenceFile.java:1026) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1068) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1035) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.append(SequenceFileLogWriter.java:279) at org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.hlogFlush(HLog.java:1237) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1271) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1391) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) at sun.reflect.GeneratedMethodAccessor33.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351) {code} It seems the root cause of the issue is that we open a new log writer and close the old one at HLog#rollWriter() holding the updateLock, but the other threads doing syncer() calls {code} logSyncerThread.hlogFlush(this.writer); {code} without holding the updateLock. LogSyncer only synchronizes against concurrent appends and flush(), but not on the passed writer, which can be closed already by rollWriter(). In this case, since SequenceFile#Writer.close() sets it's out field as null, we get the NPE. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators:
[jira] [Commented] (HBASE-5623) Race condition when rolling the HLog and hlogFlush
[ https://issues.apache.org/jira/browse/HBASE-5623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13239001#comment-13239001 ] Hudson commented on HBASE-5623: --- Integrated in HBase-0.94-security #3 (See [https://builds.apache.org/job/HBase-0.94-security/3/]) HBASE-5623 Race condition when rolling the HLog and hlogFlush (Enis Soztutar and LarsH) (Revision 1305549) Result = ABORTED larsh : Files : * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java * /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLogRollingNoCluster.java Race condition when rolling the HLog and hlogFlush -- Key: HBASE-5623 URL: https://issues.apache.org/jira/browse/HBASE-5623 Project: HBase Issue Type: Bug Components: wal Affects Versions: 0.94.0 Reporter: Enis Soztutar Assignee: Enis Soztutar Priority: Critical Fix For: 0.94.0 Attachments: 5623-suggestion.txt, 5623-v7.txt, 5623-v8.txt, 5623.txt, 5623v2.txt, HBASE-5623_v0.patch, HBASE-5623_v4.patch, HBASE-5623_v5.patch, HBASE-5623_v6-alt.patch, HBASE-5623_v6-alt.patch When doing a ycsb test with a large number of handlers (regionserver.handler.count=60), I get the following exceptions: {code} Caused by: org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.getLength(SequenceFile.java:1099) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.getLength(SequenceFileLogWriter.java:314) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1291) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1388) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351) at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:920) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:152) at $Proxy1.multi(Unknown Source) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1691) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1689) at org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:214) {code} and {code} java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.checkAndWriteSync(SequenceFile.java:1026) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1068) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1035) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.append(SequenceFileLogWriter.java:279) at org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.hlogFlush(HLog.java:1237) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1271) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1391) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) at sun.reflect.GeneratedMethodAccessor33.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351) {code} It seems the root cause of the issue is that we open a new log writer and close the old one at HLog#rollWriter() holding the updateLock, but the other threads doing syncer() calls
[jira] [Commented] (HBASE-5623) Race condition when rolling the HLog and hlogFlush
[ https://issues.apache.org/jira/browse/HBASE-5623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13237658#comment-13237658 ] Lars Hofhansl commented on HBASE-5623: -- Was going by the assumption that an IOException here is actually not bad. Either the writer was concurrently closed (which means it was rolled as well), or any persistent HDFS problem will detected on next write attempt. Could LOG.debug something like: Informational: Log roll failed. Will be retried. Race condition when rolling the HLog and hlogFlush -- Key: HBASE-5623 URL: https://issues.apache.org/jira/browse/HBASE-5623 Project: HBase Issue Type: Bug Components: wal Affects Versions: 0.94.0 Reporter: Enis Soztutar Assignee: Enis Soztutar Priority: Critical Fix For: 0.94.0 Attachments: 5623-suggestion.txt, 5623-v7.txt, 5623-v8.txt, 5623.txt, 5623v2.txt, HBASE-5623_v0.patch, HBASE-5623_v4.patch, HBASE-5623_v5.patch, HBASE-5623_v6-alt.patch, HBASE-5623_v6-alt.patch When doing a ycsb test with a large number of handlers (regionserver.handler.count=60), I get the following exceptions: {code} Caused by: org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.getLength(SequenceFile.java:1099) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.getLength(SequenceFileLogWriter.java:314) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1291) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1388) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351) at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:920) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:152) at $Proxy1.multi(Unknown Source) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1691) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1689) at org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:214) {code} and {code} java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.checkAndWriteSync(SequenceFile.java:1026) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1068) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1035) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.append(SequenceFileLogWriter.java:279) at org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.hlogFlush(HLog.java:1237) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1271) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1391) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) at sun.reflect.GeneratedMethodAccessor33.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351) {code} It seems the root cause of the issue is that we open a new log writer and close the old one at HLog#rollWriter() holding the updateLock, but the other threads doing syncer() calls {code} logSyncerThread.hlogFlush(this.writer); {code} without holding the updateLock. LogSyncer only synchronizes against concurrent appends and flush(), but not on the passed writer, which can be closed already by rollWriter().
[jira] [Commented] (HBASE-5623) Race condition when rolling the HLog and hlogFlush
[ https://issues.apache.org/jira/browse/HBASE-5623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13237713#comment-13237713 ] Lars Hofhansl commented on HBASE-5623: -- Any strong opinions other than the log message? This is the last 0.94.0 issue. I think my latest proposed patch fixes the problem while not impacting maintainability. Race condition when rolling the HLog and hlogFlush -- Key: HBASE-5623 URL: https://issues.apache.org/jira/browse/HBASE-5623 Project: HBase Issue Type: Bug Components: wal Affects Versions: 0.94.0 Reporter: Enis Soztutar Assignee: Enis Soztutar Priority: Critical Fix For: 0.94.0 Attachments: 5623-suggestion.txt, 5623-v7.txt, 5623-v8.txt, 5623.txt, 5623v2.txt, HBASE-5623_v0.patch, HBASE-5623_v4.patch, HBASE-5623_v5.patch, HBASE-5623_v6-alt.patch, HBASE-5623_v6-alt.patch When doing a ycsb test with a large number of handlers (regionserver.handler.count=60), I get the following exceptions: {code} Caused by: org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.getLength(SequenceFile.java:1099) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.getLength(SequenceFileLogWriter.java:314) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1291) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1388) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351) at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:920) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:152) at $Proxy1.multi(Unknown Source) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1691) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1689) at org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:214) {code} and {code} java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.checkAndWriteSync(SequenceFile.java:1026) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1068) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1035) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.append(SequenceFileLogWriter.java:279) at org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.hlogFlush(HLog.java:1237) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1271) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1391) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) at sun.reflect.GeneratedMethodAccessor33.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351) {code} It seems the root cause of the issue is that we open a new log writer and close the old one at HLog#rollWriter() holding the updateLock, but the other threads doing syncer() calls {code} logSyncerThread.hlogFlush(this.writer); {code} without holding the updateLock. LogSyncer only synchronizes against concurrent appends and flush(), but not on the passed writer, which can be closed already by rollWriter(). In this case, since SequenceFile#Writer.close() sets it's out field as null, we get the NPE. -- This message is automatically
[jira] [Commented] (HBASE-5623) Race condition when rolling the HLog and hlogFlush
[ https://issues.apache.org/jira/browse/HBASE-5623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13237714#comment-13237714 ] Ted Yu commented on HBASE-5623: --- Adding LOG.debug should be fine. Race condition when rolling the HLog and hlogFlush -- Key: HBASE-5623 URL: https://issues.apache.org/jira/browse/HBASE-5623 Project: HBase Issue Type: Bug Components: wal Affects Versions: 0.94.0 Reporter: Enis Soztutar Assignee: Enis Soztutar Priority: Critical Fix For: 0.94.0 Attachments: 5623-suggestion.txt, 5623-v7.txt, 5623-v8.txt, 5623.txt, 5623v2.txt, HBASE-5623_v0.patch, HBASE-5623_v4.patch, HBASE-5623_v5.patch, HBASE-5623_v6-alt.patch, HBASE-5623_v6-alt.patch When doing a ycsb test with a large number of handlers (regionserver.handler.count=60), I get the following exceptions: {code} Caused by: org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.getLength(SequenceFile.java:1099) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.getLength(SequenceFileLogWriter.java:314) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1291) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1388) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351) at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:920) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:152) at $Proxy1.multi(Unknown Source) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1691) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1689) at org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:214) {code} and {code} java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.checkAndWriteSync(SequenceFile.java:1026) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1068) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1035) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.append(SequenceFileLogWriter.java:279) at org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.hlogFlush(HLog.java:1237) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1271) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1391) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) at sun.reflect.GeneratedMethodAccessor33.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351) {code} It seems the root cause of the issue is that we open a new log writer and close the old one at HLog#rollWriter() holding the updateLock, but the other threads doing syncer() calls {code} logSyncerThread.hlogFlush(this.writer); {code} without holding the updateLock. LogSyncer only synchronizes against concurrent appends and flush(), but not on the passed writer, which can be closed already by rollWriter(). In this case, since SequenceFile#Writer.close() sets it's out field as null, we get the NPE. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators:
[jira] [Commented] (HBASE-5623) Race condition when rolling the HLog and hlogFlush
[ https://issues.apache.org/jira/browse/HBASE-5623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13237746#comment-13237746 ] Lars Hofhansl commented on HBASE-5623: -- Will wait for Enis to have a look at latest patch. Race condition when rolling the HLog and hlogFlush -- Key: HBASE-5623 URL: https://issues.apache.org/jira/browse/HBASE-5623 Project: HBase Issue Type: Bug Components: wal Affects Versions: 0.94.0 Reporter: Enis Soztutar Assignee: Enis Soztutar Priority: Critical Fix For: 0.94.0 Attachments: 5623-suggestion.txt, 5623-v7.txt, 5623-v8.txt, 5623.txt, 5623v2.txt, HBASE-5623_v0.patch, HBASE-5623_v4.patch, HBASE-5623_v5.patch, HBASE-5623_v6-alt.patch, HBASE-5623_v6-alt.patch When doing a ycsb test with a large number of handlers (regionserver.handler.count=60), I get the following exceptions: {code} Caused by: org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.getLength(SequenceFile.java:1099) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.getLength(SequenceFileLogWriter.java:314) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1291) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1388) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351) at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:920) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:152) at $Proxy1.multi(Unknown Source) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1691) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1689) at org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:214) {code} and {code} java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.checkAndWriteSync(SequenceFile.java:1026) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1068) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1035) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.append(SequenceFileLogWriter.java:279) at org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.hlogFlush(HLog.java:1237) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1271) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1391) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) at sun.reflect.GeneratedMethodAccessor33.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351) {code} It seems the root cause of the issue is that we open a new log writer and close the old one at HLog#rollWriter() holding the updateLock, but the other threads doing syncer() calls {code} logSyncerThread.hlogFlush(this.writer); {code} without holding the updateLock. LogSyncer only synchronizes against concurrent appends and flush(), but not on the passed writer, which can be closed already by rollWriter(). In this case, since SequenceFile#Writer.close() sets it's out field as null, we get the NPE. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators:
[jira] [Commented] (HBASE-5623) Race condition when rolling the HLog and hlogFlush
[ https://issues.apache.org/jira/browse/HBASE-5623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13236379#comment-13236379 ] Hadoop QA commented on HBASE-5623: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12519585/5623v2.txt against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 2 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.mapreduce.TestImportTsv org.apache.hadoop.hbase.mapred.TestTableMapReduce org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1272//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1272//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1272//console This message is automatically generated. Race condition when rolling the HLog and hlogFlush -- Key: HBASE-5623 URL: https://issues.apache.org/jira/browse/HBASE-5623 Project: HBase Issue Type: Bug Components: wal Affects Versions: 0.94.0 Reporter: Enis Soztutar Assignee: Enis Soztutar Priority: Critical Attachments: 5623v2.txt, HBASE-5623_v0.patch When doing a ycsb test with a large number of handlers (regionserver.handler.count=60), I get the following exceptions: {code} Caused by: org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.getLength(SequenceFile.java:1099) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.getLength(SequenceFileLogWriter.java:314) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1291) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1388) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351) at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:920) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:152) at $Proxy1.multi(Unknown Source) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1691) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1689) at org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:214) {code} and {code} java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.checkAndWriteSync(SequenceFile.java:1026) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1068) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1035) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.append(SequenceFileLogWriter.java:279) at org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.hlogFlush(HLog.java:1237) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1271) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1391) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) at sun.reflect.GeneratedMethodAccessor33.invoke(Unknown Source)
[jira] [Commented] (HBASE-5623) Race condition when rolling the HLog and hlogFlush
[ https://issues.apache.org/jira/browse/HBASE-5623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13236771#comment-13236771 ] Zhihong Yu commented on HBASE-5623: --- I looped TestLogRollingNoCluster 20 times and didn't see NPE. Race condition when rolling the HLog and hlogFlush -- Key: HBASE-5623 URL: https://issues.apache.org/jira/browse/HBASE-5623 Project: HBase Issue Type: Bug Components: wal Affects Versions: 0.94.0 Reporter: Enis Soztutar Assignee: Enis Soztutar Priority: Critical Attachments: 5623v2.txt, HBASE-5623_v0.patch When doing a ycsb test with a large number of handlers (regionserver.handler.count=60), I get the following exceptions: {code} Caused by: org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.getLength(SequenceFile.java:1099) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.getLength(SequenceFileLogWriter.java:314) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1291) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1388) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351) at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:920) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:152) at $Proxy1.multi(Unknown Source) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1691) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1689) at org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:214) {code} and {code} java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.checkAndWriteSync(SequenceFile.java:1026) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1068) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1035) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.append(SequenceFileLogWriter.java:279) at org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.hlogFlush(HLog.java:1237) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1271) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1391) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) at sun.reflect.GeneratedMethodAccessor33.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351) {code} It seems the root cause of the issue is that we open a new log writer and close the old one at HLog#rollWriter() holding the updateLock, but the other threads doing syncer() calls {code} logSyncerThread.hlogFlush(this.writer); {code} without holding the updateLock. LogSyncer only synchronizes against concurrent appends and flush(), but not on the passed writer, which can be closed already by rollWriter(). In this case, since SequenceFile#Writer.close() sets it's out field as null, we get the NPE. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5623) Race condition when rolling the HLog and hlogFlush
[ https://issues.apache.org/jira/browse/HBASE-5623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13236853#comment-13236853 ] Lars Hofhansl commented on HBASE-5623: -- Looking at SequenceFileLogWriter, I notice that other methods on it catch Exception or even NullPointerException (see close() and sync()). If we want maximum concurrency here, we need to do the same we do for writer.sync(). I.e. try without lock held, if failing get the lock and retry, along with SequenceFileLogWriter.getLength() catching Exception and retrow as IOException. Race condition when rolling the HLog and hlogFlush -- Key: HBASE-5623 URL: https://issues.apache.org/jira/browse/HBASE-5623 Project: HBase Issue Type: Bug Components: wal Affects Versions: 0.94.0 Reporter: Enis Soztutar Assignee: Enis Soztutar Priority: Critical Attachments: 5623v2.txt, HBASE-5623_v0.patch When doing a ycsb test with a large number of handlers (regionserver.handler.count=60), I get the following exceptions: {code} Caused by: org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.getLength(SequenceFile.java:1099) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.getLength(SequenceFileLogWriter.java:314) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1291) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1388) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351) at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:920) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:152) at $Proxy1.multi(Unknown Source) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1691) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1689) at org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:214) {code} and {code} java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.checkAndWriteSync(SequenceFile.java:1026) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1068) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1035) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.append(SequenceFileLogWriter.java:279) at org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.hlogFlush(HLog.java:1237) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1271) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1391) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) at sun.reflect.GeneratedMethodAccessor33.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351) {code} It seems the root cause of the issue is that we open a new log writer and close the old one at HLog#rollWriter() holding the updateLock, but the other threads doing syncer() calls {code} logSyncerThread.hlogFlush(this.writer); {code} without holding the updateLock. LogSyncer only synchronizes against concurrent appends and flush(), but not on the passed writer, which can be closed already by rollWriter(). In this case, since SequenceFile#Writer.close() sets it's out field as null, we get the NPE. -- This
[jira] [Commented] (HBASE-5623) Race condition when rolling the HLog and hlogFlush
[ https://issues.apache.org/jira/browse/HBASE-5623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13236870#comment-13236870 ] stack commented on HBASE-5623: -- Yeah, I should make the test fail on these but its kinda hard to put your fingers in there. I was seeing the NPEs running it up in eclipse. I need to amend the test so if fails outright on NPE. Race condition when rolling the HLog and hlogFlush -- Key: HBASE-5623 URL: https://issues.apache.org/jira/browse/HBASE-5623 Project: HBase Issue Type: Bug Components: wal Affects Versions: 0.94.0 Reporter: Enis Soztutar Assignee: Enis Soztutar Priority: Critical Attachments: 5623v2.txt, HBASE-5623_v0.patch When doing a ycsb test with a large number of handlers (regionserver.handler.count=60), I get the following exceptions: {code} Caused by: org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.getLength(SequenceFile.java:1099) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.getLength(SequenceFileLogWriter.java:314) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1291) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1388) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351) at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:920) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:152) at $Proxy1.multi(Unknown Source) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1691) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1689) at org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:214) {code} and {code} java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.checkAndWriteSync(SequenceFile.java:1026) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1068) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1035) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.append(SequenceFileLogWriter.java:279) at org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.hlogFlush(HLog.java:1237) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1271) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1391) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) at sun.reflect.GeneratedMethodAccessor33.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351) {code} It seems the root cause of the issue is that we open a new log writer and close the old one at HLog#rollWriter() holding the updateLock, but the other threads doing syncer() calls {code} logSyncerThread.hlogFlush(this.writer); {code} without holding the updateLock. LogSyncer only synchronizes against concurrent appends and flush(), but not on the passed writer, which can be closed already by rollWriter(). In this case, since SequenceFile#Writer.close() sets it's out field as null, we get the NPE. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For
[jira] [Commented] (HBASE-5623) Race condition when rolling the HLog and hlogFlush
[ https://issues.apache.org/jira/browse/HBASE-5623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13236889#comment-13236889 ] Enis Soztutar commented on HBASE-5623: -- @Stack, thanks for the unit test. I'll test the the unit test with and without the patch today. Normally, it is very easy to replicate this bug. I just set up a 4 node cluster with Hadoop-1.0.1, and run ycsb with -load -threads 30. @Lars Yes, it is the same with HBASE-5336. I had searched for the exception but did not find that issue. Anyway I'll close that one as duplicate. bq. If we want maximum concurrency here, we need to do the same we do for writer.sync(). I.e. try without lock held, if failing get the lock and retry, along with SequenceFileLogWriter.getLength() catching Exception and retrow as IOException. Yes we can do that, but we cannot do this double-checking for the logSyncerThread.hlogFlush(this.writer) since it is not idempotent, so in any case we should be holding the updateLock, no? From my understanding you are suggesting that we sync on the hlogFlush, and do the retry mechanism in writer.sync and getLength(), catching all the exceptions in the first try, right? Race condition when rolling the HLog and hlogFlush -- Key: HBASE-5623 URL: https://issues.apache.org/jira/browse/HBASE-5623 Project: HBase Issue Type: Bug Components: wal Affects Versions: 0.94.0 Reporter: Enis Soztutar Assignee: Enis Soztutar Priority: Critical Attachments: 5623v2.txt, HBASE-5623_v0.patch When doing a ycsb test with a large number of handlers (regionserver.handler.count=60), I get the following exceptions: {code} Caused by: org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.getLength(SequenceFile.java:1099) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.getLength(SequenceFileLogWriter.java:314) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1291) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1388) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351) at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:920) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:152) at $Proxy1.multi(Unknown Source) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1691) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1689) at org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:214) {code} and {code} java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.checkAndWriteSync(SequenceFile.java:1026) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1068) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1035) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.append(SequenceFileLogWriter.java:279) at org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.hlogFlush(HLog.java:1237) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1271) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1391) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) at sun.reflect.GeneratedMethodAccessor33.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366)
[jira] [Commented] (HBASE-5623) Race condition when rolling the HLog and hlogFlush
[ https://issues.apache.org/jira/browse/HBASE-5623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13236907#comment-13236907 ] Lars Hofhansl commented on HBASE-5623: -- @Enis: For logSyncerThread.hlogFlush(this.writer) we need to do that in that method, LogSyncer is non-static, so we can use the HLog's updateLock in its methods. Race condition when rolling the HLog and hlogFlush -- Key: HBASE-5623 URL: https://issues.apache.org/jira/browse/HBASE-5623 Project: HBase Issue Type: Bug Components: wal Affects Versions: 0.94.0 Reporter: Enis Soztutar Assignee: Enis Soztutar Priority: Critical Attachments: 5623v2.txt, HBASE-5623_v0.patch When doing a ycsb test with a large number of handlers (regionserver.handler.count=60), I get the following exceptions: {code} Caused by: org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.getLength(SequenceFile.java:1099) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.getLength(SequenceFileLogWriter.java:314) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1291) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1388) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351) at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:920) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:152) at $Proxy1.multi(Unknown Source) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1691) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1689) at org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:214) {code} and {code} java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.checkAndWriteSync(SequenceFile.java:1026) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1068) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1035) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.append(SequenceFileLogWriter.java:279) at org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.hlogFlush(HLog.java:1237) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1271) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1391) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) at sun.reflect.GeneratedMethodAccessor33.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351) {code} It seems the root cause of the issue is that we open a new log writer and close the old one at HLog#rollWriter() holding the updateLock, but the other threads doing syncer() calls {code} logSyncerThread.hlogFlush(this.writer); {code} without holding the updateLock. LogSyncer only synchronizes against concurrent appends and flush(), but not on the passed writer, which can be closed already by rollWriter(). In this case, since SequenceFile#Writer.close() sets it's out field as null, we get the NPE. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on
[jira] [Commented] (HBASE-5623) Race condition when rolling the HLog and hlogFlush
[ https://issues.apache.org/jira/browse/HBASE-5623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13237145#comment-13237145 ] Lars Hofhansl commented on HBASE-5623: -- Sorry for the naming... Should've named it v3 or test or something. Race condition when rolling the HLog and hlogFlush -- Key: HBASE-5623 URL: https://issues.apache.org/jira/browse/HBASE-5623 Project: HBase Issue Type: Bug Components: wal Affects Versions: 0.94.0 Reporter: Enis Soztutar Assignee: Enis Soztutar Priority: Critical Fix For: 0.94.0 Attachments: 5623.txt, 5623v2.txt, HBASE-5623_v0.patch When doing a ycsb test with a large number of handlers (regionserver.handler.count=60), I get the following exceptions: {code} Caused by: org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.getLength(SequenceFile.java:1099) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.getLength(SequenceFileLogWriter.java:314) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1291) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1388) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351) at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:920) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:152) at $Proxy1.multi(Unknown Source) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1691) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1689) at org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:214) {code} and {code} java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.checkAndWriteSync(SequenceFile.java:1026) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1068) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1035) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.append(SequenceFileLogWriter.java:279) at org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.hlogFlush(HLog.java:1237) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1271) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1391) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) at sun.reflect.GeneratedMethodAccessor33.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351) {code} It seems the root cause of the issue is that we open a new log writer and close the old one at HLog#rollWriter() holding the updateLock, but the other threads doing syncer() calls {code} logSyncerThread.hlogFlush(this.writer); {code} without holding the updateLock. LogSyncer only synchronizes against concurrent appends and flush(), but not on the passed writer, which can be closed already by rollWriter(). In this case, since SequenceFile#Writer.close() sets it's out field as null, we get the NPE. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5623) Race condition when rolling the HLog and hlogFlush
[ https://issues.apache.org/jira/browse/HBASE-5623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13237157#comment-13237157 ] stack commented on HBASE-5623: -- Seems fine. Edit your comments. Does it pass the test? Do you want to include the test? Do you think this will slow writes? Should we check for null writer here: '+ return this.writer.getLength();' This seems like a catch that is very broad: {code} + } catch (IOException io) { +// try again is lock held {code} Can we narrow it at all? Would an NPE come up here? This log message is going to freak out the chicken littles out there: '+ LOG.debug(Log roll failed, io);' Can you do some amelioration? Something like, failed this time but we'll be back soon to retry? Race condition when rolling the HLog and hlogFlush -- Key: HBASE-5623 URL: https://issues.apache.org/jira/browse/HBASE-5623 Project: HBase Issue Type: Bug Components: wal Affects Versions: 0.94.0 Reporter: Enis Soztutar Assignee: Enis Soztutar Priority: Critical Fix For: 0.94.0 Attachments: 5623.txt, 5623v2.txt, HBASE-5623_v0.patch When doing a ycsb test with a large number of handlers (regionserver.handler.count=60), I get the following exceptions: {code} Caused by: org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.getLength(SequenceFile.java:1099) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.getLength(SequenceFileLogWriter.java:314) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1291) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1388) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351) at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:920) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:152) at $Proxy1.multi(Unknown Source) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1691) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1689) at org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:214) {code} and {code} java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.checkAndWriteSync(SequenceFile.java:1026) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1068) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1035) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.append(SequenceFileLogWriter.java:279) at org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.hlogFlush(HLog.java:1237) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1271) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1391) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) at sun.reflect.GeneratedMethodAccessor33.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351) {code} It seems the root cause of the issue is that we open a new log writer and close the old one at HLog#rollWriter() holding the updateLock, but the other threads doing syncer() calls {code} logSyncerThread.hlogFlush(this.writer); {code} without holding the updateLock.
[jira] [Commented] (HBASE-5623) Race condition when rolling the HLog and hlogFlush
[ https://issues.apache.org/jira/browse/HBASE-5623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13237161#comment-13237161 ] Lars Hofhansl commented on HBASE-5623: -- Lemme include the test and run it a few time, and cleanup the comments. Just wanted to tease out the general strategy. Race condition when rolling the HLog and hlogFlush -- Key: HBASE-5623 URL: https://issues.apache.org/jira/browse/HBASE-5623 Project: HBase Issue Type: Bug Components: wal Affects Versions: 0.94.0 Reporter: Enis Soztutar Assignee: Enis Soztutar Priority: Critical Fix For: 0.94.0 Attachments: 5623.txt, 5623v2.txt, HBASE-5623_v0.patch When doing a ycsb test with a large number of handlers (regionserver.handler.count=60), I get the following exceptions: {code} Caused by: org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.getLength(SequenceFile.java:1099) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.getLength(SequenceFileLogWriter.java:314) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1291) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1388) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351) at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:920) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:152) at $Proxy1.multi(Unknown Source) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1691) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1689) at org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:214) {code} and {code} java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.checkAndWriteSync(SequenceFile.java:1026) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1068) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1035) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.append(SequenceFileLogWriter.java:279) at org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.hlogFlush(HLog.java:1237) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1271) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1391) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) at sun.reflect.GeneratedMethodAccessor33.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351) {code} It seems the root cause of the issue is that we open a new log writer and close the old one at HLog#rollWriter() holding the updateLock, but the other threads doing syncer() calls {code} logSyncerThread.hlogFlush(this.writer); {code} without holding the updateLock. LogSyncer only synchronizes against concurrent appends and flush(), but not on the passed writer, which can be closed already by rollWriter(). In this case, since SequenceFile#Writer.close() sets it's out field as null, we get the NPE. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on
[jira] [Commented] (HBASE-5623) Race condition when rolling the HLog and hlogFlush
[ https://issues.apache.org/jira/browse/HBASE-5623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13237163#comment-13237163 ] Lars Hofhansl commented on HBASE-5623: -- re: The broad exception. The worst that can happen is that we try again. I think if there is any IOException we want try again. Race condition when rolling the HLog and hlogFlush -- Key: HBASE-5623 URL: https://issues.apache.org/jira/browse/HBASE-5623 Project: HBase Issue Type: Bug Components: wal Affects Versions: 0.94.0 Reporter: Enis Soztutar Assignee: Enis Soztutar Priority: Critical Fix For: 0.94.0 Attachments: 5623.txt, 5623v2.txt, HBASE-5623_v0.patch When doing a ycsb test with a large number of handlers (regionserver.handler.count=60), I get the following exceptions: {code} Caused by: org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.getLength(SequenceFile.java:1099) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.getLength(SequenceFileLogWriter.java:314) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1291) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1388) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351) at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:920) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:152) at $Proxy1.multi(Unknown Source) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1691) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1689) at org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:214) {code} and {code} java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.checkAndWriteSync(SequenceFile.java:1026) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1068) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1035) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.append(SequenceFileLogWriter.java:279) at org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.hlogFlush(HLog.java:1237) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1271) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1391) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) at sun.reflect.GeneratedMethodAccessor33.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351) {code} It seems the root cause of the issue is that we open a new log writer and close the old one at HLog#rollWriter() holding the updateLock, but the other threads doing syncer() calls {code} logSyncerThread.hlogFlush(this.writer); {code} without holding the updateLock. LogSyncer only synchronizes against concurrent appends and flush(), but not on the passed writer, which can be closed already by rollWriter(). In this case, since SequenceFile#Writer.close() sets it's out field as null, we get the NPE. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more
[jira] [Commented] (HBASE-5623) Race condition when rolling the HLog and hlogFlush
[ https://issues.apache.org/jira/browse/HBASE-5623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13237166#comment-13237166 ] Lars Hofhansl commented on HBASE-5623: -- Sorry one more: checking this.writer against null won't work unless we lock out other threads that could set it to null. Race condition when rolling the HLog and hlogFlush -- Key: HBASE-5623 URL: https://issues.apache.org/jira/browse/HBASE-5623 Project: HBase Issue Type: Bug Components: wal Affects Versions: 0.94.0 Reporter: Enis Soztutar Assignee: Enis Soztutar Priority: Critical Fix For: 0.94.0 Attachments: 5623.txt, 5623v2.txt, HBASE-5623_v0.patch When doing a ycsb test with a large number of handlers (regionserver.handler.count=60), I get the following exceptions: {code} Caused by: org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.getLength(SequenceFile.java:1099) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.getLength(SequenceFileLogWriter.java:314) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1291) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1388) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351) at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:920) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:152) at $Proxy1.multi(Unknown Source) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1691) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1689) at org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:214) {code} and {code} java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.checkAndWriteSync(SequenceFile.java:1026) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1068) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1035) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.append(SequenceFileLogWriter.java:279) at org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.hlogFlush(HLog.java:1237) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1271) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1391) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) at sun.reflect.GeneratedMethodAccessor33.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351) {code} It seems the root cause of the issue is that we open a new log writer and close the old one at HLog#rollWriter() holding the updateLock, but the other threads doing syncer() calls {code} logSyncerThread.hlogFlush(this.writer); {code} without holding the updateLock. LogSyncer only synchronizes against concurrent appends and flush(), but not on the passed writer, which can be closed already by rollWriter(). In this case, since SequenceFile#Writer.close() sets it's out field as null, we get the NPE. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on
[jira] [Commented] (HBASE-5623) Race condition when rolling the HLog and hlogFlush
[ https://issues.apache.org/jira/browse/HBASE-5623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13237179#comment-13237179 ] Zhihong Yu commented on HBASE-5623: --- Is the following correct ? {code} +synchronized(HLog.this.updateLock) { + for (Entry e : pending) { +writer.append(e); + } +} {code} We may have appended some entries before retry. Should we remember which Entry caused exception and continue ? Race condition when rolling the HLog and hlogFlush -- Key: HBASE-5623 URL: https://issues.apache.org/jira/browse/HBASE-5623 Project: HBase Issue Type: Bug Components: wal Affects Versions: 0.94.0 Reporter: Enis Soztutar Assignee: Enis Soztutar Priority: Critical Fix For: 0.94.0 Attachments: 5623.txt, 5623v2.txt, HBASE-5623_v0.patch When doing a ycsb test with a large number of handlers (regionserver.handler.count=60), I get the following exceptions: {code} Caused by: org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.getLength(SequenceFile.java:1099) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.getLength(SequenceFileLogWriter.java:314) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1291) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1388) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351) at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:920) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:152) at $Proxy1.multi(Unknown Source) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1691) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1689) at org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:214) {code} and {code} java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.checkAndWriteSync(SequenceFile.java:1026) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1068) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1035) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.append(SequenceFileLogWriter.java:279) at org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.hlogFlush(HLog.java:1237) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1271) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1391) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) at sun.reflect.GeneratedMethodAccessor33.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351) {code} It seems the root cause of the issue is that we open a new log writer and close the old one at HLog#rollWriter() holding the updateLock, but the other threads doing syncer() calls {code} logSyncerThread.hlogFlush(this.writer); {code} without holding the updateLock. LogSyncer only synchronizes against concurrent appends and flush(), but not on the passed writer, which can be closed already by rollWriter(). In this case, since SequenceFile#Writer.close() sets it's out field as null, we get the NPE. -- This message is automatically generated by JIRA. If you think it
[jira] [Commented] (HBASE-5623) Race condition when rolling the HLog and hlogFlush
[ https://issues.apache.org/jira/browse/HBASE-5623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13237181#comment-13237181 ] Lars Hofhansl commented on HBASE-5623: -- I just thought about and added coded that uses an iterator and remembers the last entry. But then I realized that these are all idempotent operations and adding some of them into the log twice should have no bad effect (besides creating two identical versions of a KV) Race condition when rolling the HLog and hlogFlush -- Key: HBASE-5623 URL: https://issues.apache.org/jira/browse/HBASE-5623 Project: HBase Issue Type: Bug Components: wal Affects Versions: 0.94.0 Reporter: Enis Soztutar Assignee: Enis Soztutar Priority: Critical Fix For: 0.94.0 Attachments: 5623.txt, 5623v2.txt, HBASE-5623_v0.patch When doing a ycsb test with a large number of handlers (regionserver.handler.count=60), I get the following exceptions: {code} Caused by: org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.getLength(SequenceFile.java:1099) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.getLength(SequenceFileLogWriter.java:314) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1291) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1388) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351) at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:920) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:152) at $Proxy1.multi(Unknown Source) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1691) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1689) at org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:214) {code} and {code} java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.checkAndWriteSync(SequenceFile.java:1026) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1068) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1035) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.append(SequenceFileLogWriter.java:279) at org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.hlogFlush(HLog.java:1237) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1271) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1391) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) at sun.reflect.GeneratedMethodAccessor33.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351) {code} It seems the root cause of the issue is that we open a new log writer and close the old one at HLog#rollWriter() holding the updateLock, but the other threads doing syncer() calls {code} logSyncerThread.hlogFlush(this.writer); {code} without holding the updateLock. LogSyncer only synchronizes against concurrent appends and flush(), but not on the passed writer, which can be closed already by rollWriter(). In this case, since SequenceFile#Writer.close() sets it's out field as null, we get the NPE. -- This message is automatically generated by JIRA. If you think it was sent
[jira] [Commented] (HBASE-5623) Race condition when rolling the HLog and hlogFlush
[ https://issues.apache.org/jira/browse/HBASE-5623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13237211#comment-13237211 ] Hadoop QA commented on HBASE-5623: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12519720/5623.txt against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1283//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1283//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1283//console This message is automatically generated. Race condition when rolling the HLog and hlogFlush -- Key: HBASE-5623 URL: https://issues.apache.org/jira/browse/HBASE-5623 Project: HBase Issue Type: Bug Components: wal Affects Versions: 0.94.0 Reporter: Enis Soztutar Assignee: Enis Soztutar Priority: Critical Fix For: 0.94.0 Attachments: 5623.txt, 5623v2.txt, HBASE-5623_v0.patch When doing a ycsb test with a large number of handlers (regionserver.handler.count=60), I get the following exceptions: {code} Caused by: org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.getLength(SequenceFile.java:1099) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.getLength(SequenceFileLogWriter.java:314) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1291) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1388) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351) at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:920) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:152) at $Proxy1.multi(Unknown Source) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1691) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1689) at org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:214) {code} and {code} java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.checkAndWriteSync(SequenceFile.java:1026) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1068) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1035) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.append(SequenceFileLogWriter.java:279) at org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.hlogFlush(HLog.java:1237) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1271) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1391) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) at sun.reflect.GeneratedMethodAccessor33.invoke(Unknown
[jira] [Commented] (HBASE-5623) Race condition when rolling the HLog and hlogFlush
[ https://issues.apache.org/jira/browse/HBASE-5623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13237229#comment-13237229 ] Lars Hofhansl commented on HBASE-5623: -- Ran the about 300 times now, no failure. Race condition when rolling the HLog and hlogFlush -- Key: HBASE-5623 URL: https://issues.apache.org/jira/browse/HBASE-5623 Project: HBase Issue Type: Bug Components: wal Affects Versions: 0.94.0 Reporter: Enis Soztutar Assignee: Enis Soztutar Priority: Critical Fix For: 0.94.0 Attachments: 5623.txt, 5623v2.txt, HBASE-5623_v0.patch, HBASE-5623_v4.patch When doing a ycsb test with a large number of handlers (regionserver.handler.count=60), I get the following exceptions: {code} Caused by: org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.getLength(SequenceFile.java:1099) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.getLength(SequenceFileLogWriter.java:314) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1291) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1388) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351) at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:920) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:152) at $Proxy1.multi(Unknown Source) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1691) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1689) at org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:214) {code} and {code} java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.checkAndWriteSync(SequenceFile.java:1026) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1068) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1035) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.append(SequenceFileLogWriter.java:279) at org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.hlogFlush(HLog.java:1237) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1271) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1391) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) at sun.reflect.GeneratedMethodAccessor33.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351) {code} It seems the root cause of the issue is that we open a new log writer and close the old one at HLog#rollWriter() holding the updateLock, but the other threads doing syncer() calls {code} logSyncerThread.hlogFlush(this.writer); {code} without holding the updateLock. LogSyncer only synchronizes against concurrent appends and flush(), but not on the passed writer, which can be closed already by rollWriter(). In this case, since SequenceFile#Writer.close() sets it's out field as null, we get the NPE. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5623) Race condition when rolling the HLog and hlogFlush
[ https://issues.apache.org/jira/browse/HBASE-5623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13237234#comment-13237234 ] Enis Soztutar commented on HBASE-5623: -- bq. Ran the about 300 times now, no failure. Lars, did you run the Stack's patch as it is? It does not fail also in trunk, but the problem is that the threads exit with NPE, and it gets written to the log, but the test passes. Race condition when rolling the HLog and hlogFlush -- Key: HBASE-5623 URL: https://issues.apache.org/jira/browse/HBASE-5623 Project: HBase Issue Type: Bug Components: wal Affects Versions: 0.94.0 Reporter: Enis Soztutar Assignee: Enis Soztutar Priority: Critical Fix For: 0.94.0 Attachments: 5623.txt, 5623v2.txt, HBASE-5623_v0.patch, HBASE-5623_v4.patch When doing a ycsb test with a large number of handlers (regionserver.handler.count=60), I get the following exceptions: {code} Caused by: org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.getLength(SequenceFile.java:1099) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.getLength(SequenceFileLogWriter.java:314) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1291) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1388) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351) at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:920) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:152) at $Proxy1.multi(Unknown Source) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1691) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1689) at org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:214) {code} and {code} java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.checkAndWriteSync(SequenceFile.java:1026) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1068) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1035) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.append(SequenceFileLogWriter.java:279) at org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.hlogFlush(HLog.java:1237) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1271) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1391) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) at sun.reflect.GeneratedMethodAccessor33.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351) {code} It seems the root cause of the issue is that we open a new log writer and close the old one at HLog#rollWriter() holding the updateLock, but the other threads doing syncer() calls {code} logSyncerThread.hlogFlush(this.writer); {code} without holding the updateLock. LogSyncer only synchronizes against concurrent appends and flush(), but not on the passed writer, which can be closed already by rollWriter(). In this case, since SequenceFile#Writer.close() sets it's out field as null, we get the NPE. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please
[jira] [Commented] (HBASE-5623) Race condition when rolling the HLog and hlogFlush
[ https://issues.apache.org/jira/browse/HBASE-5623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13237242#comment-13237242 ] Zhihong Yu commented on HBASE-5623: --- For patch v4: {code} +if (this.writerRef.get() != null) { + this.writerRef.get().close(); {code} Shall we save the first writerRef.get() in a variable and use it to call close() ? Stack's test isn't in patch v4. Race condition when rolling the HLog and hlogFlush -- Key: HBASE-5623 URL: https://issues.apache.org/jira/browse/HBASE-5623 Project: HBase Issue Type: Bug Components: wal Affects Versions: 0.94.0 Reporter: Enis Soztutar Assignee: Enis Soztutar Priority: Critical Fix For: 0.94.0 Attachments: 5623.txt, 5623v2.txt, HBASE-5623_v0.patch, HBASE-5623_v4.patch When doing a ycsb test with a large number of handlers (regionserver.handler.count=60), I get the following exceptions: {code} Caused by: org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.getLength(SequenceFile.java:1099) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.getLength(SequenceFileLogWriter.java:314) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1291) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1388) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351) at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:920) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:152) at $Proxy1.multi(Unknown Source) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1691) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1689) at org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:214) {code} and {code} java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.checkAndWriteSync(SequenceFile.java:1026) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1068) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1035) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.append(SequenceFileLogWriter.java:279) at org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.hlogFlush(HLog.java:1237) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1271) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1391) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) at sun.reflect.GeneratedMethodAccessor33.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351) {code} It seems the root cause of the issue is that we open a new log writer and close the old one at HLog#rollWriter() holding the updateLock, but the other threads doing syncer() calls {code} logSyncerThread.hlogFlush(this.writer); {code} without holding the updateLock. LogSyncer only synchronizes against concurrent appends and flush(), but not on the passed writer, which can be closed already by rollWriter(). In this case, since SequenceFile#Writer.close() sets it's out field as null, we get the NPE. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact
[jira] [Commented] (HBASE-5623) Race condition when rolling the HLog and hlogFlush
[ https://issues.apache.org/jira/browse/HBASE-5623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13237279#comment-13237279 ] Lars Hofhansl commented on HBASE-5623: -- We're have a bunch of conflicting patches now :) Let's take Enis' one as the authoritative one. Race condition when rolling the HLog and hlogFlush -- Key: HBASE-5623 URL: https://issues.apache.org/jira/browse/HBASE-5623 Project: HBase Issue Type: Bug Components: wal Affects Versions: 0.94.0 Reporter: Enis Soztutar Assignee: Enis Soztutar Priority: Critical Fix For: 0.94.0 Attachments: 5623.txt, 5623v2.txt, HBASE-5623_v0.patch, HBASE-5623_v4.patch, HBASE-5623_v5.patch When doing a ycsb test with a large number of handlers (regionserver.handler.count=60), I get the following exceptions: {code} Caused by: org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.getLength(SequenceFile.java:1099) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.getLength(SequenceFileLogWriter.java:314) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1291) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1388) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351) at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:920) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:152) at $Proxy1.multi(Unknown Source) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1691) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1689) at org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:214) {code} and {code} java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.checkAndWriteSync(SequenceFile.java:1026) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1068) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1035) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.append(SequenceFileLogWriter.java:279) at org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.hlogFlush(HLog.java:1237) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1271) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1391) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) at sun.reflect.GeneratedMethodAccessor33.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351) {code} It seems the root cause of the issue is that we open a new log writer and close the old one at HLog#rollWriter() holding the updateLock, but the other threads doing syncer() calls {code} logSyncerThread.hlogFlush(this.writer); {code} without holding the updateLock. LogSyncer only synchronizes against concurrent appends and flush(), but not on the passed writer, which can be closed already by rollWriter(). In this case, since SequenceFile#Writer.close() sets it's out field as null, we get the NPE. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For
[jira] [Commented] (HBASE-5623) Race condition when rolling the HLog and hlogFlush
[ https://issues.apache.org/jira/browse/HBASE-5623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13237292#comment-13237292 ] Zhihong Yu commented on HBASE-5623: --- For this log at line 1320: {code} + LOG.debug(Log roll failed, ex); {code} my script detects NPE in test output in iteration 3. Since the above doesn't signify failure, I changed it to: {code} if (!(ex instanceof NullPointerException)) LOG.debug(Log roll failed, ex); {code} Race condition when rolling the HLog and hlogFlush -- Key: HBASE-5623 URL: https://issues.apache.org/jira/browse/HBASE-5623 Project: HBase Issue Type: Bug Components: wal Affects Versions: 0.94.0 Reporter: Enis Soztutar Assignee: Enis Soztutar Priority: Critical Fix For: 0.94.0 Attachments: 5623.txt, 5623v2.txt, HBASE-5623_v0.patch, HBASE-5623_v4.patch, HBASE-5623_v5.patch When doing a ycsb test with a large number of handlers (regionserver.handler.count=60), I get the following exceptions: {code} Caused by: org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.getLength(SequenceFile.java:1099) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.getLength(SequenceFileLogWriter.java:314) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1291) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1388) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351) at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:920) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:152) at $Proxy1.multi(Unknown Source) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1691) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1689) at org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:214) {code} and {code} java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.checkAndWriteSync(SequenceFile.java:1026) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1068) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1035) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.append(SequenceFileLogWriter.java:279) at org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.hlogFlush(HLog.java:1237) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1271) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1391) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) at sun.reflect.GeneratedMethodAccessor33.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351) {code} It seems the root cause of the issue is that we open a new log writer and close the old one at HLog#rollWriter() holding the updateLock, but the other threads doing syncer() calls {code} logSyncerThread.hlogFlush(this.writer); {code} without holding the updateLock. LogSyncer only synchronizes against concurrent appends and flush(), but not on the passed writer, which can be closed already by rollWriter(). In this case, since SequenceFile#Writer.close() sets it's out field as null, we get the NPE. -- This message is
[jira] [Commented] (HBASE-5623) Race condition when rolling the HLog and hlogFlush
[ https://issues.apache.org/jira/browse/HBASE-5623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13237293#comment-13237293 ] Lars Hofhansl commented on HBASE-5623: -- bq. holding the updateLock does not guarantee that the other thread's writer pointer is updated to the nextWriter Are you sure about this? This seems to be the main objective of the updateLock. I don't see any spot where we change this.writer without the updateLock held. AtomicReference should not be needed (IMHO). Also not a big fan of catching NPE, in your scenario it also should not be needed (although I could be mistaken). OK... Lemme do one: I'll integrate my fixed up patch without your new test. I'll run locally for a while. If it's fine I'll post the patch here and you can poke holes in it. Sounds fair? Race condition when rolling the HLog and hlogFlush -- Key: HBASE-5623 URL: https://issues.apache.org/jira/browse/HBASE-5623 Project: HBase Issue Type: Bug Components: wal Affects Versions: 0.94.0 Reporter: Enis Soztutar Assignee: Enis Soztutar Priority: Critical Fix For: 0.94.0 Attachments: 5623.txt, 5623v2.txt, HBASE-5623_v0.patch, HBASE-5623_v4.patch, HBASE-5623_v5.patch When doing a ycsb test with a large number of handlers (regionserver.handler.count=60), I get the following exceptions: {code} Caused by: org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.getLength(SequenceFile.java:1099) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.getLength(SequenceFileLogWriter.java:314) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1291) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1388) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351) at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:920) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:152) at $Proxy1.multi(Unknown Source) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1691) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1689) at org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:214) {code} and {code} java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.checkAndWriteSync(SequenceFile.java:1026) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1068) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1035) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.append(SequenceFileLogWriter.java:279) at org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.hlogFlush(HLog.java:1237) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1271) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1391) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) at sun.reflect.GeneratedMethodAccessor33.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351) {code} It seems the root cause of the issue is that we open a new log writer and close the old one at HLog#rollWriter() holding the updateLock, but the other threads doing syncer() calls {code}
[jira] [Commented] (HBASE-5623) Race condition when rolling the HLog and hlogFlush
[ https://issues.apache.org/jira/browse/HBASE-5623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13237304#comment-13237304 ] Hadoop QA commented on HBASE-5623: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12519736/HBASE-5623_v4.patch against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 2 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1286//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1286//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1286//console This message is automatically generated. Race condition when rolling the HLog and hlogFlush -- Key: HBASE-5623 URL: https://issues.apache.org/jira/browse/HBASE-5623 Project: HBase Issue Type: Bug Components: wal Affects Versions: 0.94.0 Reporter: Enis Soztutar Assignee: Enis Soztutar Priority: Critical Fix For: 0.94.0 Attachments: 5623.txt, 5623v2.txt, HBASE-5623_v0.patch, HBASE-5623_v4.patch, HBASE-5623_v5.patch When doing a ycsb test with a large number of handlers (regionserver.handler.count=60), I get the following exceptions: {code} Caused by: org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.getLength(SequenceFile.java:1099) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.getLength(SequenceFileLogWriter.java:314) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1291) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1388) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351) at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:920) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:152) at $Proxy1.multi(Unknown Source) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1691) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1689) at org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:214) {code} and {code} java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.checkAndWriteSync(SequenceFile.java:1026) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1068) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1035) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.append(SequenceFileLogWriter.java:279) at org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.hlogFlush(HLog.java:1237) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1271) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1391) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) at
[jira] [Commented] (HBASE-5623) Race condition when rolling the HLog and hlogFlush
[ https://issues.apache.org/jira/browse/HBASE-5623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13237306#comment-13237306 ] Zhihong Yu commented on HBASE-5623: --- Ran Stack's test 100 times with the change mentioned @ 23/Mar/12 23:28 No failure. Race condition when rolling the HLog and hlogFlush -- Key: HBASE-5623 URL: https://issues.apache.org/jira/browse/HBASE-5623 Project: HBase Issue Type: Bug Components: wal Affects Versions: 0.94.0 Reporter: Enis Soztutar Assignee: Enis Soztutar Priority: Critical Fix For: 0.94.0 Attachments: 5623.txt, 5623v2.txt, HBASE-5623_v0.patch, HBASE-5623_v4.patch, HBASE-5623_v5.patch When doing a ycsb test with a large number of handlers (regionserver.handler.count=60), I get the following exceptions: {code} Caused by: org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.getLength(SequenceFile.java:1099) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.getLength(SequenceFileLogWriter.java:314) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1291) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1388) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351) at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:920) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:152) at $Proxy1.multi(Unknown Source) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1691) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1689) at org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:214) {code} and {code} java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.checkAndWriteSync(SequenceFile.java:1026) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1068) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1035) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.append(SequenceFileLogWriter.java:279) at org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.hlogFlush(HLog.java:1237) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1271) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1391) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) at sun.reflect.GeneratedMethodAccessor33.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351) {code} It seems the root cause of the issue is that we open a new log writer and close the old one at HLog#rollWriter() holding the updateLock, but the other threads doing syncer() calls {code} logSyncerThread.hlogFlush(this.writer); {code} without holding the updateLock. LogSyncer only synchronizes against concurrent appends and flush(), but not on the passed writer, which can be closed already by rollWriter(). In this case, since SequenceFile#Writer.close() sets it's out field as null, we get the NPE. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on
[jira] [Commented] (HBASE-5623) Race condition when rolling the HLog and hlogFlush
[ https://issues.apache.org/jira/browse/HBASE-5623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13237343#comment-13237343 ] Hadoop QA commented on HBASE-5623: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12519746/HBASE-5623_v5.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 2 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 2 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.regionserver.wal.TestHLog Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1288//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1288//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1288//console This message is automatically generated. Race condition when rolling the HLog and hlogFlush -- Key: HBASE-5623 URL: https://issues.apache.org/jira/browse/HBASE-5623 Project: HBase Issue Type: Bug Components: wal Affects Versions: 0.94.0 Reporter: Enis Soztutar Assignee: Enis Soztutar Priority: Critical Fix For: 0.94.0 Attachments: 5623-suggestion.txt, 5623.txt, 5623v2.txt, HBASE-5623_v0.patch, HBASE-5623_v4.patch, HBASE-5623_v5.patch When doing a ycsb test with a large number of handlers (regionserver.handler.count=60), I get the following exceptions: {code} Caused by: org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.getLength(SequenceFile.java:1099) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.getLength(SequenceFileLogWriter.java:314) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1291) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1388) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351) at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:920) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:152) at $Proxy1.multi(Unknown Source) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1691) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1689) at org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:214) {code} and {code} java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.checkAndWriteSync(SequenceFile.java:1026) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1068) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1035) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.append(SequenceFileLogWriter.java:279) at org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.hlogFlush(HLog.java:1237) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1271) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1391) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) at sun.reflect.GeneratedMethodAccessor33.invoke(Unknown Source) at
[jira] [Commented] (HBASE-5623) Race condition when rolling the HLog and hlogFlush
[ https://issues.apache.org/jira/browse/HBASE-5623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13237379#comment-13237379 ] Lars Hofhansl commented on HBASE-5623: -- OK. Looks like I am not going to roll the RC today after all. Still 5 jiras open against 0.94.0, so Monday is fine. I might think about this a bit more, it's a bit messy now and hard to change in the future. (but I don't think we need to change this in principle). It's mostly because we want to be able do the expensive IO without the lock held. Race condition when rolling the HLog and hlogFlush -- Key: HBASE-5623 URL: https://issues.apache.org/jira/browse/HBASE-5623 Project: HBase Issue Type: Bug Components: wal Affects Versions: 0.94.0 Reporter: Enis Soztutar Assignee: Enis Soztutar Priority: Critical Fix For: 0.94.0 Attachments: 5623-suggestion.txt, 5623.txt, 5623v2.txt, HBASE-5623_v0.patch, HBASE-5623_v4.patch, HBASE-5623_v5.patch, HBASE-5623_v6-alt.patch When doing a ycsb test with a large number of handlers (regionserver.handler.count=60), I get the following exceptions: {code} Caused by: org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.getLength(SequenceFile.java:1099) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.getLength(SequenceFileLogWriter.java:314) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1291) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1388) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351) at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:920) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:152) at $Proxy1.multi(Unknown Source) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1691) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1689) at org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:214) {code} and {code} java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.checkAndWriteSync(SequenceFile.java:1026) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1068) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1035) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.append(SequenceFileLogWriter.java:279) at org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.hlogFlush(HLog.java:1237) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1271) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1391) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) at sun.reflect.GeneratedMethodAccessor33.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351) {code} It seems the root cause of the issue is that we open a new log writer and close the old one at HLog#rollWriter() holding the updateLock, but the other threads doing syncer() calls {code} logSyncerThread.hlogFlush(this.writer); {code} without holding the updateLock. LogSyncer only synchronizes against concurrent appends and flush(), but not on the passed writer, which can be closed already by rollWriter(). In this
[jira] [Commented] (HBASE-5623) Race condition when rolling the HLog and hlogFlush
[ https://issues.apache.org/jira/browse/HBASE-5623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13237425#comment-13237425 ] Zhihong Yu commented on HBASE-5623: --- {code} - this.writer.sync(); + // HBASE-4387, HBASE-5623, retry with updateLock held + tempWriter = this.writer; + logSyncerThread.hlogFlush(tempWriter, pending); + tempWriter.sync(); {code} do we still need tempWriter ? The above is within synchronized (this.updateLock). {code} + if (tempWriter.getLength() this.logrollsize) { +requestLogRoll(); + } +} catch (IOException x) { {code} Is the above catch intended to handle NPE ? NPE was handled in an earlier patch. Race condition when rolling the HLog and hlogFlush -- Key: HBASE-5623 URL: https://issues.apache.org/jira/browse/HBASE-5623 Project: HBase Issue Type: Bug Components: wal Affects Versions: 0.94.0 Reporter: Enis Soztutar Assignee: Enis Soztutar Priority: Critical Fix For: 0.94.0 Attachments: 5623-suggestion.txt, 5623-v7.txt, 5623-v8.txt, 5623.txt, 5623v2.txt, HBASE-5623_v0.patch, HBASE-5623_v4.patch, HBASE-5623_v5.patch, HBASE-5623_v6-alt.patch, HBASE-5623_v6-alt.patch When doing a ycsb test with a large number of handlers (regionserver.handler.count=60), I get the following exceptions: {code} Caused by: org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.getLength(SequenceFile.java:1099) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.getLength(SequenceFileLogWriter.java:314) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1291) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1388) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351) at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:920) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:152) at $Proxy1.multi(Unknown Source) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1691) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1689) at org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:214) {code} and {code} java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.checkAndWriteSync(SequenceFile.java:1026) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1068) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1035) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.append(SequenceFileLogWriter.java:279) at org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.hlogFlush(HLog.java:1237) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1271) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1391) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) at sun.reflect.GeneratedMethodAccessor33.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351) {code} It seems the root cause of the issue is that we open a new log writer and close the old one at HLog#rollWriter() holding the updateLock, but the other threads doing syncer()
[jira] [Commented] (HBASE-5623) Race condition when rolling the HLog and hlogFlush
[ https://issues.apache.org/jira/browse/HBASE-5623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13237428#comment-13237428 ] Lars Hofhansl commented on HBASE-5623: -- @Ted: Yes. See the changes to SequenceFileLogWriter in the same patch. Basically my reasoning is to avoid leaking up NPE from SequenceFileLogWriter. close() already had an NPE check. Race condition when rolling the HLog and hlogFlush -- Key: HBASE-5623 URL: https://issues.apache.org/jira/browse/HBASE-5623 Project: HBase Issue Type: Bug Components: wal Affects Versions: 0.94.0 Reporter: Enis Soztutar Assignee: Enis Soztutar Priority: Critical Fix For: 0.94.0 Attachments: 5623-suggestion.txt, 5623-v7.txt, 5623-v8.txt, 5623.txt, 5623v2.txt, HBASE-5623_v0.patch, HBASE-5623_v4.patch, HBASE-5623_v5.patch, HBASE-5623_v6-alt.patch, HBASE-5623_v6-alt.patch When doing a ycsb test with a large number of handlers (regionserver.handler.count=60), I get the following exceptions: {code} Caused by: org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.getLength(SequenceFile.java:1099) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.getLength(SequenceFileLogWriter.java:314) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1291) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1388) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351) at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:920) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:152) at $Proxy1.multi(Unknown Source) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1691) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1689) at org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:214) {code} and {code} java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.checkAndWriteSync(SequenceFile.java:1026) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1068) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1035) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.append(SequenceFileLogWriter.java:279) at org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.hlogFlush(HLog.java:1237) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1271) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1391) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) at sun.reflect.GeneratedMethodAccessor33.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351) {code} It seems the root cause of the issue is that we open a new log writer and close the old one at HLog#rollWriter() holding the updateLock, but the other threads doing syncer() calls {code} logSyncerThread.hlogFlush(this.writer); {code} without holding the updateLock. LogSyncer only synchronizes against concurrent appends and flush(), but not on the passed writer, which can be closed already by rollWriter(). In this case, since SequenceFile#Writer.close() sets it's out field as null, we get the NPE. -- This message is
[jira] [Commented] (HBASE-5623) Race condition when rolling the HLog and hlogFlush
[ https://issues.apache.org/jira/browse/HBASE-5623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13237429#comment-13237429 ] Lars Hofhansl commented on HBASE-5623: -- Oh and re: tempWriter, yes it is used below in getLength(), so it's updated here. Race condition when rolling the HLog and hlogFlush -- Key: HBASE-5623 URL: https://issues.apache.org/jira/browse/HBASE-5623 Project: HBase Issue Type: Bug Components: wal Affects Versions: 0.94.0 Reporter: Enis Soztutar Assignee: Enis Soztutar Priority: Critical Fix For: 0.94.0 Attachments: 5623-suggestion.txt, 5623-v7.txt, 5623-v8.txt, 5623.txt, 5623v2.txt, HBASE-5623_v0.patch, HBASE-5623_v4.patch, HBASE-5623_v5.patch, HBASE-5623_v6-alt.patch, HBASE-5623_v6-alt.patch When doing a ycsb test with a large number of handlers (regionserver.handler.count=60), I get the following exceptions: {code} Caused by: org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.getLength(SequenceFile.java:1099) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.getLength(SequenceFileLogWriter.java:314) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1291) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1388) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351) at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:920) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:152) at $Proxy1.multi(Unknown Source) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1691) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1689) at org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:214) {code} and {code} java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.checkAndWriteSync(SequenceFile.java:1026) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1068) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1035) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.append(SequenceFileLogWriter.java:279) at org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.hlogFlush(HLog.java:1237) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1271) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1391) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) at sun.reflect.GeneratedMethodAccessor33.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351) {code} It seems the root cause of the issue is that we open a new log writer and close the old one at HLog#rollWriter() holding the updateLock, but the other threads doing syncer() calls {code} logSyncerThread.hlogFlush(this.writer); {code} without holding the updateLock. LogSyncer only synchronizes against concurrent appends and flush(), but not on the passed writer, which can be closed already by rollWriter(). In this case, since SequenceFile#Writer.close() sets it's out field as null, we get the NPE. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA
[jira] [Commented] (HBASE-5623) Race condition when rolling the HLog and hlogFlush
[ https://issues.apache.org/jira/browse/HBASE-5623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13237434#comment-13237434 ] Zhihong Yu commented on HBASE-5623: --- I see. {code} +} catch (IOException x) { + // Ignore. {code} Should we verify that x.getCause() is NPE ? My first comment about tempWriter is that this.writer can be used directly since the code is within synchronized block. Race condition when rolling the HLog and hlogFlush -- Key: HBASE-5623 URL: https://issues.apache.org/jira/browse/HBASE-5623 Project: HBase Issue Type: Bug Components: wal Affects Versions: 0.94.0 Reporter: Enis Soztutar Assignee: Enis Soztutar Priority: Critical Fix For: 0.94.0 Attachments: 5623-suggestion.txt, 5623-v7.txt, 5623-v8.txt, 5623.txt, 5623v2.txt, HBASE-5623_v0.patch, HBASE-5623_v4.patch, HBASE-5623_v5.patch, HBASE-5623_v6-alt.patch, HBASE-5623_v6-alt.patch When doing a ycsb test with a large number of handlers (regionserver.handler.count=60), I get the following exceptions: {code} Caused by: org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.getLength(SequenceFile.java:1099) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.getLength(SequenceFileLogWriter.java:314) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1291) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1388) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351) at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:920) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:152) at $Proxy1.multi(Unknown Source) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1691) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1689) at org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:214) {code} and {code} java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.checkAndWriteSync(SequenceFile.java:1026) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1068) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1035) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.append(SequenceFileLogWriter.java:279) at org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.hlogFlush(HLog.java:1237) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1271) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1391) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) at sun.reflect.GeneratedMethodAccessor33.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351) {code} It seems the root cause of the issue is that we open a new log writer and close the old one at HLog#rollWriter() holding the updateLock, but the other threads doing syncer() calls {code} logSyncerThread.hlogFlush(this.writer); {code} without holding the updateLock. LogSyncer only synchronizes against concurrent appends and flush(), but not on the passed writer, which can be closed already by rollWriter(). In this case, since SequenceFile#Writer.close() sets it's out
[jira] [Commented] (HBASE-5623) Race condition when rolling the HLog and hlogFlush
[ https://issues.apache.org/jira/browse/HBASE-5623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13237436#comment-13237436 ] Hadoop QA commented on HBASE-5623: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12519779/HBASE-5623_v6-alt.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 2 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.mapreduce.TestImportTsv org.apache.hadoop.hbase.mapred.TestTableMapReduce org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1294//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1294//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1294//console This message is automatically generated. Race condition when rolling the HLog and hlogFlush -- Key: HBASE-5623 URL: https://issues.apache.org/jira/browse/HBASE-5623 Project: HBase Issue Type: Bug Components: wal Affects Versions: 0.94.0 Reporter: Enis Soztutar Assignee: Enis Soztutar Priority: Critical Fix For: 0.94.0 Attachments: 5623-suggestion.txt, 5623-v7.txt, 5623-v8.txt, 5623.txt, 5623v2.txt, HBASE-5623_v0.patch, HBASE-5623_v4.patch, HBASE-5623_v5.patch, HBASE-5623_v6-alt.patch, HBASE-5623_v6-alt.patch When doing a ycsb test with a large number of handlers (regionserver.handler.count=60), I get the following exceptions: {code} Caused by: org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.getLength(SequenceFile.java:1099) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.getLength(SequenceFileLogWriter.java:314) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1291) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1388) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351) at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:920) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:152) at $Proxy1.multi(Unknown Source) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1691) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1689) at org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:214) {code} and {code} java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.checkAndWriteSync(SequenceFile.java:1026) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1068) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1035) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.append(SequenceFileLogWriter.java:279) at org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.hlogFlush(HLog.java:1237) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1271) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1391) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) at
[jira] [Commented] (HBASE-5623) Race condition when rolling the HLog and hlogFlush
[ https://issues.apache.org/jira/browse/HBASE-5623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13237437#comment-13237437 ] Lars Hofhansl commented on HBASE-5623: -- bq. Should we verify that x.getCause() is NPE ? I don't think so. Any IOException here warrants a retry. bq. My first comment about tempWriter is that this.writer can be used directly since the code is within synchronized block. Sure, but it doesn't matter right. Since at this point the existing tempWriter most likely bad, it needs to be updated to the current one (for the getLength() check below). This way try and retry are the same. Race condition when rolling the HLog and hlogFlush -- Key: HBASE-5623 URL: https://issues.apache.org/jira/browse/HBASE-5623 Project: HBase Issue Type: Bug Components: wal Affects Versions: 0.94.0 Reporter: Enis Soztutar Assignee: Enis Soztutar Priority: Critical Fix For: 0.94.0 Attachments: 5623-suggestion.txt, 5623-v7.txt, 5623-v8.txt, 5623.txt, 5623v2.txt, HBASE-5623_v0.patch, HBASE-5623_v4.patch, HBASE-5623_v5.patch, HBASE-5623_v6-alt.patch, HBASE-5623_v6-alt.patch When doing a ycsb test with a large number of handlers (regionserver.handler.count=60), I get the following exceptions: {code} Caused by: org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.getLength(SequenceFile.java:1099) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.getLength(SequenceFileLogWriter.java:314) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1291) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1388) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351) at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:920) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:152) at $Proxy1.multi(Unknown Source) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1691) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1689) at org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:214) {code} and {code} java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.checkAndWriteSync(SequenceFile.java:1026) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1068) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1035) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.append(SequenceFileLogWriter.java:279) at org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.hlogFlush(HLog.java:1237) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1271) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1391) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) at sun.reflect.GeneratedMethodAccessor33.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351) {code} It seems the root cause of the issue is that we open a new log writer and close the old one at HLog#rollWriter() holding the updateLock, but the other threads doing syncer() calls {code} logSyncerThread.hlogFlush(this.writer); {code} without holding the updateLock.
[jira] [Commented] (HBASE-5623) Race condition when rolling the HLog and hlogFlush
[ https://issues.apache.org/jira/browse/HBASE-5623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13237439#comment-13237439 ] Hadoop QA commented on HBASE-5623: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12519780/5623-v7.txt against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1295//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1295//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1295//console This message is automatically generated. Race condition when rolling the HLog and hlogFlush -- Key: HBASE-5623 URL: https://issues.apache.org/jira/browse/HBASE-5623 Project: HBase Issue Type: Bug Components: wal Affects Versions: 0.94.0 Reporter: Enis Soztutar Assignee: Enis Soztutar Priority: Critical Fix For: 0.94.0 Attachments: 5623-suggestion.txt, 5623-v7.txt, 5623-v8.txt, 5623.txt, 5623v2.txt, HBASE-5623_v0.patch, HBASE-5623_v4.patch, HBASE-5623_v5.patch, HBASE-5623_v6-alt.patch, HBASE-5623_v6-alt.patch When doing a ycsb test with a large number of handlers (regionserver.handler.count=60), I get the following exceptions: {code} Caused by: org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.getLength(SequenceFile.java:1099) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.getLength(SequenceFileLogWriter.java:314) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1291) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1388) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351) at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:920) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:152) at $Proxy1.multi(Unknown Source) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1691) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1689) at org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:214) {code} and {code} java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.checkAndWriteSync(SequenceFile.java:1026) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1068) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1035) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.append(SequenceFileLogWriter.java:279) at org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.hlogFlush(HLog.java:1237) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1271) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1391) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) at sun.reflect.GeneratedMethodAccessor33.invoke(Unknown Source) at
[jira] [Commented] (HBASE-5623) Race condition when rolling the HLog and hlogFlush
[ https://issues.apache.org/jira/browse/HBASE-5623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13237442#comment-13237442 ] Hadoop QA commented on HBASE-5623: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12519781/5623-v8.txt against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.io.hfile.TestForceCacheImportantBlocks Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1296//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1296//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1296//console This message is automatically generated. Race condition when rolling the HLog and hlogFlush -- Key: HBASE-5623 URL: https://issues.apache.org/jira/browse/HBASE-5623 Project: HBase Issue Type: Bug Components: wal Affects Versions: 0.94.0 Reporter: Enis Soztutar Assignee: Enis Soztutar Priority: Critical Fix For: 0.94.0 Attachments: 5623-suggestion.txt, 5623-v7.txt, 5623-v8.txt, 5623.txt, 5623v2.txt, HBASE-5623_v0.patch, HBASE-5623_v4.patch, HBASE-5623_v5.patch, HBASE-5623_v6-alt.patch, HBASE-5623_v6-alt.patch When doing a ycsb test with a large number of handlers (regionserver.handler.count=60), I get the following exceptions: {code} Caused by: org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.getLength(SequenceFile.java:1099) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.getLength(SequenceFileLogWriter.java:314) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1291) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1388) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351) at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:920) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:152) at $Proxy1.multi(Unknown Source) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1691) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1689) at org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:214) {code} and {code} java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.checkAndWriteSync(SequenceFile.java:1026) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1068) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1035) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.append(SequenceFileLogWriter.java:279) at org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.hlogFlush(HLog.java:1237) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1271) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1391) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) at
[jira] [Commented] (HBASE-5623) Race condition when rolling the HLog and hlogFlush
[ https://issues.apache.org/jira/browse/HBASE-5623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13237444#comment-13237444 ] Zhihong Yu commented on HBASE-5623: --- {code} +} catch (IOException x) { + // Ignore. + // Writer might have been closed. + // In any case, we either don't have to do anything, + // or the log will be rolled the next time. } {code} The above code is in last hunk of the patch. It is not related to retry. At least we should log a short sentence in the above catch block. Race condition when rolling the HLog and hlogFlush -- Key: HBASE-5623 URL: https://issues.apache.org/jira/browse/HBASE-5623 Project: HBase Issue Type: Bug Components: wal Affects Versions: 0.94.0 Reporter: Enis Soztutar Assignee: Enis Soztutar Priority: Critical Fix For: 0.94.0 Attachments: 5623-suggestion.txt, 5623-v7.txt, 5623-v8.txt, 5623.txt, 5623v2.txt, HBASE-5623_v0.patch, HBASE-5623_v4.patch, HBASE-5623_v5.patch, HBASE-5623_v6-alt.patch, HBASE-5623_v6-alt.patch When doing a ycsb test with a large number of handlers (regionserver.handler.count=60), I get the following exceptions: {code} Caused by: org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.getLength(SequenceFile.java:1099) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.getLength(SequenceFileLogWriter.java:314) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1291) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1388) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351) at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:920) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:152) at $Proxy1.multi(Unknown Source) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1691) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1689) at org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:214) {code} and {code} java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.checkAndWriteSync(SequenceFile.java:1026) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1068) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1035) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.append(SequenceFileLogWriter.java:279) at org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.hlogFlush(HLog.java:1237) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1271) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1391) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) at sun.reflect.GeneratedMethodAccessor33.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351) {code} It seems the root cause of the issue is that we open a new log writer and close the old one at HLog#rollWriter() holding the updateLock, but the other threads doing syncer() calls {code} logSyncerThread.hlogFlush(this.writer); {code} without holding the updateLock. LogSyncer only synchronizes against concurrent appends and flush(), but
[jira] [Commented] (HBASE-5623) Race condition when rolling the HLog and hlogFlush
[ https://issues.apache.org/jira/browse/HBASE-5623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13236293#comment-13236293 ] Hadoop QA commented on HBASE-5623: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12519570/HBASE-5623_v0.patch against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1271//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1271//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1271//console This message is automatically generated. Race condition when rolling the HLog and hlogFlush -- Key: HBASE-5623 URL: https://issues.apache.org/jira/browse/HBASE-5623 Project: HBase Issue Type: Bug Components: wal Affects Versions: 0.94.0 Reporter: Enis Soztutar Assignee: Enis Soztutar Attachments: HBASE-5623_v0.patch When doing a ycsb test with a large number of handlers (regionserver.handler.count=60), I get the following exceptions: {code} Caused by: org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.getLength(SequenceFile.java:1099) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.getLength(SequenceFileLogWriter.java:314) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1291) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1388) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351) at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:920) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:152) at $Proxy1.multi(Unknown Source) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1691) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1689) at org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:214) {code} and {code} java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.checkAndWriteSync(SequenceFile.java:1026) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1068) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1035) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.append(SequenceFileLogWriter.java:279) at org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.hlogFlush(HLog.java:1237) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1271) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1391) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) at sun.reflect.GeneratedMethodAccessor33.invoke(Unknown Source) at
[jira] [Commented] (HBASE-5623) Race condition when rolling the HLog and hlogFlush
[ https://issues.apache.org/jira/browse/HBASE-5623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13236294#comment-13236294 ] stack commented on HBASE-5623: -- +1 on patch. Odd we've not seen this before. Race condition when rolling the HLog and hlogFlush -- Key: HBASE-5623 URL: https://issues.apache.org/jira/browse/HBASE-5623 Project: HBase Issue Type: Bug Components: wal Affects Versions: 0.94.0 Reporter: Enis Soztutar Assignee: Enis Soztutar Attachments: HBASE-5623_v0.patch When doing a ycsb test with a large number of handlers (regionserver.handler.count=60), I get the following exceptions: {code} Caused by: org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.getLength(SequenceFile.java:1099) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.getLength(SequenceFileLogWriter.java:314) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1291) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1388) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351) at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:920) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:152) at $Proxy1.multi(Unknown Source) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1691) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1689) at org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:214) {code} and {code} java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.checkAndWriteSync(SequenceFile.java:1026) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1068) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1035) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.append(SequenceFileLogWriter.java:279) at org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.hlogFlush(HLog.java:1237) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1271) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1391) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) at sun.reflect.GeneratedMethodAccessor33.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351) {code} It seems the root cause of the issue is that we open a new log writer and close the old one at HLog#rollWriter() holding the updateLock, but the other threads doing syncer() calls {code} logSyncerThread.hlogFlush(this.writer); {code} without holding the updateLock. LogSyncer only synchronizes against concurrent appends and flush(), but not on the passed writer, which can be closed already by rollWriter(). In this case, since SequenceFile#Writer.close() sets it's out field as null, we get the NPE. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5623) Race condition when rolling the HLog and hlogFlush
[ https://issues.apache.org/jira/browse/HBASE-5623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13236295#comment-13236295 ] stack commented on HBASE-5623: -- How'd you make it happen Enis? Could we write a unit test to surface it? Race condition when rolling the HLog and hlogFlush -- Key: HBASE-5623 URL: https://issues.apache.org/jira/browse/HBASE-5623 Project: HBase Issue Type: Bug Components: wal Affects Versions: 0.94.0 Reporter: Enis Soztutar Assignee: Enis Soztutar Attachments: HBASE-5623_v0.patch When doing a ycsb test with a large number of handlers (regionserver.handler.count=60), I get the following exceptions: {code} Caused by: org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.getLength(SequenceFile.java:1099) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.getLength(SequenceFileLogWriter.java:314) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1291) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1388) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351) at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:920) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:152) at $Proxy1.multi(Unknown Source) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1691) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1689) at org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:214) {code} and {code} java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.checkAndWriteSync(SequenceFile.java:1026) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1068) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1035) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.append(SequenceFileLogWriter.java:279) at org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.hlogFlush(HLog.java:1237) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1271) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1391) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) at sun.reflect.GeneratedMethodAccessor33.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351) {code} It seems the root cause of the issue is that we open a new log writer and close the old one at HLog#rollWriter() holding the updateLock, but the other threads doing syncer() calls {code} logSyncerThread.hlogFlush(this.writer); {code} without holding the updateLock. LogSyncer only synchronizes against concurrent appends and flush(), but not on the passed writer, which can be closed already by rollWriter(). In this case, since SequenceFile#Writer.close() sets it's out field as null, we get the NPE. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5623) Race condition when rolling the HLog and hlogFlush
[ https://issues.apache.org/jira/browse/HBASE-5623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13236322#comment-13236322 ] Lars Hofhansl commented on HBASE-5623: -- This is the same I've seen in HBASE-5336. Race condition when rolling the HLog and hlogFlush -- Key: HBASE-5623 URL: https://issues.apache.org/jira/browse/HBASE-5623 Project: HBase Issue Type: Bug Components: wal Affects Versions: 0.94.0 Reporter: Enis Soztutar Assignee: Enis Soztutar Priority: Critical Attachments: HBASE-5623_v0.patch When doing a ycsb test with a large number of handlers (regionserver.handler.count=60), I get the following exceptions: {code} Caused by: org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.getLength(SequenceFile.java:1099) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.getLength(SequenceFileLogWriter.java:314) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1291) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1388) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351) at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:920) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:152) at $Proxy1.multi(Unknown Source) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1691) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1689) at org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:214) {code} and {code} java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.checkAndWriteSync(SequenceFile.java:1026) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1068) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1035) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.append(SequenceFileLogWriter.java:279) at org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.hlogFlush(HLog.java:1237) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1271) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1391) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) at sun.reflect.GeneratedMethodAccessor33.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351) {code} It seems the root cause of the issue is that we open a new log writer and close the old one at HLog#rollWriter() holding the updateLock, but the other threads doing syncer() calls {code} logSyncerThread.hlogFlush(this.writer); {code} without holding the updateLock. LogSyncer only synchronizes against concurrent appends and flush(), but not on the passed writer, which can be closed already by rollWriter(). In this case, since SequenceFile#Writer.close() sets it's out field as null, we get the NPE. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira