[jira] [Commented] (HDFS-10943) rollEditLog expects empty EditsDoubleBuffer.bufCurrent which is not guaranteed
[ https://issues.apache.org/jira/browse/HDFS-10943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16928352#comment-16928352 ]

wangcong commented on HDFS-10943:
---------------------------------

We found the patch related to our problem: HDFS-13112.

The root cause of bufCurrent not being empty is that there are two ways to log a CancelDelegationTokenOp: one is FSNamesystem's cancelDelegationToken() method, the other is DelegationTokenSecretManager's logExpireToken method. logExpireToken acquires only the FSEditLog lock to write to the buffer; it does not acquire the FSNamesystem lock. As a result, when the edit log roll releases the FSEditLog lock, logExpireToken has a chance to write an op into the buffer.

> rollEditLog expects empty EditsDoubleBuffer.bufCurrent which is not guaranteed
> ------------------------------------------------------------------------------
>
>                 Key: HDFS-10943
>                 URL: https://issues.apache.org/jira/browse/HDFS-10943
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Yongjun Zhang
>            Priority: Major
>
> Per the following stack trace:
> {code}
> FATAL org.apache.hadoop.hdfs.server.namenode.FSEditLog: Error: finalize log segment 10562075963, 10562174157 failed for required journal (JournalAndStream(mgr=QJM to [0.0.0.1:8485, 0.0.0.2:8485, 0.0.0.3:8485, 0.0.0.4:8485, 0.0.0.5:8485], stream=QuorumOutputStream starting at txid 10562075963))
> java.io.IOException: FSEditStream has 49708 bytes still to be flushed and cannot be closed.
>         at org.apache.hadoop.hdfs.server.namenode.EditsDoubleBuffer.close(EditsDoubleBuffer.java:66)
>         at org.apache.hadoop.hdfs.qjournal.client.QuorumOutputStream.close(QuorumOutputStream.java:65)
>         at org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalAndStream.closeStream(JournalSet.java:115)
>         at org.apache.hadoop.hdfs.server.namenode.JournalSet$4.apply(JournalSet.java:235)
>         at org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:393)
>         at org.apache.hadoop.hdfs.server.namenode.JournalSet.finalizeLogSegment(JournalSet.java:231)
>         at org.apache.hadoop.hdfs.server.namenode.FSEditLog.endCurrentLogSegment(FSEditLog.java:1243)
>         at org.apache.hadoop.hdfs.server.namenode.FSEditLog.rollEditLog(FSEditLog.java:1172)
>         at org.apache.hadoop.hdfs.server.namenode.FSImage.rollEditLog(FSImage.java:1243)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.rollEditLog(FSNamesystem.java:6437)
>         at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.rollEditLog(NameNodeRpcServer.java:1002)
>         at org.apache.hadoop.hdfs.protocolPB.NamenodeProtocolServerSideTranslatorPB.rollEditLog(NamenodeProtocolServerSideTranslatorPB.java:142)
>         at org.apache.hadoop.hdfs.protocol.proto.NamenodeProtocolProtos$NamenodeProtocolService$2.callBlockingMethod(NamenodeProtocolProtos.java:12025)
>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:422)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080)
> 2016-09-23 21:40:59,618 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Aborting QuorumOutputStream starting at txid 10562075963
> {code}
> The exception is from EditsDoubleBuffer
> {code}
>   public void close() throws IOException {
>     Preconditions.checkNotNull(bufCurrent);
>     Preconditions.checkNotNull(bufReady);
>     int bufSize = bufCurrent.size();
>     if (bufSize != 0) {
>       throw new IOException("FSEditStream has " + bufSize
>           + " bytes still to be flushed and cannot be closed.");
>     }
>     IOUtils.cleanup(null, bufCurrent, bufReady);
>     bufCurrent = bufReady = null;
>   }
> {code}
> We can see that FSNamesystem.rollEditLog expects EditsDoubleBuffer.bufCurrent to be empty.
> Edits are recorded via FSEditLog$logSync, which does:
> {code}
>    * The data is double-buffered within each edit log implementation so that
>    * in-memory writing can occur in parallel with the on-disk writing.
>    *
>    * Each sync occurs in three steps:
>    *   1. synchronized, it swaps the double buffer and sets the isSyncRunning
>    *      flag.
>    *   2. unsynchronized, it flushes the data to storage
>    *   3. synchronized, it resets the flag and notifies anyone waiting on the
>    *      sync.
>    *
>    * The lack of synchronization on
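
The failure mode described in the comment at the top of this message can be replayed with a minimal, single-threaded sketch in Java. DoubleBufferSketch is a hypothetical, simplified stand-in for EditsDoubleBuffer, not the Hadoop class; only the close() check follows the code quoted in the issue. The lateWrite flag is an assumption that stands in for a writer such as logExpireToken appending an op after the final sync of the roll but before close().

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;

// Hypothetical, simplified model of a double buffer. Not the Hadoop class.
class DoubleBufferSketch {
    private ByteArrayOutputStream bufCurrent = new ByteArrayOutputStream();
    private ByteArrayOutputStream bufReady = new ByteArrayOutputStream();

    // Writers always append to bufCurrent.
    void write(byte[] op) {
        bufCurrent.write(op, 0, op.length);
    }

    // Step 1 of a sync: swap the two buffers.
    void setReadyToFlush() {
        ByteArrayOutputStream tmp = bufReady;
        bufReady = bufCurrent;
        bufCurrent = tmp;
    }

    // Step 2 of a sync: "flush" bufReady (this sketch simply discards it).
    void flush() {
        bufReady.reset();
    }

    // Same precondition as the close() quoted in the issue: bufCurrent must be empty.
    void close() throws IOException {
        int bufSize = bufCurrent.size();
        if (bufSize != 0) {
            throw new IOException("FSEditStream has " + bufSize
                + " bytes still to be flushed and cannot be closed.");
        }
        bufCurrent = bufReady = null;
    }

    // Replays one roll. lateWrite simulates an op that slips in after the
    // final sync and before close(); 85 bytes matches the size in the report.
    static String roll(boolean lateWrite) {
        DoubleBufferSketch buf = new DoubleBufferSketch();
        buf.write(new byte[] {1, 2, 3}); // stands in for EndLogSegmentOp
        buf.setReadyToFlush();
        buf.flush();
        if (lateWrite) {
            buf.write(new byte[85]);
        }
        try {
            buf.close();
            return "closed";
        } catch (IOException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(roll(false));
        System.out.println(roll(true));
    }
}
```

The sketch is deterministic on purpose: it shows only that close() cannot succeed once any write lands in bufCurrent after the last swap. It does not model the locking in FSEditLog or FSNamesystem.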
[jira] [Comment Edited] (HDFS-10943) rollEditLog expects empty EditsDoubleBuffer.bufCurrent which is not guaranteed
[ https://issues.apache.org/jira/browse/HDFS-10943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16926355#comment-16926355 ]

wangcong edited comment on HDFS-10943 at 9/11/19 6:43 AM:
----------------------------------------------------------

[~daryn], [~kihwal], [~zhz], [~hexiaoqiao], [~yzhangal]

Our log cluster has hit this problem several times. We use a separate cluster to store YARN logs, and this cluster has crashed several times. While going through the logs, we found the same error as in this issue.

To diagnose the problem, we deployed HDFS-11306 and HDFS-11292. When the log cluster crashed again, the diagnostic log was as follows:

2019-09-10 03:50:16,403 ERROR org.apache.hadoop.hdfs.server.namenode.FSEditLog: LastWrittenTxId 5061382841 is expected to be the same as LastSyncedTxId 5061382840
2019-09-10 03:50:16,403 WARN org.apache.hadoop.hdfs.server.namenode.EditsDoubleBuffer: The edits buffer is 85 bytes long with 1 unflushed transactions.Below is the list of unflushed transactions
2019-09-10 03:50:16,408 WARN org.apache.hadoop.hdfs.server.namenode.EditsDoubleBuffer: Unflushed op [0]: CancelDelegationTokenOp [token=token for yarn: HDFS_DELEGATION_TOKEN owner=yarn/datanod...@domain.com, renewer=yarn, realUser=,issueDate=1567970236988, maxDate=1568575036988, sequenceNumber=621170591, masterKeyId=108, opcode=OP_CANCEL_DELEGATION_TOKEN, txid=5061382841]
2019-09-10 03:50:16,409 FATAL org.apache.hadoop.hdfs.server.namenode.FSEditLog: Error: finalize log segment 5060982535, 5061382841 failed for required journal (JournalAndStream(mgr=QJM to [10.0.0.1:8001,10.0.0.2:8001,10.0.0.3:8001],stream=QuorumOutputStream starting at txid 5060982535))
java.io.IOException: FSEditsStream has 85 bytes still to be flushed and cannot be closed

After deploying the patches, the NameNode crashed twice. In both cases the op that caused the problem was a CancelDelegationTokenOp.
[jira] [Commented] (HDFS-10943) rollEditLog expects empty EditsDoubleBuffer.bufCurrent which is not guaranteed
[ https://issues.apache.org/jira/browse/HDFS-10943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16927181#comment-16927181 ]

wangcong commented on HDFS-10943:
---------------------------------

Sorry, [~hexiaoqiao], the version we use is 2.6.0-cdh5.10.0.

Looking at the log output added by HDFS-11292, we found the following.

When the edit log roll runs normally, the log shows:

2019-09-10 03:48:10,273 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: logSyncAll toSyncToTxId=5060982534 lastSyncedTxid=5060982511 mostRecentTxid=5060982534
2019-09-10 03:48:10,273 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Done logSyncAll lastWrittenTxId=5060982534 lastSyncedTxid=5060982534 mostRecentTxid=5060982534

toSyncToTxId in the first line is the txid of EndLogSegmentOp, the last op of the edit log segment, and it is equal to lastWrittenTxId in the second line. This shows that nothing was written to the double buffer after EndLogSegmentOp.

But when the edit log roll runs abnormally, the log shows:

2019-09-10 03:48:10,273 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: logSyncAll toSyncToTxId=5061382825 lastSyncedTxid=5061371306 mostRecentTxid=5061382825
2019-09-10 03:48:10,273 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Done logSyncAll lastWrittenTxId=5061382841 lastSyncedTxid=5061382840 mostRecentTxid=5061382841

toSyncToTxId in the first line is not equal to lastWrittenTxId in the second line, which shows that another handler wrote ops to the double buffer after EndLogSegmentOp. In the second line, lastWrittenTxId is also not equal to lastSyncedTxid, which shows that bufCurrent in the double buffer is not empty.
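
The comparison described in this comment can be automated when scanning NameNode logs. The following Java sketch is only an illustration: LogSyncAllCheck, field() and classify() are hypothetical names, not part of Hadoop, and the parsing assumes the key=value layout of the two logSyncAll lines exactly as they appear above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for reading the logSyncAll lines; not part of Hadoop.
class LogSyncAllCheck {
    private static final Pattern FIELD = Pattern.compile("(\\w+)=(\\d+)");

    // Returns the numeric value of a key=value field in a log line.
    static long field(String line, String name) {
        Matcher m = FIELD.matcher(line);
        while (m.find()) {
            if (m.group(1).equals(name)) {
                return Long.parseLong(m.group(2));
            }
        }
        throw new IllegalArgumentException("missing field " + name);
    }

    // startLine is the "logSyncAll ..." line, doneLine the "Done logSyncAll ..." line.
    static String classify(String startLine, String doneLine) {
        long toSync = field(startLine, "toSyncToTxId");
        long lastWritten = field(doneLine, "lastWrittenTxId");
        long lastSynced = field(doneLine, "lastSyncedTxid");
        long lateOps = lastWritten - toSync;      // ops written after EndLogSegmentOp
        long unsynced = lastWritten - lastSynced; // ops still sitting in the buffer
        if (lateOps == 0 && unsynced == 0) {
            return "clean";
        }
        return lateOps + " op(s) written after EndLogSegmentOp, " + unsynced + " unsynced";
    }

    public static void main(String[] args) {
        String start = "logSyncAll toSyncToTxId=5061382825 lastSyncedTxid=5061371306 mostRecentTxid=5061382825";
        String done = "Done logSyncAll lastWrittenTxId=5061382841 lastSyncedTxid=5061382840 mostRecentTxid=5061382841";
        System.out.println(classify(start, done));
    }
}
```

For the abnormal pair above, the differences are 16 ops written after EndLogSegmentOp and 1 op left unsynced, which agrees with the single unflushed CancelDelegationTokenOp (txid=5061382841) in the diagnostic log.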