[jira] [Commented] (HDFS-15391) Due to edit log corruption, Standby NameNode could not properly load the Ediltog log, result in abnormal exit of the service and failure to restart
[ https://issues.apache.org/jira/browse/HDFS-15391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17126740#comment-17126740 ]

huhaiyang commented on HDFS-15391:
----------------------------------

[~ayushtkn] Thank you for your reply.

{quote}
These are two different traces, correct? You tried restarting the NameNode twice, and it failed once on CLOSE_OP and the other time on TRUNCATE, correct?
{quote}

Yes, these are two different traces. I'll add more details later.

> Due to edit log corruption, Standby NameNode could not properly load the
> Ediltog log, result in abnormal exit of the service and failure to restart
> ---
>
> Key: HDFS-15391
> URL: https://issues.apache.org/jira/browse/HDFS-15391
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 3.2.0
> Reporter: huhaiyang
> Priority: Critical
>
> In our production cluster running version 3.2.0, we found that due to edit
> log corruption, the Standby NameNode could not properly load the edit log,
> resulting in an abnormal exit of the service and a failure to restart.
> {noformat}
> The specific scenario: Flink writes to HDFS (a replicated file), and when an
> exception occurs while writing the file, the following operations are
> performed:
> 1. close file
> 2. open file
> 3. truncate file
> 4. append file
> {noformat}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15391) Due to edit log corruption, Standby NameNode could not properly load the Ediltog log, result in abnormal exit of the service and failure to restart
[ https://issues.apache.org/jira/browse/HDFS-15391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17126730#comment-17126730 ]

Ayush Saxena commented on HDFS-15391:
-------------------------------------

Thanks. These are two different traces, correct? You tried restarting the NameNode twice, and it failed once on CLOSE_OP and the other time on TRUNCATE, correct?
[jira] [Commented] (HDFS-15391) Due to edit log corruption, Standby NameNode could not properly load the Ediltog log, result in abnormal exit of the service and failure to restart
[ https://issues.apache.org/jira/browse/HDFS-15391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17126724#comment-17126724 ]

huhaiyang commented on HDFS-15391:
----------------------------------

Standby NameNode exception log:

2020-06-04 18:32:11,561 ERROR org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: Encountered exception on operation CloseOp [length=0, inodeId=0, path=path, replication=3, mtime=1591266620287, atime=1591264800229, blockSize=134217728, blocks=[blk_11382006007_10353346830, blk_11382023760_10353365201, blk_11382041307_10353383098, blk_11382049845_10353392031, blk_11382057341_10353399899, blk_11382071544_10353415171, blk_11382080753_10354157480], permissions=dw_water:rd:rw-r--r--, aclEntries=null, clientName=, clientMachine=, overwrite=false, storagePolicyId=0, erasureCodingPolicyId=0, opCode=OP_CLOSE, txid=126060943585]
java.io.IOException: File is not under construction: hdfs://path
	at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:476)
	at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:258)
	at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:161)
	at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:898)
	at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:329)
	at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:460)
	at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$300(EditLogTailer.java:410)
	at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:427)
	at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:484)
	at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:423)
2020-06-04 18:32:11,561 ERROR org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer: Unknown error encountered while tailing edits. Shutting down standby NN.

2020-06-04 22:28:04,025 ERROR org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: Encountered exception on operation TruncateOp [src=xxxpath, clientName=DFSClient_NONMAPREDUCE_-295521672_77, clientMachine=xxx, newLength=3210623016, timestamp=1591270219348, truncateBlock=blk_11382198393_10355810378, opCode=OP_TRUNCATE, txid=126074587217]
java.lang.IllegalStateException: file is already under construction
	at com.google.common.base.Preconditions.checkState(Preconditions.java:145)
	at org.apache.hadoop.hdfs.server.namenode.INodeFile.toUnderConstruction(INodeFile.java:329)
	at org.apache.hadoop.hdfs.server.namenode.FSDirTruncateOp.prepareFileForTruncate(FSDirTruncateOp.java:222)
	at org.apache.hadoop.hdfs.server.namenode.FSDirTruncateOp.unprotectedTruncate(FSDirTruncateOp.java:183)
	at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:986)
	at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:258)
	at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:161)
	at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:898)
	at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:753)
	at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:331)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1123)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:730)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:669)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:731)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:974)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:947)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1680)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1747)
2020-06-04 22:28:04,027 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Encountered exception loading fsimage
java.io.IOException: java.lang.IllegalStateException: file is already under construction
	at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:268)
	at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:161)
	at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:898)
	at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:753)
	at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:331)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1123)
	at
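The two traces trip the same under-construction (UC) bookkeeping from opposite sides: replaying the CloseOp fails because the file is not UC, while replaying the TruncateOp fails in `INodeFile.toUnderConstruction` because the file already is UC. Below is a hypothetical toy model, not HDFS code, that reproduces just those two checks for a single file. It assumes that a truncate always leaves the file UC (as a truncate that does not land on a block boundary does while its last block is recovered) and that append goes through the same `toUnderConstruction` check; `ReplayModel`, `Op`, and `replay` are illustrative names.

```java
import java.io.IOException;
import java.util.List;

/**
 * Hypothetical toy model (not HDFS source) of the two preconditions that fail
 * in the traces above while a NameNode replays edit-log ops for one file.
 */
public class ReplayModel {

    /** Simplified subset of edit-log op codes. */
    enum Op { CLOSE, TRUNCATE, APPEND }

    /** Whether the single modeled file is currently under construction (UC). */
    private boolean underConstruction = false;

    /** Applies one op, enforcing the precondition seen in the matching trace. */
    void apply(Op op) throws IOException {
        switch (op) {
            case CLOSE:
                // Mirrors the OP_CLOSE failure in the first trace: the file must be UC.
                if (!underConstruction) {
                    throw new IOException("File is not under construction");
                }
                underConstruction = false;
                break;
            case TRUNCATE: // assumption: truncate is mid-block, so the file enters UC
            case APPEND:   // assumption: append uses the same toUnderConstruction check
                // Mirrors the checkState in the second trace: must not already be UC.
                if (underConstruction) {
                    throw new IllegalStateException("file is already under construction");
                }
                underConstruction = true;
                break;
        }
    }

    /** Replays ops on a fresh, closed file; returns "ok" or the first failure message. */
    static String replay(List<Op> ops) {
        ReplayModel file = new ReplayModel();
        try {
            for (Op op : ops) {
                file.apply(op);
            }
            return "ok";
        } catch (IOException | IllegalStateException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Consistent under this model: truncate, close, append, close.
        System.out.println(replay(List.of(Op.TRUNCATE, Op.CLOSE, Op.APPEND, Op.CLOSE)));
        // A CLOSE with no earlier op that put the file under construction.
        System.out.println(replay(List.of(Op.CLOSE)));
        // A second TRUNCATE before the first one has been closed.
        System.out.println(replay(List.of(Op.TRUNCATE, Op.TRUNCATE)));
    }
}
```

Running `main` prints `ok`, then `File is not under construction`, then `file is already under construction`. Under this model, either message means the replayed op's precondition disagrees with the file's current UC state, for example because an earlier op that changed that state was missing or applied twice.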
[jira] [Commented] (HDFS-15391) Due to edit log corruption, Standby NameNode could not properly load the Ediltog log, result in abnormal exit of the service and failure to restart
[ https://issues.apache.org/jira/browse/HDFS-15391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17126723#comment-17126723 ]

Ayush Saxena commented on HDFS-15391:
-------------------------------------

Have you backported HDFS-7663 into that version? If so, HDFS-14581 may help. Otherwise, can you share more background, audit logs, or anything else?