[jira] [Commented] (HDFS-15391) Due to edit log corruption, the Standby NameNode could not properly load the edit log, resulting in abnormal exit of the service and failure to restart

2020-06-05 Thread huhaiyang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17126740#comment-17126740
 ] 

huhaiyang commented on HDFS-15391:
--

[~ayushtkn] Thank you for the reply.
{quote}
   These are two different traces, correct?
   You tried restarting the NameNode twice, and once it failed for CLOSE_OP and 
   the other time with TRUNCATE, correct?
{quote}
Yes, these are two different traces; I'll add more details later.

> Due to edit log corruption, the Standby NameNode could not properly load the 
> edit log, resulting in abnormal exit of the service and failure to restart
> ---
>
> Key: HDFS-15391
> URL: https://issues.apache.org/jira/browse/HDFS-15391
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.2.0
>Reporter: huhaiyang
>Priority: Critical
>
> In our version 3.2.0 production cluster,
> we found that due to edit log corruption the Standby NameNode could not 
> properly load the edit log, resulting in abnormal exit of the service and 
> failure to restart.
> {noformat}
> The specific scenario is that Flink writes to HDFS (a replicated file), and 
> when the write hits an exception, the following operations are performed 
> (see the sketch after this block):
> 1. close file
> 2. open file
> 3. truncate file
> 4. append file
> {noformat}
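For reference, the four-step recovery sequence quoted above maps onto the Hadoop FileSystem API roughly as follows. This is a minimal sketch, not the actual Flink code: the path, data, and target length are illustrative, and the "open file" step is assumed here to be the re-open of the file on the recovery path (the exact client call is not shown in the issue).

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class TruncateAppendSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path file = new Path("/tmp/flink-part-0");  // illustrative path, not from the report

    // 1. close file: the writer finishes the stream, which logs OP_CLOSE on the NameNode
    FSDataOutputStream out = fs.create(file, true /* overwrite */);
    out.writeBytes("record-1\nrecord-2\n");
    out.close();

    // 2. open file: the recovery path looks at the file again (assumed interpretation
    //    of the report's "open file" step)
    long currentLen = fs.getFileStatus(file).getLen();

    // 3. truncate file back to the last known-good offset (OP_TRUNCATE). truncate()
    //    returns false while block recovery is still in progress, so poll until done.
    long goodLen = Math.min(currentLen, 9L);    // illustrative target length
    boolean done = fs.truncate(file, goodLen);
    while (!done) {
      Thread.sleep(1000L);
      done = fs.getFileStatus(file).getLen() == goodLen;
    }

    // 4. append file: reopen for append (OP_APPEND) and continue writing
    FSDataOutputStream appendOut = fs.append(file);
    appendOut.writeBytes("record-2-rewritten\n");
    appendOut.close();

    fs.close();
  }
}
{code}

The OP_CLOSE and OP_TRUNCATE records produced by steps 1 and 3 are the same op types that the Standby NameNode later fails to replay in the traces posted on this issue.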





[jira] [Commented] (HDFS-15391) Due to edit log corruption, the Standby NameNode could not properly load the edit log, resulting in abnormal exit of the service and failure to restart

2020-06-05 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17126730#comment-17126730
 ] 

Ayush Saxena commented on HDFS-15391:
-

Thanks,
These are two different traces, correct?
You tried restarting the NameNode twice, and once it failed for CLOSE_OP and 
the other time with TRUNCATE, correct?

> Due to edit log corruption, the Standby NameNode could not properly load the 
> edit log, resulting in abnormal exit of the service and failure to restart
> ---
>
> Key: HDFS-15391
> URL: https://issues.apache.org/jira/browse/HDFS-15391
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.2.0
>Reporter: huhaiyang
>Priority: Critical
>
> In our version 3.2.0 production cluster,
> we found that due to edit log corruption the Standby NameNode could not 
> properly load the edit log, resulting in abnormal exit of the service and 
> failure to restart.
>  
> The specific scenario is that Flink writes to HDFS, and when the write hits 
> an exception, the following operations are performed:
> 1. close file
> 2. open file
> 3. truncate file
> 4. append file
>  





[jira] [Commented] (HDFS-15391) Due to edit log corruption, the Standby NameNode could not properly load the edit log, resulting in abnormal exit of the service and failure to restart

2020-06-05 Thread huhaiyang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17126724#comment-17126724
 ] 

huhaiyang commented on HDFS-15391:
--

Standby NameNode exception log:
 2020-06-04 18:32:11,561 ERROR 
org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: Encountered exception 
on operation CloseOp [length=0, inodeId=0, path=path, replication=3, 
mtime=1591266620287, atime=1591264800229, blockSize=134217728, 
blocks=[blk_11382006007_10353346830, blk_11382023760_10353365201, 
blk_11382041307_10353383098, blk_11382049845_10353392031, 
blk_11382057341_10353399899, blk_11382071544_10353415171, 
blk_11382080753_10354157480], permissions=dw_water:rd:rw-r--r--, 
aclEntries=null, clientName=, clientMachine=, overwrite=false, 
storagePolicyId=0, erasureCodingPolicyId=0, opCode=OP_CLOSE, txid=126060943585]
 java.io.IOException: File is not under construction: hdfs://path
 at 
org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:476)
 at 
org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:258)
 at 
org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:161)
 at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:898)
 at 
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:329)
 at 
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:460)
 at 
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$300(EditLogTailer.java:410)
 at 
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:427)
 at 
org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:484)
 at 
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:423)
 2020-06-04 18:32:11,561 ERROR 
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer: Unknown error 
encountered while tailing edits. Shutting down standby NN.

 

2020-06-04 22:28:04,025 ERROR 
org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: Encountered exception 
on operation TruncateOp [src=xxxpath, 
clientName=DFSClient_NONMAPREDUCE_-295521672_77, clientMachine=xxx, 
newLength=3210623016, timestamp=1591270219348, 
truncateBlock=blk_11382198393_10355810378, opCode=OP_TRUNCATE, 
txid=126074587217]
 java.lang.IllegalStateException: file is already under construction
 at com.google.common.base.Preconditions.checkState(Preconditions.java:145)
 at 
org.apache.hadoop.hdfs.server.namenode.INodeFile.toUnderConstruction(INodeFile.java:329)
 at 
org.apache.hadoop.hdfs.server.namenode.FSDirTruncateOp.prepareFileForTruncate(FSDirTruncateOp.java:222)
 at 
org.apache.hadoop.hdfs.server.namenode.FSDirTruncateOp.unprotectedTruncate(FSDirTruncateOp.java:183)
 at 
org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:986)
 at 
org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:258)
 at 
org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:161)
 at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:898)
 at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:753)
 at 
org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:331)
 at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1123)
 at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:730)
 at 
org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:669)
 at 
org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:731)
 at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:974)
 at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:947)
 at 
org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1680)
 at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1747)
 2020-06-04 22:28:04,027 WARN 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Encountered exception 
loading fsimage
 java.io.IOException: java.lang.IllegalStateException: file is already under 
construction
 at 
org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:268)
 at 
org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:161)
 at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:898)
 at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:753)
 at 
org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:331)
 at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1123)
 at 
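Taken together, the two traces above fail on complementary state checks during edit replay: replaying the CloseOp requires the file to still be under construction, while replaying the TruncateOp requires it not to be (truncate converts the file to under-construction for block recovery). The sketch below only approximates the shape of those checks; the class and method names are simplified and do not match the HDFS source line for line.

{code:java}
import java.io.IOException;

// Rough approximation of the replay-time state checks behind the two traces;
// illustrative only, not the exact FSEditLogLoader / INodeFile code.
class EditReplayChecksSketch {

  // Replaying OP_CLOSE: the file being closed must still be under construction.
  // If it is not, replay aborts with the IOException seen in the first trace.
  static void replayCloseOp(boolean underConstruction, String path) throws IOException {
    if (!underConstruction) {
      throw new IOException("File is not under construction: " + path);
    }
    // ... otherwise: finalize the file and release the lease ...
  }

  // Replaying OP_TRUNCATE: the file must NOT already be under construction,
  // because truncate itself converts it to under-construction for block recovery.
  // If it already is, the precondition fails with the IllegalStateException
  // seen in the second trace.
  static void replayTruncateOp(boolean underConstruction, String src) {
    if (underConstruction) {
      throw new IllegalStateException("file is already under construction");
    }
    // ... otherwise: convert to under-construction and schedule truncate recovery ...
  }
}
{code}

Either way the loader cannot make progress past the offending transaction, which is why the tailer shuts the standby down in the first trace and why the subsequent restart fails again while replaying the edit log in the second.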

[jira] [Commented] (HDFS-15391) Due to edit log corruption, the Standby NameNode could not properly load the edit log, resulting in abnormal exit of the service and failure to restart

2020-06-05 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17126723#comment-17126723
 ] 

Ayush Saxena commented on HDFS-15391:
-

Have you backported HDFS-7663 into that build?
If yes, HDFS-14581 may help.
Otherwise, can you give more background, audit logs, or anything else?

> Due to edit log corruption, the Standby NameNode could not properly load the 
> edit log, resulting in abnormal exit of the service and failure to restart
> ---
>
> Key: HDFS-15391
> URL: https://issues.apache.org/jira/browse/HDFS-15391
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.2.0
>Reporter: huhaiyang
>Priority: Critical
>
> In our version 3.2.0 production cluster,
> we found that due to edit log corruption the Standby NameNode could not 
> properly load the edit log, resulting in abnormal exit of the service and 
> failure to restart.
> This is the exception it throws:
> 2020-06-04 18:32:11,561 ERROR 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: Encountered exception 
> on operation CloseOp [length=0, inodeId=0, path=path, replication=3, 
> mtime=1591266620287, atime=1591264800229, blockSize=134217728, 
> blocks=[blk_11382006007_10353346830, blk_11382023760_10353365201, 
> blk_11382041307_10353383098, blk_11382049845_10353392031, 
> blk_11382057341_10353399899, blk_11382071544_10353415171, 
> blk_11382080753_10354157480], permissions=dw_water:rd:rw-r--r--, 
> aclEntries=null, clientName=, clientMachine=, overwrite=false, 
> storagePolicyId=0, erasureCodingPolicyId=0, opCode=OP_CLOSE, 
> txid=126060943585]
> java.io.IOException: File is not under construction: path
> at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:476)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:258)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:161)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:898)
> at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:329)
> at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:460)
> at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$300(EditLogTailer.java:410)
> at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:427)
> at 
> org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:484)
> at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:423)
> 2020-06-04 18:32:11,561 ERROR 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer: Unknown error 
> encountered while tailing edits. Shutting down standby NN.


