[jira] [Comment Edited] (HDFS-15012) NN fails to parse Edit logs after applying HDFS-13101
[ https://issues.apache.org/jira/browse/HDFS-15012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16999407#comment-16999407 ]

Shashikant Banerjee edited comment on HDFS-15012 at 12/18/19 6:12 PM:
--

Thanks [~ericlin] for helping discover the issue. Thanks [~arp], [~szetszwo], [~weichiu], [~ayushtkn], [~surendrasingh] for the review and feedback. I have committed this. The reported FindBugs issue is unrelated.

was (Author: shashikant):
Thanks [~ericlin] for helping discover the issue. Thanks [~arp], [~szetszwo], [~weichiu], [~ayushtkn], [~surendrasingh] for the review and feedback. I have committed this.

> NN fails to parse Edit logs after applying HDFS-13101
> -----------------------------------------------------
>
> Key: HDFS-15012
> URL: https://issues.apache.org/jira/browse/HDFS-15012
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: nn
> Reporter: Eric Lin
> Assignee: Shashikant Banerjee
> Priority: Blocker
> Labels: release-blocker
> Fix For: 2.8.0, 2.9.0, 3.1.0, 2.10.0, 3.2.0, 3.3.0
>
> Attachments: HDFS-15012.000.patch, HDFS-15012.001.patch
>
>
> After applying HDFS-13101, and after deleting and creating a large number of
> snapshots, the SNN exited with the error below:
>
> {code:sh}
> 2019-11-18 08:28:06,528 ERROR org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: Encountered exception on operation DeleteSnapshotOp [snapshotRoot=/path/to/hdfs/file, snapshotName=distcp-3479-31-old, RpcClientId=b16a6cb5-bdbb-45ae-9f9a-f7dc57931f37, RpcCallId=1]
> java.lang.AssertionError: Element already exists: element=partition_isactive=true, DELETED=[partition_isactive=true]
> 	at org.apache.hadoop.hdfs.util.Diff.insert(Diff.java:193)
> 	at org.apache.hadoop.hdfs.util.Diff.delete(Diff.java:239)
> 	at org.apache.hadoop.hdfs.util.Diff.combinePosterior(Diff.java:462)
> 	at org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature$DirectoryDiff$2.initChildren(DirectoryWithSnapshotFeature.java:240)
> 	at org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature$DirectoryDiff$2.iterator(DirectoryWithSnapshotFeature.java:250)
> 	at org.apache.hadoop.hdfs.server.namenode.INodeDirectory.cleanSubtreeRecursively(INodeDirectory.java:755)
> 	at org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature.cleanDirectory(DirectoryWithSnapshotFeature.java:753)
> 	at org.apache.hadoop.hdfs.server.namenode.INodeDirectory.cleanSubtree(INodeDirectory.java:790)
> 	at org.apache.hadoop.hdfs.server.namenode.INodeReference.cleanSubtree(INodeReference.java:332)
> 	at org.apache.hadoop.hdfs.server.namenode.INodeReference$WithName.cleanSubtree(INodeReference.java:583)
> 	at org.apache.hadoop.hdfs.server.namenode.INodeDirectory.cleanSubtreeRecursively(INodeDirectory.java:760)
> 	at org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature.cleanDirectory(DirectoryWithSnapshotFeature.java:753)
> 	at org.apache.hadoop.hdfs.server.namenode.INodeDirectory.cleanSubtree(INodeDirectory.java:790)
> 	at org.apache.hadoop.hdfs.server.namenode.snapshot.DirectorySnapshottableFeature.removeSnapshot(DirectorySnapshottableFeature.java:235)
> 	at org.apache.hadoop.hdfs.server.namenode.INodeDirectory.removeSnapshot(INodeDirectory.java:259)
> 	at org.apache.hadoop.hdfs.server.namenode.snapshot.SnapshotManager.deleteSnapshot(SnapshotManager.java:301)
> 	at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:688)
> 	at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:232)
> 	at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:141)
> 	at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:903)
> 	at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:756)
> 	at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:324)
> 	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1144)
> 	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:796)
> 	at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:614)
> 	at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:676)
> 	at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:844)
> 	at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:823)
> 	at
[jira] [Comment Edited] (HDFS-15012) NN fails to parse Edit logs after applying HDFS-13101
[ https://issues.apache.org/jira/browse/HDFS-15012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16982106#comment-16982106 ]

Wei-Chiu Chuang edited comment on HDFS-15012 at 11/26/19 3:26 AM:
--

[~ericlin] is my colleague. We recently received several reports of corrupt fsimages or edit logs that fail to apply because of snapshot-related operations. They occurred on HDFS versions that include HDFS-13101, which was supposed to kill snapshot bugs once and for all. However, it didn't. We are actively investigating but have no solution yet. [~shashikant] [~sodonnell] FYI.

was (Author: jojochuang):
We recently received several reports of corrupt fsimages or edit logs that fail to apply because of snapshot-related operations. They occurred on HDFS versions that include HDFS-13101, which was supposed to kill snapshot bugs once and for all. However, it didn't. We are actively investigating but have no solution yet. [~shashikant] [~sodonnell] FYI.

> NN fails to parse Edit logs after applying HDFS-13101
> -----------------------------------------------------
>
> Key: HDFS-15012
> URL: https://issues.apache.org/jira/browse/HDFS-15012
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: nn
> Reporter: Eric Lin
> Priority: Critical