[jira] [Commented] (HDFS-10729) NameNode crashes when loading edits because max directory items is exceeded
[ https://issues.apache.org/jira/browse/HDFS-10729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15452990#comment-15452990 ] Kihwal Lee commented on HDFS-10729:
---
They pass for me too.
{noformat}
---
 T E S T S
---
Running org.apache.hadoop.hdfs.server.namenode.ha.TestBootstrapStandby
Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 31.175 sec - in org.apache.hadoop.hdfs.server.namenode.ha.TestBootstrapStandby
Running org.apache.hadoop.hdfs.TestLeaseRecovery2
Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 84.721 sec - in org.apache.hadoop.hdfs.TestLeaseRecovery2
Running org.apache.hadoop.hdfs.TestFileCorruption
Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 13.311 sec - in org.apache.hadoop.hdfs.TestFileCorruption
Running org.apache.hadoop.hdfs.security.TestDelegationTokenForProxyUser
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 6.407 sec - in org.apache.hadoop.hdfs.security.TestDelegationTokenForProxyUser

Results :

Tests run: 19, Failures: 0, Errors: 0, Skipped: 0
{noformat}

> NameNode crashes when loading edits because max directory items is exceeded
> ---
>
> Key: HDFS-10729
> URL: https://issues.apache.org/jira/browse/HDFS-10729
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 2.7.2
> Reporter: Wei-Chiu Chuang
> Assignee: Wei-Chiu Chuang
> Priority: Critical
> Attachments: HDFS-10729.001.patch
>
> We encountered a bug where the Standby NameNode crashes due to an NPE when loading edits.
> {noformat}
> 2016-08-05 15:06:00,983 ERROR org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: Encountered exception on operation AddOp [length=0, inodeId=789272719, path=[path], replication=3, mtime=1470379597935, atime=1470379597935, blockSize=134217728, blocks=[], permissions=:supergroup:rw-r--r--, aclEntries=null, clientName=DFSClient_NONMAPREDUCE_1495395702_1, clientMachine=10.210.119.136, overwrite=true, RpcClientId=a1512eeb-65e4-43dc-8aa8-d7a1af37ed30, RpcCallId=417, storagePolicyId=0, opCode=OP_ADD, txid=4212503758]
> java.lang.NullPointerException
>     at org.apache.hadoop.hdfs.server.namenode.FSDirectory.getFileEncryptionInfo(FSDirectory.java:2914)
>     at org.apache.hadoop.hdfs.server.namenode.FSDirectory.createFileStatus(FSDirectory.java:2469)
>     at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:375)
>     at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:230)
>     at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:139)
>     at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:829)
>     at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:810)
>     at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:232)
>     at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:331)
>     at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$200(EditLogTailer.java:284)
>     at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:301)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:360)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1651)
>     at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:410)
>     at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:297)
> {noformat}
> The NameNode crashes and cannot be restarted. After some research, we turned on the debug log of org.apache.hadoop.hdfs.StateChange, restarted the NN, and saw the following exception, which induced the NPE:
> {noformat}
> 16/08/07 18:51:15 DEBUG hdfs.StateChange: DIR* FSDirectory.unprotectedAddFile: exception when add [path] to the file system
> org.apache.hadoop.hdfs.protocol.FSLimitException$MaxDirectoryItemsExceededException: The directory item limit of [path] is exceeded: limit=1048576 items=1049332
>     at org.apache.hadoop.hdfs.server.namenode.FSDirectory.verifyMaxDirItems(FSDirectory.java:2060)
>     at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addChild(FSDirectory.java:2112)
>     at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addLastINode(FSDirectory.java:2081)
>     at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addINode(FSDirectory.java:1900)
>     at org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedAddFile(FSDirectory.java:368)
>     at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:365)
>     at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:230)
>     at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:139)
>     at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:829)
>     at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:810)
>     at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:232)
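To make the two stack traces above easier to connect, here is a minimal, hypothetical sketch of the failure mode they describe. The class and method names below are illustrative stand-ins, not the real FSDirectory/FSEditLogLoader code: the limit violation is swallowed (logged only at DEBUG) inside the add path and surfaces to the caller as a null result, which is then dereferenced without a null check.

```java
// Illustrative stand-in for the failure mode in the stack traces above.
// These are NOT the real HDFS methods, just a sketch of the control flow.
public class MaxDirItemsNpeSketch {
    // dfs.namenode.fs-limits.max-directory-items (default 1048576)
    static final int LIMIT = 1048576;

    // Stand-in for unprotectedAddFile: the MaxDirectoryItemsExceededException
    // is caught and logged at DEBUG, and the caller only sees a null result.
    static String unprotectedAddFile(int itemsInParent, String path) {
        if (itemsInParent >= LIMIT) {
            System.out.println("DEBUG StateChange: exception when add " + path
                + " limit=" + LIMIT + " items=" + itemsInParent);
            return null; // swallowed failure -> null inode
        }
        return path; // in the real code, the newly added inode
    }

    // Stand-in for createFileStatus: dereferences the result unconditionally.
    static int createFileStatus(String inode) {
        return inode.length(); // NPE here when inode is null
    }

    public static void main(String[] args) {
        String inode = unprotectedAddFile(1049332, "[path]");
        try {
            createFileStatus(inode);
        } catch (NullPointerException e) {
            System.out.println("NPE while applying OP_ADD, as in the SbNN log");
        }
    }
}
```

Because the root-cause exception is only visible at DEBUG level, all the operator sees at default log levels is the opaque NPE, which is why the debug log of org.apache.hadoop.hdfs.StateChange had to be enabled to diagnose it.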
[ https://issues.apache.org/jira/browse/HDFS-10729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15449921#comment-15449921 ] Wei-Chiu Chuang commented on HDFS-10729:
All failed tests passed locally. [~kihwal] would you like to take a look again? Thanks.
[ https://issues.apache.org/jira/browse/HDFS-10729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15423006#comment-15423006 ] Hadoop QA commented on HDFS-10729:
---
-1 overall

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 19s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| +1 | mvninstall | 7m 16s | trunk passed |
| +1 | compile | 0m 47s | trunk passed |
| +1 | checkstyle | 0m 25s | trunk passed |
| +1 | mvnsite | 0m 54s | trunk passed |
| +1 | mvneclipse | 0m 13s | trunk passed |
| +1 | findbugs | 1m 56s | trunk passed |
| +1 | javadoc | 1m 1s | trunk passed |
| +1 | mvninstall | 1m 3s | the patch passed |
| +1 | compile | 1m 0s | the patch passed |
| +1 | javac | 1m 0s | the patch passed |
| +1 | checkstyle | 0m 27s | the patch passed |
| +1 | mvnsite | 1m 5s | the patch passed |
| +1 | mvneclipse | 0m 10s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | findbugs | 1m 59s | the patch passed |
| +1 | javadoc | 0m 57s | the patch passed |
| -1 | unit | 89m 0s | hadoop-hdfs in the patch failed. |
| +1 | asflicense | 0m 21s | The patch does not generate ASF License warnings. |
| | | 110m 21s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.namenode.ha.TestBootstrapStandby |
| | hadoop.hdfs.security.TestDelegationTokenForProxyUser |
| | hadoop.hdfs.TestFileCorruption |
| Timed out junit tests | org.apache.hadoop.hdfs.TestLeaseRecovery2 |

|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:9560f25 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12823767/HDFS-10729.001.patch |
| JIRA Issue | HDFS-10729 |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle |
| uname | Linux b09705f97c13 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / ffe1fff |
| Default Java | 1.8.0_101 |
| findbugs | v3.0.0 |
| unit | https://builds.apache.org/job/PreCommit-HDFS-Build/16439/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/16439/testReport/ |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/16439/console |
| Powered by | Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org |

This message was automatically generated.
[ https://issues.apache.org/jira/browse/HDFS-10729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15422827#comment-15422827 ] Kihwal Lee commented on HDFS-10729:
---
I just hit the submit patch button.
[ https://issues.apache.org/jira/browse/HDFS-10729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15412016#comment-15412016 ] Wei-Chiu Chuang commented on HDFS-10729:
Hi Kihwal, thanks for the comment. The SbNN crashed when it started and loaded edits. There was a point in time when {{dfs.namenode.fs-limits.max-directory-items}} was increased, which explains why the number of items exceeds 1048576 by a lot; perhaps it was changed only on the ANN but not the SbNN, or it was increased and later decreased. I saw a tremendous number of rename operations before the {{MaxDirectoryItemsExceededException}}, but the version the cluster is running has the HDFS-6099 fix. At this point it looks more like an operational error, although I think the NPE handling should be improved.
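For reference, the limit discussed in this thread is controlled by {{dfs.namenode.fs-limits.max-directory-items}} in hdfs-site.xml. A sketch of an override is below; the value shown is illustrative, not a recommendation. As the scenario above suggests, in an HA pair the setting should be kept identical on the active and standby NameNodes, or the standby may reject edits the active accepted:

```xml
<!-- hdfs-site.xml: illustrative override; keep identical on ANN and SbNN -->
<property>
  <name>dfs.namenode.fs-limits.max-directory-items</name>
  <!-- default is 1048576; a value above the observed item count
       (1049332 in the log above) would let the edits load -->
  <value>2097152</value>
</property>
```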
[ https://issues.apache.org/jira/browse/HDFS-10729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15411901#comment-15411901 ] Kihwal Lee commented on HDFS-10729:
---
Do you know how it got into this state? If the serving namenode had the same config, this op should have failed and wouldn't have been logged in the first place. Do you suspect a bug in enforcement similar to HDFS-6099?
[ https://issues.apache.org/jira/browse/HDFS-10729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15411253#comment-15411253 ] Yongjun Zhang commented on HDFS-10729:
--
Hi [~jojochuang], thanks for reporting the issue and finding the cause. It sounds to me like a simple supportability fix would be to make the message clear about why the NPE occurred (the maximum number of items in a directory was exceeded), and state that the corresponding config parameter can be increased to let the edits load. Thanks.
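A minimal sketch of the kind of supportability fix suggested above. The names here are hypothetical and this is not the actual HDFS-10729 patch: instead of letting a null add-result flow into createFileStatus and surface as an opaque NPE, the loader could fail fast with a message naming the likely cause and the config knob.

```java
// Hypothetical sketch of the suggested improvement, not the actual patch:
// surface the real cause of a failed OP_ADD instead of an NPE.
public class AddOpNullCheckSketch {
    // Stand-in for the point in applyEditLogOp where the add result is used.
    static String applyAddOp(String addedInode, String path) {
        if (addedInode == null) {
            // Name the likely cause and the config parameter in the message.
            throw new IllegalStateException("Failed to add " + path
                + " while loading edits; the directory item limit may be"
                + " exceeded (see dfs.namenode.fs-limits.max-directory-items).");
        }
        return addedInode;
    }
}
```

With a check like this, the operator would see the directory-limit explanation at default log levels rather than having to enable the StateChange debug log to find the swallowed MaxDirectoryItemsExceededException.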