[jira] [Commented] (HDFS-10729) NameNode crashes when loading edits because max directory items is exceeded

2016-08-31 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15452990#comment-15452990
 ] 

Kihwal Lee commented on HDFS-10729:
---

They pass for me too.
{noformat}
---
 T E S T S
---
Running org.apache.hadoop.hdfs.server.namenode.ha.TestBootstrapStandby
Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 31.175 sec - in 
org.apache.hadoop.hdfs.server.namenode.ha.TestBootstrapStandby
Running org.apache.hadoop.hdfs.TestLeaseRecovery2
Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 84.721 sec - in 
org.apache.hadoop.hdfs.TestLeaseRecovery2
Running org.apache.hadoop.hdfs.TestFileCorruption
Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 13.311 sec - in 
org.apache.hadoop.hdfs.TestFileCorruption
Running org.apache.hadoop.hdfs.security.TestDelegationTokenForProxyUser
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 6.407 sec - in 
org.apache.hadoop.hdfs.security.TestDelegationTokenForProxyUser

Results :

Tests run: 19, Failures: 0, Errors: 0, Skipped: 0
{noformat}
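
For reference, these four suites can be rerun together from {{hadoop-hdfs-project/hadoop-hdfs}} with the usual surefire test selection. This invocation is illustrative (assuming a standard dev checkout), not taken from the thread:
{noformat}
mvn test -Dtest=TestBootstrapStandby,TestLeaseRecovery2,TestFileCorruption,TestDelegationTokenForProxyUser
{noformat}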

> NameNode crashes when loading edits because max directory items is exceeded
> ---
>
> Key: HDFS-10729
> URL: https://issues.apache.org/jira/browse/HDFS-10729
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.7.2
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
>Priority: Critical
> Attachments: HDFS-10729.001.patch
>
>
> We encountered a bug where the Standby NameNode crashes due to an NPE when 
> loading edits.
> {noformat}
> 2016-08-05 15:06:00,983 ERROR 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: Encountered exception 
> on operation AddOp [length=0, inodeId=789272719, path=[path], replication=3, 
> mtime=1470379597935, atime=1470379597935, blockSize=134217728, blocks=[], 
> permissions=:supergroup:rw-r--r--, aclEntries=null, 
> clientName=DFSClient_NONMAPREDUCE_1495395702_1, clientMachine=10.210.119.136, 
> overwrite=true, RpcClientId=a1512eeb-65e4-43dc-8aa8-d7a1af37ed30, 
> RpcCallId=417, storagePolicyId=0, opCode=OP_ADD, txid=4212503758]
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.getFileEncryptionInfo(FSDirectory.java:2914)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.createFileStatus(FSDirectory.java:2469)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:375)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:230)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:139)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:829)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:810)
> at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:232)
> at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:331)
> at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$200(EditLogTailer.java:284)
> at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:301)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:360)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1651)
> at 
> org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:410)
> at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:297)
> {noformat}
> The NameNode crashes and cannot be restarted. After some research, we turned 
> on the debug log for org.apache.hadoop.hdfs.StateChange, restarted the NN, and 
> saw the following exception, which induced the NPE:
> {noformat}
> 16/08/07 18:51:15 DEBUG hdfs.StateChange: DIR* 
> FSDirectory.unprotectedAddFile: exception when add [path] to the file system
> org.apache.hadoop.hdfs.protocol.FSLimitException$MaxDirectoryItemsExceededException:
>  The directory item limit of [path] is exceeded: limit=1048576 items=1049332
> at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.verifyMaxDirItems(FSDirectory.java:2060)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.addChild(FSDirectory.java:2112)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.addLastINode(FSDirectory.java:2081)
> at 
> 
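
The two traces above fit together: {{unprotectedAddFile}} hits the directory item limit, swallows the exception (it is logged only at DEBUG via hdfs.StateChange), and returns null, which the OP_ADD replay path then dereferences. Below is a minimal, self-contained sketch of that failure pattern; all names and bodies are paraphrased for illustration, not copied from the 2.7.2 source.
{code:java}
// Illustration of the failure pattern seen in the traces above (paraphrased,
// not actual NameNode source): the limit violation is caught and swallowed,
// and the null return is dereferenced by the edit-replay caller.
import java.io.IOException;

class MaxDirItemsNpeDemo {
  // dfs.namenode.fs-limits.max-directory-items and the counts from the log above
  static final int LIMIT = 1048576;
  static int items = 1049332;

  static String unprotectedAddFile(String path) {
    try {
      if (items >= LIMIT) {
        throw new IOException("The directory item limit of " + path
            + " is exceeded: limit=" + LIMIT + " items=" + items);
      }
      items++;
      return path;  // stands in for the new INodeFile
    } catch (IOException e) {
      // Swallowed: only visible with DEBUG logging enabled for StateChange.
      return null;  // the replay caller does not expect null
    }
  }

  public static void main(String[] args) {
    String newFile = unprotectedAddFile("/some/dir/file");
    // Mirrors the unconditional use of the result in the OP_ADD replay path.
    System.out.println(newFile.length());  // NullPointerException here
  }
}
{code}
Run as-is, main() dies with the same kind of NullPointerException, while the actual limit violation stays invisible unless DEBUG logging is on.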

[jira] [Commented] (HDFS-10729) NameNode crashes when loading edits because max directory items is exceeded

2016-08-30 Thread Wei-Chiu Chuang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15449921#comment-15449921
 ] 

Wei-Chiu Chuang commented on HDFS-10729:


All failed tests passed locally. [~kihwal], would you like to take another look? 
Thanks.

> NameNode crashes when loading edits because max directory items is exceeded
> ---

[jira] [Commented] (HDFS-10729) NameNode crashes when loading edits because max directory items is exceeded

2016-08-16 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15423006#comment-15423006
 ] 

Hadoop QA commented on HDFS-10729:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 19s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 16s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 47s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 25s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 54s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 56s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 1s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 27s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 5s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 59s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 57s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 89m 0s{color} | {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 21s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}110m 21s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.namenode.ha.TestBootstrapStandby |
|   | hadoop.hdfs.security.TestDelegationTokenForProxyUser |
|   | hadoop.hdfs.TestFileCorruption |
| Timed out junit tests | org.apache.hadoop.hdfs.TestLeaseRecovery2 |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:9560f25 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12823767/HDFS-10729.001.patch |
| JIRA Issue | HDFS-10729 |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle |
| uname | Linux b09705f97c13 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / ffe1fff |
| Default Java | 1.8.0_101 |
| findbugs | v3.0.0 |
| unit | https://builds.apache.org/job/PreCommit-HDFS-Build/16439/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/16439/testReport/ |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/16439/console |
| Powered by | Apache Yetus 0.4.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> NameNode crashes when loading edits because max directory items is exceeded
> 

[jira] [Commented] (HDFS-10729) NameNode crashes when loading edits because max directory items is exceeded

2016-08-16 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15422827#comment-15422827
 ] 

Kihwal Lee commented on HDFS-10729:
---

I just hit the submit patch button.

> NameNode crashes when loading edits because max directory items is exceeded
> ---

[jira] [Commented] (HDFS-10729) NameNode crashes when loading edits because max directory items is exceeded

2016-08-08 Thread Wei-Chiu Chuang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15412016#comment-15412016
 ] 

Wei-Chiu Chuang commented on HDFS-10729:


Hi Kihwal, thanks for the comment.
The SbNN crashed when it started and loaded edits. At some point 
{{dfs.namenode.fs-limits.max-directory-items}} was increased, which explains why 
the number of items exceeds 1048576 by so much, but perhaps it was changed only 
on the ANN and not on the SbNN. Or perhaps it was increased and later decreased.

I saw a tremendous number of rename operations before the 
{{MaxDirectoryItemsExceededException}}, but the version the cluster is running 
has the HDFS-6099 fix.

At this point it looks more like an operational error, although I think the NPE 
handling should be improved.
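
For context on the ANN/SbNN mismatch theory: the limit is read per NameNode from its local configuration at startup, so an hdfs-site.xml change applied to only one NN lets the active log an OP_ADD that the standby later refuses to replay. A small illustrative check, assuming hadoop-common on the classpath; the key name is from the comment above, the surrounding class is hypothetical:
{code:java}
import org.apache.hadoop.conf.Configuration;

public class MaxDirItemsConfigCheck {
  public static void main(String[] args) {
    // Each NameNode resolves the limit from its own hdfs-site.xml at startup.
    Configuration conf = new Configuration();
    int maxDirItems = conf.getInt(
        "dfs.namenode.fs-limits.max-directory-items", 1048576 /* 2.7 default */);
    System.out.println("effective limit on this NN: " + maxDirItems);
    // If the active runs with a raised value and the standby still has the
    // default, the active accepts and logs OP_ADDs that the standby's replay
    // of the same edits will reject, matching the symptom above.
  }
}
{code}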

> NameNode crashes when loading edits because max directory items is exceeded
> ---

[jira] [Commented] (HDFS-10729) NameNode crashes when loading edits because max directory items is exceeded

2016-08-08 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15411901#comment-15411901
 ] 

Kihwal Lee commented on HDFS-10729:
---

Do you know how it got into this state? If the serving namenode had the same 
config, this op should have failed and would never have been logged in the first 
place. Do you suspect a bug in enforcement similar to HDFS-6099?
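
For readers without HDFS-6099 at hand: that bug was that the fs-limits checks were enforced on the create path but not on renames, so a directory could grow past the limit on the active NN and the resulting edits would only blow up when replayed elsewhere. A self-contained toy illustration of that class of bug (paraphrased, not actual NameNode source):
{code:java}
// Toy model of the HDFS-6099 class of enforcement gap: the item-count limit
// is checked on creates but skipped on renames, so the directory can legally
// exceed the limit and a later strict check (or replay) then fails.
import java.util.ArrayList;
import java.util.List;

class RenameLimitDemo {
  static final int LIMIT = 3;                      // stand-in for max-directory-items
  static final List<String> dir = new ArrayList<>();

  static void create(String name) {
    if (dir.size() >= LIMIT) {
      throw new IllegalStateException("directory item limit exceeded");
    }
    dir.add(name);
  }

  static void renameInto(String name) {
    dir.add(name);  // pre-HDFS-6099: no limit check on the rename path
  }

  public static void main(String[] args) {
    create("a"); create("b"); create("c");
    renameInto("d");  // dir now holds 4 items, over the limit
    create("e");      // throws, analogous to the replay failure above
  }
}
{code}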

> NameNode crashes when loading edits because max directory items is exceeded
> ---

[jira] [Commented] (HDFS-10729) NameNode crashes when loading edits because max directory items is exceeded

2016-08-07 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15411253#comment-15411253
 ] 

Yongjun Zhang commented on HDFS-10729:
--

Hi [~jojochuang],

Thanks for reporting the issue and finding the cause. It sounds to me like a 
simple supportability fix would be to make the message clear about why the NPE 
occurred (the maximum number of items in a directory was exceeded), and to state 
that the corresponding config parameter can be increased to get past it.

Thanks.
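
Something along these lines in the replay path would do it. This is a hypothetical sketch, not the attached patch; the call site is paraphrased, and only the exception class and config key are taken from the traces above:
{code:java}
// Hypothetical shape of the suggested supportability fix, illustrative only:
// catch the limit violation during replay and surface the root cause plus the
// config knob that clears it, instead of letting it resurface as an opaque NPE.
try {
  applyEditLogOp(op);  // call site paraphrased from the stack traces above
} catch (FSLimitException.MaxDirectoryItemsExceededException e) {
  throw new IOException("Cannot replay " + op + ": a directory exceeded "
      + "dfs.namenode.fs-limits.max-directory-items; raise the limit in "
      + "hdfs-site.xml on this NameNode to move past it", e);
}
{code}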




> NameNode crashes when loading edits because max directory items is exceeded
> ---