[jira] [Commented] (HDFS-13813) Exit NameNode when dangling child inode is detected when saving FsImage

2018-08-09 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16575519#comment-16575519
 ] 

genericqa commented on HDFS-13813:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 13s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 27m 18s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 58s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  3s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m  4s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 51s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 46s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 57s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 52s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 52s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m  9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 52s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 58s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 45s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 78m 58s{color} | {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 28s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}141m 41s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes |
|   | hadoop.hdfs.server.namenode.ha.TestFailureToReadEdits |
|   | hadoop.hdfs.TestDFSClientRetries |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | HDFS-13813 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12935037/HDFS-13813.001.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 4ca21b0401e4 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 8244abb |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_171 |
| findbugs | v3.1.0-RC1 |
| unit | https://builds.apache.org/job/PreCommit-HDFS-Build/24739/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/24739/testReport/ |
| Max. process+thread count | 3396 (vs. ulimit of 1) |

[jira] [Commented] (HDFS-13813) Exit NameNode when dangling child inode is detected when saving FsImage

2018-08-09 Thread Siyao Meng (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16575467#comment-16575467
 ] 

Siyao Meng commented on HDFS-13813:
---

Thanks [~yzhangal] for the comment.

Sure, I will definitely add an option in the future to reduce overhead when 
things are working fine.

> Exit NameNode when dangling child inode is detected when saving FsImage
> ---
>
> Key: HDFS-13813
> URL: https://issues.apache.org/jira/browse/HDFS-13813
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: hdfs
> Affects Versions: 3.0.3
> Reporter: Siyao Meng
> Assignee: Siyao Meng
> Priority: Major
> Attachments: HDFS-13813.001.patch
>
>
> Recently, the same stack trace as in -HDFS-9406- appeared again in the field. 
> The symptom is that *loadINodeDirectorySection()* can't find a 
> child inode in inodeMap by the inode id in the children list of the directory. 
> The child inode could be missing or deleted.
> So far we do not have a clear trace to reproduce the problem. Therefore, 
> I'm proposing this improvement to detect such corruption (data structure 
> inconsistency) when saving the FsImage, so that we have the FsImage and 
> Edit Log needed to, hopefully, reproduce the problem reliably.
>  
> In a previous patch, HDFS-13314, [~arpitagarwal] did a great job catching 
> potential FsImage corruption in two cases. This patch adds a third case, 
> where a child inode does not exist in the global FSDirectory dir when saving 
> (serializing) INodeDirectorySection.
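
The check described in the issue can be illustrated with a minimal, self-contained sketch. This is not the code from HDFS-13813.001.patch: the class `DanglingChildCheck`, its methods, and the use of a plain `Map<Long, String>` in place of the NameNode's inode map are all hypothetical stand-ins. It also throws an exception where the issue title says the NameNode exits. The idea shown is only that, before a directory's children list is serialized, every child id is looked up and any id with no matching inode is reported.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical, simplified stand-in for the save-time consistency check. */
final class DanglingChildCheck {

  /** Returns the child ids that have no entry in the (simplified) inode map. */
  static List<Long> findDanglingChildren(Map<Long, String> inodeMap,
                                         List<Long> childIds) {
    List<Long> dangling = new ArrayList<>();
    for (Long id : childIds) {
      if (!inodeMap.containsKey(id)) {
        dangling.add(id);
      }
    }
    return dangling;
  }

  /**
   * Fails fast if a directory references a missing child. Assumption: the
   * sketch throws, whereas the issue proposes exiting the NameNode.
   */
  static void checkBeforeSave(long parentId, Map<Long, String> inodeMap,
                              List<Long> childIds) {
    List<Long> dangling = findDanglingChildren(inodeMap, childIds);
    if (!dangling.isEmpty()) {
      throw new IllegalStateException("Directory inode " + parentId
          + " references child inode ids missing from inodeMap: " + dangling);
    }
    // Serialization of the directory entry would follow here.
  }

  public static void main(String[] args) {
    Map<Long, String> inodeMap = new HashMap<>();
    inodeMap.put(1001L, "dirA");
    inodeMap.put(1002L, "fileB");
    // 1003 is listed as a child but is absent from the map.
    List<Long> children = Arrays.asList(1002L, 1003L);

    System.out.println(findDanglingChildren(inodeMap, children)); // prints [1003]
    try {
      checkBeforeSave(1001L, inodeMap, children);
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

Failing at save time, rather than at the next load, keeps the inconsistent in-memory state and the preceding Edit Log available for diagnosis, which is the motivation given in the description.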



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13813) Exit NameNode when dangling child inode is detected when saving FsImage

2018-08-09 Thread Yongjun Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16575408#comment-16575408
 ] 

Yongjun Zhang commented on HDFS-13813:
--

Thanks [~smeng], good work here!

I did not review the patch, but I have a general comment: the checks added in 
HDFS-13314, plus the checks added here, are for debugging. I hope we can have 
a follow-up Jira that adds a configuration parameter to enable or disable 
these checks, as an optimization. 
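
A rough sketch of what such a switch might look like follows. Everything here is an assumption for illustration: the key name `example.fsimage.consistency.check.enabled`, the default of `true`, and the class `ConsistencyCheckToggle` do not come from any patch, and `java.util.Properties` stands in for Hadoop's configuration object so that the example is self-contained.

```java
import java.util.Properties;

/** Hypothetical toggle for the save-time FsImage consistency checks. */
final class ConsistencyCheckToggle {

  // Assumed key name and default; neither is defined by HDFS-13813.
  static final String CHECK_ENABLED_KEY =
      "example.fsimage.consistency.check.enabled";
  static final boolean CHECK_ENABLED_DEFAULT = true;

  /** Reads the flag, falling back to the default when the key is unset. */
  static boolean isCheckEnabled(Properties conf) {
    String raw = conf.getProperty(CHECK_ENABLED_KEY);
    return raw == null ? CHECK_ENABLED_DEFAULT
                       : Boolean.parseBoolean(raw.trim());
  }

  /** Runs the check only when enabled; returns whether it actually ran. */
  static boolean runCheckIfEnabled(Properties conf, Runnable check) {
    if (!isCheckEnabled(conf)) {
      return false; // skip the extra lookups on the save path
    }
    check.run();
    return true;
  }
}
```

With the flag set to `false`, `runCheckIfEnabled` returns without invoking the check, so a cluster that is working fine would not pay the per-child lookup cost while saving.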

 

 







[jira] [Commented] (HDFS-13813) Exit NameNode when dangling child inode is detected when saving FsImage

2018-08-09 Thread Wei-Chiu Chuang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16575405#comment-16575405
 ] 

Wei-Chiu Chuang commented on HDFS-13813:


+1 pending jenkins



