[jira] [Commented] (HDFS-13101) Yet another fsimage corruption related to snapshot

2019-12-18 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16999379#comment-16999379
 ] 

Hudson commented on HDFS-13101:
---

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17773 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/17773/])
HDFS-15012. NN fails to parse Edit logs after applying HDFS-13101. (shashikant: 
rev fdd96e46d1f89f0ecdb9b1836dc7fca9fbb954fd)
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/DirectoryWithSnapshotFeature.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/snapshot/TestRenameWithSnapshots.java


> Yet another fsimage corruption related to snapshot
> --
>
> Key: HDFS-13101
> URL: https://issues.apache.org/jira/browse/HDFS-13101
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: snapshots
>Reporter: Yongjun Zhang
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 2.10.0, 3.0.4, 3.3.0, 2.8.6, 3.2.1, 2.9.3, 3.1.3
>
> Attachments: HDFS-13101.001.patch, HDFS-13101.002.patch, 
> HDFS-13101.003.patch, HDFS-13101.004.patch, HDFS-13101.branch-2.001.patch, 
> HDFS-13101.branch-2.8.patch, HDFS-13101.corruption_repro.patch, 
> HDFS-13101.corruption_repro_simplified.patch
>
>
> Lately we saw case similar to HDFS-9406, even though HDFS-9406 fix is 
> present, so it's likely another case not covered by the fix. We are currently 
> trying to collect good fsimage + editlogs to replay to reproduce it and 
> investigate. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13101) Yet another fsimage corruption related to snapshot

2019-08-17 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16909673#comment-16909673
 ] 

Hadoop QA commented on HDFS-13101:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
33s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} branch-2 Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  8m 
51s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
49s{color} | {color:green} branch-2 passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
42s{color} | {color:green} branch-2 passed with JDK v1.8.0_222 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
31s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
53s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
47s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
9s{color} | {color:green} branch-2 passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
42s{color} | {color:green} branch-2 passed with JDK v1.8.0_222 {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
47s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
47s{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
47s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
41s{color} | {color:green} the patch passed with JDK v1.8.0_222 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
3s{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
41s{color} | {color:green} the patch passed with JDK v1.8.0_222 {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 69m  0s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
24s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 94m 39s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.hdfs.qjournal.server.TestJournalNodeRespectsBindHostKeys |
|   | hadoop.hdfs.server.namenode.ha.TestDNFencingWithReplication |
|   | hadoop.hdfs.web.TestWebHdfsTimeouts |
|   | hadoop.hdfs.server.datanode.TestDirectoryScanner |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.1 Server=19.03.1 Image:yetus/hadoop:da67579 |
| JIRA Issue | HDFS-13101 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12977857/HDFS-13101.branch-2.001.patch
 |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 770035ce387f 4.4.0-139-generic #165-Ubuntu SMP Wed Oct 24 
10:58:50 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git 

[jira] [Commented] (HDFS-13101) Yet another fsimage corruption related to snapshot

2019-08-17 Thread Wei-Chiu Chuang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16909639#comment-16909639
 ] 

Wei-Chiu Chuang commented on HDFS-13101:


Submitted a branch-2 patch. There is just a small conflict in 
DirectoryWithSnapshotFeature.java where 
{code}
for (INode cNode : priorDiff.diff.getCreatedUnmodifiable()) {
{code}
was changed to
{code}
for (INode cNode : priorDiff.getChildrenDiff().getList(
  ListType.CREATED)) {
{code}

> Yet another fsimage corruption related to snapshot
> --
>
> Key: HDFS-13101
> URL: https://issues.apache.org/jira/browse/HDFS-13101
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: snapshots
>Reporter: Yongjun Zhang
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 3.3.0, 3.2.1, 3.1.3
>
> Attachments: HDFS-13101.001.patch, HDFS-13101.002.patch, 
> HDFS-13101.003.patch, HDFS-13101.004.patch, HDFS-13101.branch-2.001.patch, 
> HDFS-13101.corruption_repro.patch, 
> HDFS-13101.corruption_repro_simplified.patch
>
>
> Lately we saw case similar to HDFS-9406, even though HDFS-9406 fix is 
> present, so it's likely another case not covered by the fix. We are currently 
> trying to collect good fsimage + editlogs to replay to reproduce it and 
> investigate. 



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13101) Yet another fsimage corruption related to snapshot

2019-08-17 Thread Wei-Chiu Chuang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16909641#comment-16909641
 ] 

Wei-Chiu Chuang commented on HDFS-13101:


[~smeng] please file a jira to update the test. Thanks.

> Yet another fsimage corruption related to snapshot
> --
>
> Key: HDFS-13101
> URL: https://issues.apache.org/jira/browse/HDFS-13101
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: snapshots
>Reporter: Yongjun Zhang
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 3.3.0, 3.2.1, 3.1.3
>
> Attachments: HDFS-13101.001.patch, HDFS-13101.002.patch, 
> HDFS-13101.003.patch, HDFS-13101.004.patch, HDFS-13101.branch-2.001.patch, 
> HDFS-13101.corruption_repro.patch, 
> HDFS-13101.corruption_repro_simplified.patch
>
>
> Lately we saw case similar to HDFS-9406, even though HDFS-9406 fix is 
> present, so it's likely another case not covered by the fix. We are currently 
> trying to collect good fsimage + editlogs to replay to reproduce it and 
> investigate. 



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13101) Yet another fsimage corruption related to snapshot

2019-08-16 Thread Siyao Meng (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16909470#comment-16909470
 ] 

Siyao Meng commented on HDFS-13101:
---

[~shashikant] I found that the updated unit test `testDoubleRename` (was 
`testFSImageCorruption`) since patch v3 doesn't reproduce the corruption 
anymore. I tested that even if [L742-747 and 
L754|https://github.com/apache/hadoop/blob/0a85af959ce505f0659e5c69d0ca83a5dce0a7c2/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/DirectoryWithSnapshotFeature.java#L742-L747]
 is commented out, the new unit test won't catch the corruption. But the old 
unit test in v2 WILL. I might file another jira to improve the unit test.

> Yet another fsimage corruption related to snapshot
> --
>
> Key: HDFS-13101
> URL: https://issues.apache.org/jira/browse/HDFS-13101
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: snapshots
>Reporter: Yongjun Zhang
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-13101.001.patch, HDFS-13101.002.patch, 
> HDFS-13101.003.patch, HDFS-13101.004.patch, 
> HDFS-13101.corruption_repro.patch, 
> HDFS-13101.corruption_repro_simplified.patch
>
>
> Lately we saw case similar to HDFS-9406, even though HDFS-9406 fix is 
> present, so it's likely another case not covered by the fix. We are currently 
> trying to collect good fsimage + editlogs to replay to reproduce it and 
> investigate. 



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13101) Yet another fsimage corruption related to snapshot

2019-08-16 Thread Shashikant Banerjee (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16908897#comment-16908897
 ] 

Shashikant Banerjee commented on HDFS-13101:


Thanks [~szetszwo][~jojochuang][~smeng][~jojochuang][~adam.antal] for all the 
contribution. I have committed the change to trunk.

[~jojochuang], thanks for all the help in backporting it.

> Yet another fsimage corruption related to snapshot
> --
>
> Key: HDFS-13101
> URL: https://issues.apache.org/jira/browse/HDFS-13101
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: snapshots
>Reporter: Yongjun Zhang
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-13101.001.patch, HDFS-13101.002.patch, 
> HDFS-13101.003.patch, HDFS-13101.004.patch, 
> HDFS-13101.corruption_repro.patch, 
> HDFS-13101.corruption_repro_simplified.patch
>
>
> Lately we saw case similar to HDFS-9406, even though HDFS-9406 fix is 
> present, so it's likely another case not covered by the fix. We are currently 
> trying to collect good fsimage + editlogs to replay to reproduce it and 
> investigate. 



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13101) Yet another fsimage corruption related to snapshot

2019-08-15 Thread Tsz Wo Nicholas Sze (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16908523#comment-16908523
 ] 

Tsz Wo Nicholas Sze commented on HDFS-13101:


> ... Do you plan to cherrypick the commit into lower branches?  I am happy to 
> help out ...

[~jojochuang], sound good.  Please help.  Thanks a lot!

> Yet another fsimage corruption related to snapshot
> --
>
> Key: HDFS-13101
> URL: https://issues.apache.org/jira/browse/HDFS-13101
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: snapshots
>Reporter: Yongjun Zhang
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-13101.001.patch, HDFS-13101.002.patch, 
> HDFS-13101.003.patch, HDFS-13101.004.patch, 
> HDFS-13101.corruption_repro.patch, 
> HDFS-13101.corruption_repro_simplified.patch
>
>
> Lately we saw case similar to HDFS-9406, even though HDFS-9406 fix is 
> present, so it's likely another case not covered by the fix. We are currently 
> trying to collect good fsimage + editlogs to replay to reproduce it and 
> investigate. 



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13101) Yet another fsimage corruption related to snapshot

2019-08-15 Thread Wei-Chiu Chuang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16908517#comment-16908517
 ] 

Wei-Chiu Chuang commented on HDFS-13101:


[~szetszwo] the commit landed in trunk. Do you plan to cherrypick the commit 
into lower branches? I am happy to help out since this is a critical issue I 
think we should backport as much as possible.

> Yet another fsimage corruption related to snapshot
> --
>
> Key: HDFS-13101
> URL: https://issues.apache.org/jira/browse/HDFS-13101
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: snapshots
>Reporter: Yongjun Zhang
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-13101.001.patch, HDFS-13101.002.patch, 
> HDFS-13101.003.patch, HDFS-13101.004.patch, 
> HDFS-13101.corruption_repro.patch, 
> HDFS-13101.corruption_repro_simplified.patch
>
>
> Lately we saw case similar to HDFS-9406, even though HDFS-9406 fix is 
> present, so it's likely another case not covered by the fix. We are currently 
> trying to collect good fsimage + editlogs to replay to reproduce it and 
> investigate. 



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13101) Yet another fsimage corruption related to snapshot

2019-08-14 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16907834#comment-16907834
 ] 

Hudson commented on HDFS-13101:
---

FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #17127 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/17127/])
HDFS-13101. Yet another fsimage corruption related to snapshot. (shashikant: 
rev 0a85af959ce505f0659e5c69d0ca83a5dce0a7c2)
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/INodeDirectory.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/DirectoryWithSnapshotFeature.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFSImageWithSnapshot.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/INodeFile.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/AbstractINodeDiffList.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/snapshot/SnapshotTestHelper.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/INode.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/FileWithSnapshotFeature.java


> Yet another fsimage corruption related to snapshot
> --
>
> Key: HDFS-13101
> URL: https://issues.apache.org/jira/browse/HDFS-13101
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: snapshots
>Reporter: Yongjun Zhang
>Assignee: Shashikant Banerjee
>Priority: Major
> Attachments: HDFS-13101.001.patch, HDFS-13101.002.patch, 
> HDFS-13101.003.patch, HDFS-13101.004.patch, 
> HDFS-13101.corruption_repro.patch, 
> HDFS-13101.corruption_repro_simplified.patch
>
>
> Lately we saw case similar to HDFS-9406, even though HDFS-9406 fix is 
> present, so it's likely another case not covered by the fix. We are currently 
> trying to collect good fsimage + editlogs to replay to reproduce it and 
> investigate. 



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13101) Yet another fsimage corruption related to snapshot

2019-08-09 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16904014#comment-16904014
 ] 

Hadoop QA commented on HDFS-13101:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  5m 
12s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 22m 
 8s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
2s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
45s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
6s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 30s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
9s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
52s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
 2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
57s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
57s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 41s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
51s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
6s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}142m 25s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
45s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}213m 12s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.TestFileChecksumCompositeCrc |
|   | hadoop.hdfs.TestLeaseRecovery2 |
|   | hadoop.hdfs.TestFileAppend |
|   | hadoop.hdfs.server.datanode.TestLargeBlockReport |
|   | hadoop.hdfs.TestDecommission |
|   | hadoop.hdfs.TestErasureCodingMultipleRacks |
|   | hadoop.hdfs.TestErasureCodingPolicies |
|   | hadoop.hdfs.TestEncryptionZones |
|   | hadoop.hdfs.TestSetTimes |
|   | hadoop.hdfs.server.datanode.TestDataNodeUUID |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.0 Server=19.03.0 Image:yetus/hadoop:bdbca0e53b4 |
| JIRA Issue | HDFS-13101 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12977155/HDFS-13101.004.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux b1c30018177c 4.15.0-52-generic #56-Ubuntu SMP Tue Jun 4 
22:49:08 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 43a91f8 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_212 |
| findbugs | v3.1.0-RC1 |
| unit | 
https://builds.apache.org/job/PreCommit-HDFS-Build/27453/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
 |
|  Test 

[jira] [Commented] (HDFS-13101) Yet another fsimage corruption related to snapshot

2019-08-09 Thread Shashikant Banerjee (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16903853#comment-16903853
 ] 

Shashikant Banerjee commented on HDFS-13101:


Thanks [~szetszwo] for the review. Patch v4 addresses  the checkstyle issues. 
The unit tests are not related as all tests but one reported here pass locally 
and the one fails with/without the fix. 

> Yet another fsimage corruption related to snapshot
> --
>
> Key: HDFS-13101
> URL: https://issues.apache.org/jira/browse/HDFS-13101
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: snapshots
>Reporter: Yongjun Zhang
>Assignee: Shashikant Banerjee
>Priority: Major
> Attachments: HDFS-13101.001.patch, HDFS-13101.002.patch, 
> HDFS-13101.003.patch, HDFS-13101.004.patch, 
> HDFS-13101.corruption_repro.patch, 
> HDFS-13101.corruption_repro_simplified.patch
>
>
> Lately we saw case similar to HDFS-9406, even though HDFS-9406 fix is 
> present, so it's likely another case not covered by the fix. We are currently 
> trying to collect good fsimage + editlogs to replay to reproduce it and 
> investigate. 



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13101) Yet another fsimage corruption related to snapshot

2019-08-07 Thread Tsz Wo Nicholas Sze (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16902330#comment-16902330
 ] 

Tsz Wo Nicholas Sze commented on HDFS-13101:


[~shashikant], great work on the patch!  Could you fix the checkstyle warnings 
and see if the unit test failures are related?

> Yet another fsimage corruption related to snapshot
> --
>
> Key: HDFS-13101
> URL: https://issues.apache.org/jira/browse/HDFS-13101
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: snapshots
>Reporter: Yongjun Zhang
>Assignee: Shashikant Banerjee
>Priority: Major
> Attachments: HDFS-13101.001.patch, HDFS-13101.002.patch, 
> HDFS-13101.003.patch, HDFS-13101.corruption_repro.patch, 
> HDFS-13101.corruption_repro_simplified.patch
>
>
> Lately we saw case similar to HDFS-9406, even though HDFS-9406 fix is 
> present, so it's likely another case not covered by the fix. We are currently 
> trying to collect good fsimage + editlogs to replay to reproduce it and 
> investigate. 



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13101) Yet another fsimage corruption related to snapshot

2019-08-05 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16900398#comment-16900398
 ] 

Hadoop QA commented on HDFS-13101:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  1m 
22s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 
21s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
21s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
59s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
27s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
15m 41s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
25s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
9s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
18s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
13s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 48s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch 
generated 10 new + 154 unchanged - 0 fixed = 164 total (was 154) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 44s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
59s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}109m 55s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
46s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}181m 31s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.TestDecommission |
|   | hadoop.hdfs.TestDFSClientRetries |
|   | hadoop.hdfs.server.namenode.TestNamenodeCapacityReport |
|   | hadoop.hdfs.server.datanode.TestLargeBlockReport |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=18.09.7 Server=18.09.7 Image:yetus/hadoop:bdbca0e53b4 |
| JIRA Issue | HDFS-13101 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12976729/HDFS-13101.003.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 8b45490bccf3 4.15.0-52-generic #56-Ubuntu SMP Tue Jun 4 
22:49:08 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / c589983 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_212 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-HDFS-Build/27407/artifact/out/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt
 |
| unit | 
https://builds.apache.org/job/PreCommit-HDFS-Build/27407/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
 |
|  Test 

[jira] [Commented] (HDFS-13101) Yet another fsimage corruption related to snapshot

2019-08-05 Thread Tsz Wo Nicholas Sze (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16900382#comment-16900382
 ] 

Tsz Wo Nicholas Sze commented on HDFS-13101:


+1 the 003 patch looks good.  Pending Jenkins.

> Yet another fsimage corruption related to snapshot
> --
>
> Key: HDFS-13101
> URL: https://issues.apache.org/jira/browse/HDFS-13101
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: snapshots
>Reporter: Yongjun Zhang
>Assignee: Shashikant Banerjee
>Priority: Major
> Attachments: HDFS-13101.001.patch, HDFS-13101.002.patch, 
> HDFS-13101.003.patch, HDFS-13101.corruption_repro.patch, 
> HDFS-13101.corruption_repro_simplified.patch
>
>
> Lately we saw case similar to HDFS-9406, even though HDFS-9406 fix is 
> present, so it's likely another case not covered by the fix. We are currently 
> trying to collect good fsimage + editlogs to replay to reproduce it and 
> investigate. 



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13101) Yet another fsimage corruption related to snapshot

2019-08-05 Thread Shashikant Banerjee (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16900253#comment-16900253
 ] 

Shashikant Banerjee commented on HDFS-13101:


Thanks [~szetszwo] [~jojochuang] [~smeng] for the suggestion. I have uploaded a 
patch v3 which modifies the test a bit and verifies quota as well as well as 
the complete snapshotTree. Please have a look.

> Yet another fsimage corruption related to snapshot
> --
>
> Key: HDFS-13101
> URL: https://issues.apache.org/jira/browse/HDFS-13101
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: snapshots
>Reporter: Yongjun Zhang
>Assignee: Shashikant Banerjee
>Priority: Major
> Attachments: HDFS-13101.001.patch, HDFS-13101.002.patch, 
> HDFS-13101.003.patch, HDFS-13101.corruption_repro.patch, 
> HDFS-13101.corruption_repro_simplified.patch
>
>
> Lately we saw case similar to HDFS-9406, even though HDFS-9406 fix is 
> present, so it's likely another case not covered by the fix. We are currently 
> trying to collect good fsimage + editlogs to replay to reproduce it and 
> investigate. 



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13101) Yet another fsimage corruption related to snapshot

2019-08-02 Thread Wei-Chiu Chuang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16899277#comment-16899277
 ] 

Wei-Chiu Chuang commented on HDFS-13101:


With my limited knowledge in the snapshot implementation, Nicholas's fix looks 
correct to me. THANK YOU!

> Yet another fsimage corruption related to snapshot
> --
>
> Key: HDFS-13101
> URL: https://issues.apache.org/jira/browse/HDFS-13101
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: snapshots
>Reporter: Yongjun Zhang
>Assignee: Shashikant Banerjee
>Priority: Major
> Attachments: HDFS-13101.001.patch, HDFS-13101.002.patch, 
> HDFS-13101.corruption_repro.patch, 
> HDFS-13101.corruption_repro_simplified.patch
>
>
> Lately we saw case similar to HDFS-9406, even though HDFS-9406 fix is 
> present, so it's likely another case not covered by the fix. We are currently 
> trying to collect good fsimage + editlogs to replay to reproduce it and 
> investigate. 



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13101) Yet another fsimage corruption related to snapshot

2019-08-02 Thread Siyao Meng (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16899234#comment-16899234
 ] 

Siyao Meng commented on HDFS-13101:
---

Awesome work [~szetszwo] [~shashikant] [~jojochuang]! So glad to see a 
long-lasting problem making great progress.

I've put [~szetszwo]'s patch and [~jojochuang]'s simplified test into a single 
patch based on trunk:

 [^HDFS-13101.002.patch] 

The unit test passed locally.

> Yet another fsimage corruption related to snapshot
> --
>
> Key: HDFS-13101
> URL: https://issues.apache.org/jira/browse/HDFS-13101
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: snapshots
>Reporter: Yongjun Zhang
>Assignee: Shashikant Banerjee
>Priority: Major
> Attachments: HDFS-13101.001.patch, HDFS-13101.002.patch, 
> HDFS-13101.corruption_repro.patch, 
> HDFS-13101.corruption_repro_simplified.patch
>
>
> Lately we saw case similar to HDFS-9406, even though HDFS-9406 fix is 
> present, so it's likely another case not covered by the fix. We are currently 
> trying to collect good fsimage + editlogs to replay to reproduce it and 
> investigate. 



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13101) Yet another fsimage corruption related to snapshot

2019-08-02 Thread Tsz Wo Nicholas Sze (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16899171#comment-16899171
 ] 

Tsz Wo Nicholas Sze commented on HDFS-13101:


The fix could be
{code}
// DirectoryWithSnapshotFeature#cleanDirectory().
   if (priorCreated != null) {
-// we only check the node originally in prior's created list
-for (INode cNode : priorDiff.diff.getCreatedUnmodifiable()) {
-  if (priorCreated.containsKey(cNode)) {
-cNode.cleanSubtree(reclaimContext, snapshot, NO_SNAPSHOT_ID);
+if (currentINode.isLastReference()) {
+  // if this is the last reference, the created list can be 
destroyed.
+  priorDiff.getChildrenDiff().destroyCreatedList(
+  reclaimContext, currentINode);
+} else {
+  // we only check the node originally in prior's created list
+  for (INode cNode : priorDiff.diff.getCreatedUnmodifiable()) {
+if (priorCreated.containsKey(cNode)) {
+  cNode.cleanSubtree(reclaimContext, snapshot, NO_SNAPSHOT_ID);
+}
   }
 }
   }
{code}
where isLastReference() is a new method in INode.
{code}
//INode.java
  /**
   * @return true if this is a reference and the reference count is 1;
   * otherwise, return false.
   */
  public boolean isLastReference() {
final INodeReference ref = getParentReference();
if (!(ref instanceof WithCount)) {
  return false;
}
return ((WithCount)ref).getReferenceCount() == 1;
  }
{code}

> Yet another fsimage corruption related to snapshot
> --
>
> Key: HDFS-13101
> URL: https://issues.apache.org/jira/browse/HDFS-13101
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: snapshots
>Reporter: Yongjun Zhang
>Assignee: Shashikant Banerjee
>Priority: Major
> Attachments: HDFS-13101.001.patch, HDFS-13101.corruption_repro.patch, 
> HDFS-13101.corruption_repro_simplified.patch
>
>
> Lately we saw case similar to HDFS-9406, even though HDFS-9406 fix is 
> present, so it's likely another case not covered by the fix. We are currently 
> trying to collect good fsimage + editlogs to replay to reproduce it and 
> investigate. 



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13101) Yet another fsimage corruption related to snapshot

2019-08-02 Thread Tsz Wo Nicholas Sze (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16899168#comment-16899168
 ] 

Tsz Wo Nicholas Sze commented on HDFS-13101:


I agree that the bug is in DirectoryWithSnapshotFeature#cleanDirectory().  When 
the directory is the last reference, the entire created list should be 
destroyed instead of cleaning individual cNode.

> Yet another fsimage corruption related to snapshot
> --
>
> Key: HDFS-13101
> URL: https://issues.apache.org/jira/browse/HDFS-13101
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: snapshots
>Reporter: Yongjun Zhang
>Assignee: Shashikant Banerjee
>Priority: Major
> Attachments: HDFS-13101.001.patch, HDFS-13101.corruption_repro.patch, 
> HDFS-13101.corruption_repro_simplified.patch
>
>
> Lately we saw case similar to HDFS-9406, even though HDFS-9406 fix is 
> present, so it's likely another case not covered by the fix. We are currently 
> trying to collect good fsimage + editlogs to replay to reproduce it and 
> investigate. 



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13101) Yet another fsimage corruption related to snapshot

2019-08-01 Thread Shashikant Banerjee (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16897982#comment-16897982
 ] 

Shashikant Banerjee commented on HDFS-13101:


Thanks [~jojochuang] for simplifying the test case. The bug most probably lies 
here in this code path
{code:java}
DirectoryWithSnapshotFeature#cleanDirectory():

// check priorDiff again since it may be created during the diff deletion
if (prior != NO_SNAPSHOT_ID) {
  DirectoryDiff priorDiff = this.getDiffs().getDiffById(prior);
  if (priorDiff != null && priorDiff.getSnapshotId() == prior) {
// For files/directories created between "prior" and "snapshot", 
// we need to clear snapshot copies for "snapshot". Note that we must
// use null as prior in the cleanSubtree call. Files/directories that
// were created before "prior" will be covered by the later 
// cleanSubtreeRecursively call.
if (priorCreated != null) {
  // we only check the node originally in prior's created list
  for (INode cNode : priorDiff.diff.getCreatedUnmodifiable()) {
if (priorCreated.containsKey(cNode)) {
  cNode.cleanSubtree(reclaimContext, snapshot, NO_SNAPSHOT_ID);
}
  }
}
{code}
Any entry created under the directory after the prior snapshot(S0 in the prev 
example), the tree will be traversed and only the inodes which have a diff 
record associated with snaphot to be deleted will be cleaned in case there is 
no other reference left (these don't exist in the active fs anymore) as a part 
of deleting the snapshot diff. But, the corresponding entries won't be removed 
from the child list of the parent. This leads to dangling references to the 
child inode in the directory . All the descendents of dir ("dirb" in the above 
exmaple) which don't have any diff associated with the snapshot to be 
deleted(s1 in above example) will be left behind even though they have been 
deleted from the active fs even after the snapshot deletion.

 

> Yet another fsimage corruption related to snapshot
> --
>
> Key: HDFS-13101
> URL: https://issues.apache.org/jira/browse/HDFS-13101
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Yongjun Zhang
>Assignee: Siyao Meng
>Priority: Major
> Attachments: HDFS-13101.001.patch, HDFS-13101.corruption_repro.patch, 
> HDFS-13101.corruption_repro_simplified.patch
>
>
> Lately we saw case similar to HDFS-9406, even though HDFS-9406 fix is 
> present, so it's likely another case not covered by the fix. We are currently 
> trying to collect good fsimage + editlogs to replay to reproduce it and 
> investigate. 



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13101) Yet another fsimage corruption related to snapshot

2019-07-31 Thread Wei-Chiu Chuang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16897716#comment-16897716
 ] 

Wei-Chiu Chuang commented on HDFS-13101:


Thanks [~shashikant] that is tremendous!! A unit test like this is critical to 
the final fix. I was able to further reduced the unit test down.

Namely,
 1. Create the following directories
{noformat}
/dir1
 /dira
  /dirb
 /dirx
/dir2
{noformat}
2. create a snapshot s0 at /dir1
 3. Add one file
{noformat}
/dir1
 /dira
  /dirb
/file1
 /dirx
/dir2
{noformat}
4. Move /dir1/dira/dirb to /dir1/dirx/dirb
{noformat}
/dir1
 /dira
 /dirx
  /dirb
/file1
/dir2
{noformat}
5. Create a snapshot s1 at /dir1
 6. Append to file /dir1/dirx/dirb/file1
 7. Create /dir2/dira
{noformat}
/dir1
 /dira
 /dirx
  /dirb
  /file1

/dir2
  /dira
{noformat}
8. Move /dir1/dirx/dirb to /dir2/dira/dirb
{noformat}
/dir1
 /dira
 /dirx

/dir2
 /dira
  /dirb
   /file1
{noformat}
9. Delete /dir2/dira/dirb
{noformat}
/dir1
 /dira
 /dirx

/dir2
 /dira
{noformat}
10. Delete snapshot s1
 11. Safe fsimage and restart

 

At the point of detection, there is a INodeReference dirb, pointing to 
INodeDirectory dirb.

dirb is in the snapshot s0, which is not deleted.

file1 is not in snapshot s0, so its node got deleted. But dirb’s child list has 
file1.

 

So, the problem is, INodeDirectory fails to remove the child inode from its 
child list. dirb is in snapshot s0, but file1 is not, so file1 should be 
removed.

The bug is probably somewhere within 
\{{DirectoryWithSnapshotFeature#cleanDirectory()}}

> Yet another fsimage corruption related to snapshot
> --
>
> Key: HDFS-13101
> URL: https://issues.apache.org/jira/browse/HDFS-13101
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Yongjun Zhang
>Assignee: Siyao Meng
>Priority: Major
> Attachments: HDFS-13101.001.patch, HDFS-13101.corruption_repro.patch, 
> HDFS-13101.corruption_repro_simplified.patch
>
>
> Lately we saw case similar to HDFS-9406, even though HDFS-9406 fix is 
> present, so it's likely another case not covered by the fix. We are currently 
> trying to collect good fsimage + editlogs to replay to reproduce it and 
> investigate. 



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13101) Yet another fsimage corruption related to snapshot

2019-07-31 Thread Shashikant Banerjee (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16896814#comment-16896814
 ] 

Shashikant Banerjee commented on HDFS-13101:


Attached a patch containing a test case which reproduces the possible fsImage 
corruption while deleting a snapshot and detected when "saveNamespace" command 
gets executed after that:
 [^HDFS-13101.corruption_repro.patch]

The test fails with the following signature when saveNameSpaceCommand gets 
executed:
{code:java}
2019-07-31 12:04:07,204 [IPC Server handler 0 on default port 55561] INFO 
namenode.FSImage (FSImage.java:saveNamespace(1143)) - Save namespace ...
2019-07-31 12:04:07,215 [FSImageSaver for 
/Users/sbanerjee/hadoop_commit/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/name-0-2
 of type IMAGE_AND_EDITS] ERROR namenode.FSImage 
(FSImageFormatPBINode.java:serializeINodeDirectorySection(544)) - 
FSImageFormatPBINode#serializeINodeDirectorySection: Dangling child pointer 
found. Missing INode in inodeMap: id=16394; path=file1; parent=null
2019-07-31 12:04:07,215 [FSImageSaver for 
/Users/sbanerjee/hadoop_commit/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/name-0-1
 of type IMAGE_AND_EDITS] ERROR namenode.FSImage 
(FSImageFormatPBINode.java:serializeINodeDirectorySection(544)) - 
FSImageFormatPBINode#serializeINodeDirectorySection: Dangling child pointer 
found. Missing INode in inodeMap: id=16394; path=file1; parent=null
2019-07-31 12:04:07,232 [FSImageSaver for 
/Users/sbanerjee/hadoop_commit/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/name-0-1
 of type IMAGE_AND_EDITS] ERROR namenode.FSImage 
(FSImage.java:saveFSImage(993)) - Detected 1 errors while saving FsImage 
/Users/sbanerjee/hadoop_commit/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/name-0-1/current/fsimage_027
2019-07-31 12:04:07,232 [FSImageSaver for 
/Users/sbanerjee/hadoop_commit/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/name-0-2
 of type IMAGE_AND_EDITS] ERROR namenode.FSImage 
(FSImage.java:saveFSImage(993)) - Detected 1 errors while saving FsImage 
/Users/sbanerjee/hadoop_commit/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/name-0-2/current/fsimage_027
2019-07-31 12:04:07,244 [IPC Server handler 0 on default port 55561] ERROR 
namenode.FSImage (FSImage.java:saveNamespace(1180)) - NameNode process will 
exit now... The saved FsImage IMAGE is potentially corrupted.
{code}
Will add more details as per further analysis.

> Yet another fsimage corruption related to snapshot
> --
>
> Key: HDFS-13101
> URL: https://issues.apache.org/jira/browse/HDFS-13101
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Yongjun Zhang
>Assignee: Siyao Meng
>Priority: Major
> Attachments: HDFS-13101.001.patch, HDFS-13101.corruption_repro.patch
>
>
> Lately we saw case similar to HDFS-9406, even though HDFS-9406 fix is 
> present, so it's likely another case not covered by the fix. We are currently 
> trying to collect good fsimage + editlogs to replay to reproduce it and 
> investigate. 



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13101) Yet another fsimage corruption related to snapshot

2019-04-02 Thread Siyao Meng (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16807752#comment-16807752
 ] 

Siyao Meng commented on HDFS-13101:
---

Appreciate for the work [~adam.antal]. My bad I haven't clarified this earlier.
I believe we could find an actual repro of the problem with a unit test, 
eventually.

> Yet another fsimage corruption related to snapshot
> --
>
> Key: HDFS-13101
> URL: https://issues.apache.org/jira/browse/HDFS-13101
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Yongjun Zhang
>Assignee: Siyao Meng
>Priority: Major
> Attachments: HDFS-13101.001.patch
>
>
> Lately we saw case similar to HDFS-9406, even though HDFS-9406 fix is 
> present, so it's likely another case not covered by the fix. We are currently 
> trying to collect good fsimage + editlogs to replay to reproduce it and 
> investigate. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13101) Yet another fsimage corruption related to snapshot

2019-04-02 Thread Adam Antal (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16807736#comment-16807736
 ] 

Adam Antal commented on HDFS-13101:
---

After some investigation it turned out that this patch is just checking the 
"hdfs dfs -ls" command's proper behaviour when the inode does not exist 
anymore. Thus the test do reproduce a corruption but tells us nothing how the 
corruption was formed in our original case. Sorry for the false hopes.

> Yet another fsimage corruption related to snapshot
> --
>
> Key: HDFS-13101
> URL: https://issues.apache.org/jira/browse/HDFS-13101
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Yongjun Zhang
>Assignee: Siyao Meng
>Priority: Major
> Attachments: HDFS-13101.001.patch
>
>
> Lately we saw case similar to HDFS-9406, even though HDFS-9406 fix is 
> present, so it's likely another case not covered by the fix. We are currently 
> trying to collect good fsimage + editlogs to replay to reproduce it and 
> investigate. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13101) Yet another fsimage corruption related to snapshot

2019-01-09 Thread Yongjun Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16738485#comment-16738485
 ] 

Yongjun Zhang commented on HDFS-13101:
--

Hi All, Thanks for continuing to drive the issue. Sorry I missed 
[~jojochuang]'s request to reassign to [~smeng] earlier. Just did it. Since 
[~adam.antal] is contributing, you guys can co-contribute to this jira. Thanks.

> Yet another fsimage corruption related to snapshot
> --
>
> Key: HDFS-13101
> URL: https://issues.apache.org/jira/browse/HDFS-13101
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Yongjun Zhang
>Assignee: Siyao Meng
>Priority: Major
> Attachments: HDFS-13101.001.patch
>
>
> Lately we saw case similar to HDFS-9406, even though HDFS-9406 fix is 
> present, so it's likely another case not covered by the fix. We are currently 
> trying to collect good fsimage + editlogs to replay to reproduce it and 
> investigate. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13101) Yet another fsimage corruption related to snapshot

2019-01-04 Thread Adam Antal (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16734356#comment-16734356
 ] 

Adam Antal commented on HDFS-13101:
---

Hi everyone,

I've uploaded a patch ([^HDFS-13101.001.patch]) which we believe has 
contributed to the corruption. It contains a sequence of actions in a testcase, 
and at the end you can see a kind of corruption in the data structure of hdfs. 
It does not contain any customer-related information, so you can use and 
investigate it freely.

It you found anything (maybe that it does not concern the corruption at all), 
feel free to share it with us.

Good luck! I hope after all this time some light might shed upon this bug.

> Yet another fsimage corruption related to snapshot
> --
>
> Key: HDFS-13101
> URL: https://issues.apache.org/jira/browse/HDFS-13101
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
>Priority: Major
> Attachments: HDFS-13101.001.patch
>
>
> Lately we saw case similar to HDFS-9406, even though HDFS-9406 fix is 
> present, so it's likely another case not covered by the fix. We are currently 
> trying to collect good fsimage + editlogs to replay to reproduce it and 
> investigate. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13101) Yet another fsimage corruption related to snapshot

2018-12-13 Thread Siyao Meng (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16720537#comment-16720537
 ] 

Siyao Meng commented on HDFS-13101:
---

Hi [~arpitagarwal], sorry for the long delay. We've asked the customer if they 
are willing to share the image but still hasn't got any response so far.

> Yet another fsimage corruption related to snapshot
> --
>
> Key: HDFS-13101
> URL: https://issues.apache.org/jira/browse/HDFS-13101
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
>Priority: Major
>
> Lately we saw case similar to HDFS-9406, even though HDFS-9406 fix is 
> present, so it's likely another case not covered by the fix. We are currently 
> trying to collect good fsimage + editlogs to replay to reproduce it and 
> investigate. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13101) Yet another fsimage corruption related to snapshot

2018-11-19 Thread Arpit Agarwal (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16692381#comment-16692381
 ] 

Arpit Agarwal commented on HDFS-13101:
--

Thanks for the update [~smeng]. Do you have any further details to share beyond 
delete trash + delete snapshots?

We can also try to repro and debug it.

> Yet another fsimage corruption related to snapshot
> --
>
> Key: HDFS-13101
> URL: https://issues.apache.org/jira/browse/HDFS-13101
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
>Priority: Major
>
> Lately we saw case similar to HDFS-9406, even though HDFS-9406 fix is 
> present, so it's likely another case not covered by the fix. We are currently 
> trying to collect good fsimage + editlogs to replay to reproduce it and 
> investigate. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13101) Yet another fsimage corruption related to snapshot

2018-11-19 Thread Siyao Meng (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16692378#comment-16692378
 ] 

Siyao Meng commented on HDFS-13101:
---

[~arpitagarwal] I'm working on the root cause but so far no conclusions yet. 
Would definitely post a patch if I fixed it.

> Yet another fsimage corruption related to snapshot
> --
>
> Key: HDFS-13101
> URL: https://issues.apache.org/jira/browse/HDFS-13101
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
>Priority: Major
>
> Lately we saw case similar to HDFS-9406, even though HDFS-9406 fix is 
> present, so it's likely another case not covered by the fix. We are currently 
> trying to collect good fsimage + editlogs to replay to reproduce it and 
> investigate. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13101) Yet another fsimage corruption related to snapshot

2018-11-19 Thread Arpit Agarwal (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16692291#comment-16692291
 ] 

Arpit Agarwal commented on HDFS-13101:
--

Looks like [~yzhangal] is out?

[~jojochuang], if you or [~smeng] have a patch can you just post it meanwhile? 
We are eager to see this fixed soon, as you can probably tell. :)

> Yet another fsimage corruption related to snapshot
> --
>
> Key: HDFS-13101
> URL: https://issues.apache.org/jira/browse/HDFS-13101
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
>Priority: Major
>
> Lately we saw case similar to HDFS-9406, even though HDFS-9406 fix is 
> present, so it's likely another case not covered by the fix. We are currently 
> trying to collect good fsimage + editlogs to replay to reproduce it and 
> investigate. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13101) Yet another fsimage corruption related to snapshot

2018-11-12 Thread Arpit Agarwal (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16684164#comment-16684164
 ] 

Arpit Agarwal commented on HDFS-13101:
--

That's great news [~jojochuang]. Would be good to finally put this to rest.

> Yet another fsimage corruption related to snapshot
> --
>
> Key: HDFS-13101
> URL: https://issues.apache.org/jira/browse/HDFS-13101
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
>Priority: Major
>
> Lately we saw case similar to HDFS-9406, even though HDFS-9406 fix is 
> present, so it's likely another case not covered by the fix. We are currently 
> trying to collect good fsimage + editlogs to replay to reproduce it and 
> investigate. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13101) Yet another fsimage corruption related to snapshot

2018-11-10 Thread Wei-Chiu Chuang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16682387#comment-16682387
 ] 

Wei-Chiu Chuang commented on HDFS-13101:


A little bit of updates:
After HDFS-13314 was applied, we were finally able to identify a set of 
fsimages + edits that reproduced corruption reliably. Spoiler: the fsimage was 
already partially corrupted before HDFS-13314 aborted check point. So it took 
us a little while to understand other sequences of events leading to the null 
pointer. The gist of the issue is a delete trash operation followed by delete 
snapshots.

The fix for this bug is still undergoing. [~sodonnell], [~smeng] and 
[~adam.antal] have put in tremendous number of hours on it, and they've been 
actively working on the reproduction in the past few weeks.

[~yzhangal] mind if I reassign this jira to [~smeng] for tracking purposes?

> Yet another fsimage corruption related to snapshot
> --
>
> Key: HDFS-13101
> URL: https://issues.apache.org/jira/browse/HDFS-13101
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
>Priority: Major
>
> Lately we saw case similar to HDFS-9406, even though HDFS-9406 fix is 
> present, so it's likely another case not covered by the fix. We are currently 
> trying to collect good fsimage + editlogs to replay to reproduce it and 
> investigate. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13101) Yet another fsimage corruption related to snapshot

2018-03-19 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16405376#comment-16405376
 ] 

Arpit Agarwal commented on HDFS-13101:
--

[~yzhangal], please take a look at the proposed patch in HDFS-13314, to 
potentially detect this corruption sooner (we still don't have a root cause).

> Yet another fsimage corruption related to snapshot
> --
>
> Key: HDFS-13101
> URL: https://issues.apache.org/jira/browse/HDFS-13101
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
>Priority: Major
>
> Lately we saw case similar to HDFS-9406, even though HDFS-9406 fix is 
> present, so it's likely another case not covered by the fix. We are currently 
> trying to collect good fsimage + editlogs to replay to reproduce it and 
> investigate. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org