[jira] [Commented] (HDFS-6908) incorrect snapshot directory diff generated by snapshot deletion

2015-02-07 Thread Abhishek Rai (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-6908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14310628#comment-14310628 ]

Abhishek Rai commented on HDFS-6908:


Thanks Harsh, that sounds reasonable.  It gives us a way to avoid having to 
live with the FSImageFormatPBSnapshot hack longer term once the proper fix for 
this bug is applied.

Thanks

 incorrect snapshot directory diff generated by snapshot deletion
 

 Key: HDFS-6908
 URL: https://issues.apache.org/jira/browse/HDFS-6908
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: snapshots
Reporter: Juan Yu
Assignee: Juan Yu
Priority: Critical
 Fix For: 2.6.0

 Attachments: HDFS-6908.001.patch, HDFS-6908.002.patch, 
 HDFS-6908.003.patch


 In the following scenario, deleting a snapshot can generate an incorrect 
 snapshot directory diff and a corrupted fsimage. If you restart the NN after 
 that, you will get a NullPointerException.
 1. create a directory and create a file under it
 2. take a snapshot
 3. create another file under that directory
 4. take a second snapshot
 5. delete both files and the directory
 6. delete the second snapshot
 An incorrect directory diff will be generated.
 Restarting the NN will throw an NPE:
 {code}
 java.lang.NullPointerException
   at org.apache.hadoop.hdfs.server.namenode.snapshot.FSImageFormatPBSnapshot$Loader.addToDeletedList(FSImageFormatPBSnapshot.java:246)
   at org.apache.hadoop.hdfs.server.namenode.snapshot.FSImageFormatPBSnapshot$Loader.loadDeletedList(FSImageFormatPBSnapshot.java:265)
   at org.apache.hadoop.hdfs.server.namenode.snapshot.FSImageFormatPBSnapshot$Loader.loadDirectoryDiffList(FSImageFormatPBSnapshot.java:328)
   at org.apache.hadoop.hdfs.server.namenode.snapshot.FSImageFormatPBSnapshot$Loader.loadSnapshotDiffSection(FSImageFormatPBSnapshot.java:192)
   at org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.loadInternal(FSImageFormatProtobuf.java:254)
   at org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.load(FSImageFormatProtobuf.java:168)
   at org.apache.hadoop.hdfs.server.namenode.FSImageFormat$LoaderDelegator.load(FSImageFormat.java:208)
   at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:906)
   at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:892)
   at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImageFile(FSImage.java:715)
   at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:653)
   at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:276)
   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:882)
   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:629)
   at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:498)
   at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:554)
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6908) incorrect snapshot directory diff generated by snapshot deletion

2015-02-06 Thread Harsh J (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-6908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14310489#comment-14310489 ]

Harsh J commented on HDFS-6908:
---

bq. and since it doesn't fix the corrupt fsimage on disk, the hack will always 
be needed to make the namenode work.

When the NN comes up (ensure you also have this bugfix patched on), you can 
then invoke a {{dfsadmin -saveNamespace}} to recreate a new, good image.
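
A hedged sketch of that recovery step: {{-saveNamespace}} is only accepted while the NameNode is in safe mode, so the usual sequence is to enter safe mode, save, and leave safe mode. The Java below only assembles (and, with {{--run}}, executes) those three {{hdfs dfsadmin}} invocations. The class name {{SaveNamespaceSteps}}, the {{--run}} switch, and the assumption that an {{hdfs}} launcher is on the PATH are illustrative and not part of this issue's patch.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

final class SaveNamespaceSteps {
  /** The three dfsadmin invocations, in the order they must run. */
  static List<List<String>> commands() {
    return Arrays.asList(
        Arrays.asList("hdfs", "dfsadmin", "-safemode", "enter"),
        Arrays.asList("hdfs", "dfsadmin", "-saveNamespace"),
        Arrays.asList("hdfs", "dfsadmin", "-safemode", "leave"));
  }

  /** Runs each command in turn and stops at the first non-zero exit code. */
  static void run() throws IOException, InterruptedException {
    for (List<String> cmd : commands()) {
      int exit = new ProcessBuilder(cmd).inheritIO().start().waitFor();
      if (exit != 0) {
        // Note: if a later step fails, the NameNode may still be in safe mode.
        throw new IOException("Command failed (" + exit + "): " + String.join(" ", cmd));
      }
    }
  }

  public static void main(String[] args) throws Exception {
    if (args.length > 0 && args[0].equals("--run")) {
      run(); // assumes the hdfs launcher is on the PATH (an assumption of this sketch)
    } else {
      // Default: only print the commands so they can be reviewed first.
      for (List<String> cmd : commands()) {
        System.out.println(String.join(" ", cmd));
      }
    }
  }
}
```

Without arguments the program only prints the commands. Running them against a production NameNode should wait until the patched build is in place, as noted above.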



[jira] [Commented] (HDFS-6908) incorrect snapshot directory diff generated by snapshot deletion

2015-02-06 Thread Abhishek Rai (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-6908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14310377#comment-14310377 ]

Abhishek Rai commented on HDFS-6908:


We have an HDFS installation in production where we ran into this problem.  
Since the fsimage is corrupt, namenode fails to come up, leaving the system 
unusable.  We suspect that the problem was triggered in our case by deletion of 
one of the existing snapshots of a large directory containing several 
sub-directories and files.

While the proposed fix above is definitely useful to prevent this issue going 
forward, is there any recommendation for how to fix an fsimage which was 
already corrupted by this bug?

We temporarily put in the following hack in FSImageFormatPBSnapshot.java to 
mask this problem.  The risk with this hack is that it can mask other 
bugs/corruptions, and since it doesn't fix the corrupt fsimage on disk, the 
hack will always be needed to make the namenode work.

{noformat}
+++ b/hadoop/hadoop-2.5.1-src/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/FSImageFormatPBSnapshot.java
@@ -34,6 +34,8 @@
 import java.util.List;
 import java.util.Map;
 
+import org.apache.commons.logging.Log;
+import org.apache.commons.logging.LogFactory;
 import org.apache.hadoop.classification.InterfaceAudience;
 import org.apache.hadoop.fs.permission.PermissionStatus;
 import org.apache.hadoop.hdfs.server.namenode.AclFeature;
@@ -73,6 +75,9 @@
 
 @InterfaceAudience.Private
 public class FSImageFormatPBSnapshot {
+  public static final Log LOG = LogFactory.getLog(FSImageFormatPBSnapshot.class);
+
+
   /**
    * Loading snapshot related information from protobuf based FSImage
    */
@@ -267,8 +272,12 @@ private void addToDeletedList(INode dnode, INodeDirectory parent) {
       // load non-reference inodes
       for (long deletedId : deletedNodes) {
         INode deleted = fsDir.getInode(deletedId);
-        dlist.add(deleted);
-        addToDeletedList(deleted, dir);
+        if (deleted != null) {
+          dlist.add(deleted);
+          addToDeletedList(deleted, dir);
+        } else {
+          LOG.error("Could not find inode " + deletedId + " from deleted-list of directory: " + dir.toDetailString());
+        }
{noformat}
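
To make the failure mode concrete, here is a minimal, self-contained sketch of the same guard. It does not use any HDFS classes: {{DeletedListSketch}}, {{Inode}}, and the helper methods are hypothetical stand-ins. It models a deleted list that references an inode id missing from the inode map. The unguarded loop dereferences the null lookup result and throws, while the guarded loop skips the dangling id and records it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class DeletedListSketch {
  /** Hypothetical stand-in for an INode; it only carries a name. */
  static final class Inode {
    final String name;
    Inode(String name) { this.name = name; }
  }

  /** Unguarded loop: uses every lookup result directly. */
  static List<String> loadUnguarded(Map<Long, Inode> inodes, long[] deletedIds) {
    List<String> dlist = new ArrayList<>();
    for (long id : deletedIds) {
      Inode deleted = inodes.get(id); // null when the id is not in the map
      dlist.add(deleted.name);        // throws NullPointerException on a dangling id
    }
    return dlist;
  }

  /** Guarded loop: skips dangling ids and records them instead of failing. */
  static List<String> loadGuarded(Map<Long, Inode> inodes, long[] deletedIds,
                                  List<Long> skipped) {
    List<String> dlist = new ArrayList<>();
    for (long id : deletedIds) {
      Inode deleted = inodes.get(id);
      if (deleted != null) {
        dlist.add(deleted.name);
      } else {
        skipped.add(id); // the real hack logs an error at this point
      }
    }
    return dlist;
  }

  /** Two known inodes; id 99 is deliberately absent. */
  static Map<Long, Inode> sampleInodes() {
    Map<Long, Inode> m = new HashMap<>();
    m.put(1L, new Inode("file1"));
    m.put(2L, new Inode("file2"));
    return m;
  }

  public static void main(String[] args) {
    long[] deletedIds = {1L, 99L, 2L};
    List<Long> skipped = new ArrayList<>();
    List<String> loaded = loadGuarded(sampleInodes(), deletedIds, skipped);
    System.out.println("loaded=" + loaded + " skipped=" + skipped);
    try {
      loadUnguarded(sampleInodes(), deletedIds);
    } catch (NullPointerException e) {
      System.out.println("unguarded load hit a dangling id");
    }
  }
}
```

As with the hack itself, the guard only hides the dangling reference at load time. The skipped ids are still present in whatever was read, which is why the on-disk image stays corrupt until it is rewritten.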


[jira] [Commented] (HDFS-6908) incorrect snapshot directory diff generated by snapshot deletion

2014-08-26 Thread Juan Yu (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-6908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14110927#comment-14110927 ]

Juan Yu commented on HDFS-6908:
---

[~jingzhao] Thanks for reviewing the patch and for the discussion.



[jira] [Commented] (HDFS-6908) incorrect snapshot directory diff generated by snapshot deletion

2014-08-25 Thread Jing Zhao (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-6908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14109386#comment-14109386 ]

Jing Zhao commented on HDFS-6908:
-

+1 for the latest patch. Thanks for the fix, [~j...@cloudera.com]! I will 
commit it later today.



[jira] [Commented] (HDFS-6908) incorrect snapshot directory diff generated by snapshot deletion

2014-08-24 Thread Hadoop QA (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-6908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14108592#comment-14108592 ]

Hadoop QA commented on HDFS-6908:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12663979/HDFS-6908.003.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.hdfs.TestFileAppend
  org.apache.hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFS
  org.apache.hadoop.hdfs.web.TestWebHdfsFileSystemContract

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/7744//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7744//console

This message is automatically generated.



[jira] [Commented] (HDFS-6908) incorrect snapshot directory diff generated by snapshot deletion

2014-08-24 Thread Juan Yu (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-6908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14108735#comment-14108735 ]

Juan Yu commented on HDFS-6908:
---

I don't think the failed tests are related to this patch.



[jira] [Commented] (HDFS-6908) incorrect snapshot directory diff generated by snapshot deletion

2014-08-23 Thread Hadoop QA (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-6908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14107908#comment-14107908 ]

Hadoop QA commented on HDFS-6908:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12663603/HDFS-6908.002.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.hdfs.TestEncryptionZones
  org.apache.hadoop.hdfs.TestDistributedFileSystem
  org.apache.hadoop.hdfs.TestFileAppend4

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/7735//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7735//console

This message is automatically generated.



[jira] [Commented] (HDFS-6908) incorrect snapshot directory diff generated by snapshot deletion

2014-08-23 Thread Jing Zhao (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-6908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14108096#comment-14108096 ]

Jing Zhao commented on HDFS-6908:
-

Actually, when deleting a directory from the current namespace, it may be better 
to call {{destroyCreated}} before calling {{cleanSubtreeRecursively}}. This is 
because:
# The current bug only exists in the snapshot deletion scenario (when deleting a 
directory, there is no snapshot combination logic involved).
# {{cleanSubtreeRecursively}} goes through the complete children list of the 
current directory, but the children that are in the created list should be 
completely removed anyway. We can avoid processing these files/directories in 
{{cleanSubtreeRecursively}} if we call {{destroyCreated}} first.
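
The second point can be illustrated with a toy model. This is a sketch under stated assumptions, not the HDFS implementation: {{CleanOrderSketch}}, {{Dir}}, and both helper methods are hypothetical and only borrow the names {{destroyCreated}} and {{cleanSubtreeRecursively}} from the discussion. Each directory holds a children list plus the subset of children in its created list, and the sketch counts how many children the cleanup pass visits under each ordering.

```java
import java.util.ArrayList;
import java.util.List;

final class CleanOrderSketch {
  /** Toy directory: all children, plus the subset found in the created list. */
  static final class Dir {
    final List<String> children = new ArrayList<>();
    final List<String> created = new ArrayList<>();
  }

  /** Toy destroyCreated: removes every created child from the children list. */
  static void destroyCreated(Dir dir) {
    dir.children.removeAll(dir.created);
    dir.created.clear();
  }

  /** Toy cleanSubtreeRecursively: returns how many children it had to visit. */
  static int cleanSubtreeRecursively(Dir dir) {
    int visited = 0;
    for (String child : dir.children) {
      visited++; // per-child cleanup work would happen here
    }
    return visited;
  }

  /** Three children, one of which ("file2") is in the created list. */
  static Dir sample() {
    Dir d = new Dir();
    d.children.add("file1");
    d.children.add("file2");
    d.children.add("sub");
    d.created.add("file2");
    return d;
  }

  /** Ordering A: clean first, then destroy the created children. */
  static int cleanThenDestroy() {
    Dir d = sample();
    int visited = cleanSubtreeRecursively(d);
    destroyCreated(d);
    return visited;
  }

  /** Ordering B (the suggestion above): destroy created children first. */
  static int destroyThenClean() {
    Dir d = sample();
    destroyCreated(d);
    return cleanSubtreeRecursively(d);
  }

  public static void main(String[] args) {
    System.out.println("clean then destroy visits " + cleanThenDestroy());
    System.out.println("destroy then clean visits " + destroyThenClean());
  }
}
```

In this model the first ordering visits all three children and the second visits only the two that are not in the created list, which is the saving described in the second point.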



[jira] [Commented] (HDFS-6908) incorrect snapshot directory diff generated by snapshot deletion

2014-08-22 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14106516#comment-14106516
 ] 

Jing Zhao commented on HDFS-6908:
-

Yeah, I think that is necessary when deleting a snapshot. But when deleting a 
dir/file from the current fsdir, I guess it should be OK to place 
{{cleanSubtreeRecursively}} at the end.

 incorrect snapshot directory diff generated by snapshot deletion
 

 Key: HDFS-6908
 URL: https://issues.apache.org/jira/browse/HDFS-6908
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: snapshots
Reporter: Juan Yu
Assignee: Juan Yu
Priority: Critical
 Attachments: HDFS-6908.001.patch




[jira] [Commented] (HDFS-6908) incorrect snapshot directory diff generated by snapshot deletion

2014-08-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14107809#comment-14107809
 ] 

Hadoop QA commented on HDFS-6908:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12663603/HDFS-6908.002.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.security.TestRefreshUserMappings
  org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover
  org.apache.hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFS

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/7732//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7732//console

This message is automatically generated.

 incorrect snapshot directory diff generated by snapshot deletion
 

 Key: HDFS-6908
 URL: https://issues.apache.org/jira/browse/HDFS-6908
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: snapshots
Reporter: Juan Yu
Assignee: Juan Yu
Priority: Critical
 Attachments: HDFS-6908.001.patch, HDFS-6908.002.patch




[jira] [Commented] (HDFS-6908) incorrect snapshot directory diff generated by snapshot deletion

2014-08-21 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14106127#comment-14106127
 ] 

Jing Zhao commented on HDFS-6908:
-

Thanks for working on this, [~j...@cloudera.com]! This is indeed a case the 
current code fails to cover, and your analysis makes sense to me.

However, if the fix only calls dir.removeChild, the inodes that were 
created between the prior snapshot and the one being deleted will still be kept 
in the created list, which can cause a leak. A better fix may be to call 
{{cleanSubtreeRecursively}} before {{cleanDeletedINode}}:
{code}
diff --git a/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/DirectoryWithSnapshotFeature.java b/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hado
index 9893bba..a4f69f0 100644
--- a/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/DirectoryWithSnapshotFeature.java
+++ b/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/DirectoryWithSnapshotFeature.java
@@ -722,6 +722,8 @@ boolean computeDiffBetweenSnapshots(Snapshot fromSnapshot,
 counts.add(lastDiff.diff.destroyCreatedList(currentINode,
 collectedBlocks, removedINodes));
   }
+  counts.add(currentINode.cleanSubtreeRecursively(snapshot, prior,
+  collectedBlocks, removedINodes, priorDeleted, countDiffChange));
 } else {
   // update prior
   prior = getDiffs().updatePrior(snapshot, prior);
@@ -739,7 +741,10 @@ boolean computeDiffBetweenSnapshots(Snapshot fromSnapshot,
   
   counts.add(getDiffs().deleteSnapshotDiff(snapshot, prior,
   currentINode, collectedBlocks, removedINodes, countDiffChange));
-  
+
+  counts.add(currentINode.cleanSubtreeRecursively(snapshot, prior,
+  collectedBlocks, removedINodes, priorDeleted, countDiffChange));
+
   // check priorDiff again since it may be created during the diff deletion
   if (prior != Snapshot.NO_SNAPSHOT_ID) {
 DirectoryDiff priorDiff = this.getDiffs().getDiffById(prior);
@@ -778,9 +783,7 @@ boolean computeDiffBetweenSnapshots(Snapshot fromSnapshot,
 }
   }
 }
-counts.add(currentINode.cleanSubtreeRecursively(snapshot, prior,
-collectedBlocks, removedINodes, priorDeleted, countDiffChange));
-
+
 if (currentINode.isQuotaSet()) {
   currentINode.getDirectoryWithQuotaFeature().addSpaceConsumed2Cache(
   -counts.get(Quota.NAMESPACE), -counts.get(Quota.DISKSPACE));
{code}

 incorrect snapshot directory diff generated by snapshot deletion
 

 Key: HDFS-6908
 URL: https://issues.apache.org/jira/browse/HDFS-6908
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Juan Yu
Assignee: Juan Yu
 Attachments: HDFS-6908.001.patch



[jira] [Commented] (HDFS-6908) incorrect snapshot directory diff generated by snapshot deletion

2014-08-21 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14106133#comment-14106133
 ] 

Jing Zhao commented on HDFS-6908:
-

For the current patch, another comment is that we can move the new unit test to 
TestSnapshotDeletion.java, and call {{hdfs.delete(file1, true);}} instead of 
{{hdfs.delete(file1);}}.

 incorrect snapshot directory diff generated by snapshot deletion
 

 Key: HDFS-6908
 URL: https://issues.apache.org/jira/browse/HDFS-6908
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Juan Yu
Assignee: Juan Yu
 Attachments: HDFS-6908.001.patch




[jira] [Commented] (HDFS-6908) incorrect snapshot directory diff generated by snapshot deletion

2014-08-21 Thread Juan Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14106231#comment-14106231
 ] 

Juan Yu commented on HDFS-6908:
---

Thanks [~jingzhao].
Because the directory is deleted, any file created between the prior 
snapshot and the one being deleted must have been deleted as well, so there is 
a create/delete pair of operations for each of those files. The file diff 
processing part adds the file to the removedINodes list. When I debugged the 
fix, I saw that the inode for the file was deleted correctly, with no leak, and 
the intermediate create/delete file change was also cleaned up after combining 
the diff with the prior one.

{code}
} else if (topNode.isFile() && topNode.asFile().isWithSnapshot()) {
  INodeFile file = topNode.asFile();
  counts.add(file.getDiffs().deleteSnapshotDiff(post, prior, file,
      collectedBlocks, removedINodes, countDiffChange));
{code}
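
To make the create/delete pairing argument above concrete, here is a minimal sketch of how such a pair cancels inside a single diff. It is a toy model only: the {{ToyDiff}} class and its {{create}}/{{delete}} methods are hypothetical names for illustration and are not the HDFS diff implementation.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model (hypothetical, not HDFS code): names created and deleted since the prior snapshot. */
class ToyDiff {
    final List<String> created = new ArrayList<>();
    final List<String> deleted = new ArrayList<>();

    /** Record a creation that happened after the prior snapshot. */
    void create(String name) {
        created.add(name);
    }

    /**
     * Record a deletion. If the name was created within this same diff, the
     * create/delete pair cancels and nothing is remembered. Otherwise the name
     * existed in the prior snapshot and goes into the deleted list.
     */
    void delete(String name) {
        if (!created.remove(name)) {
            deleted.add(name);
        }
    }
}
```

With file1 present in the prior snapshot and file2 created afterwards, deleting both files individually leaves the created list empty and only file1 in the deleted list. The argument relies on each file having its own delete record in the same diff as its creation.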

 incorrect snapshot directory diff generated by snapshot deletion
 

 Key: HDFS-6908
 URL: https://issues.apache.org/jira/browse/HDFS-6908
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: snapshots
Reporter: Juan Yu
Assignee: Juan Yu
Priority: Critical
 Attachments: HDFS-6908.001.patch




[jira] [Commented] (HDFS-6908) incorrect snapshot directory diff generated by snapshot deletion

2014-08-21 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14106272#comment-14106272
 ] 

Jing Zhao commented on HDFS-6908:
-

Thanks for the response, [~j...@cloudera.com].

bq. so there are create/delete pair operations for those files.

The challenge here is that we cannot guarantee we always have the create/delete 
pair. Imagine the deletion happens on the directory while the creation 
happens on a file under the directory. Then we cannot depend on the snapshot 
diff combination to clean the file. The following unit test (based on your 
original test case) demonstrates the scenario (but with your patch the test 
will fail before the leak check):
{code}
  @Test (timeout=6)
  public void testDeleteSnapshot() throws Exception {
    final Path root = new Path("/");

    Path dir = new Path("/dir1");
    Path file1 = new Path(dir, "file1");
    DFSTestUtil.createFile(hdfs, file1, BLOCKSIZE, REPLICATION, seed);

    hdfs.allowSnapshot(root);
    hdfs.createSnapshot(root, "s1");

    // file2 is created between s1 and s2; remember its inode id
    Path file2 = new Path(dir, "file2");
    DFSTestUtil.createFile(hdfs, file2, BLOCKSIZE, REPLICATION, seed);
    INodeFile file2Node = fsdir.getINode(file2.toString()).asFile();
    long file2NodeId = file2Node.getId();

    hdfs.createSnapshot(root, "s2");

    // delete directory; file2 is still referenced by snapshot s2
    assertTrue(hdfs.delete(dir, true));
    assertNotNull(fsdir.getInode(file2NodeId));

    // delete second snapshot; file2's inode must be gone (leak check)
    hdfs.deleteSnapshot(root, "s2");
    assertTrue(fsdir.getInode(file2NodeId) == null);

    NameNodeAdapter.enterSafeMode(cluster.getNameNode(), false);
    NameNodeAdapter.saveNamespace(cluster.getNameNode());

    // restart NN
    cluster.restartNameNodes();
  }
{code}
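
As a rough illustration of the point above, the following toy model shows why a recursive directory delete leaves no per-file delete record. The {{ToyDir}} class and its methods are hypothetical and greatly simplified. They are not the HDFS inode or diff classes.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy directory (hypothetical, not HDFS code) that keeps a deleted list like a directory diff. */
class ToyDir {
    final String name;
    final List<ToyDir> subdirs = new ArrayList<>();
    final List<String> files = new ArrayList<>();
    // Children of this directory that were deleted since the prior snapshot
    final List<ToyDir> deleted = new ArrayList<>();

    ToyDir(String name) {
        this.name = name;
    }

    /** Create a child directory and return it. */
    ToyDir mkdir(String child) {
        ToyDir d = new ToyDir(child);
        subdirs.add(d);
        return d;
    }

    /** Recursive delete: only the child directory itself is recorded, not its files. */
    void deleteRecursively(ToyDir child) {
        if (subdirs.remove(child)) {
            deleted.add(child);
        }
    }

    /** Walk the subtree and list every file path under this directory. */
    List<String> walkFiles() {
        List<String> out = new ArrayList<>();
        for (String f : files) {
            out.add(name + "/" + f);
        }
        for (ToyDir sub : subdirs) {
            for (String p : sub.walkFiles()) {
                out.add(name + "/" + p);
            }
        }
        return out;
    }
}
```

In this model, deleting dir1 from the root leaves exactly one entry in the root's deleted list, so file1 and file2 can only be found by walking the deleted subtree. That is the role the thread assigns to {{cleanSubtreeRecursively}}.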


 incorrect snapshot directory diff generated by snapshot deletion
 

 Key: HDFS-6908
 URL: https://issues.apache.org/jira/browse/HDFS-6908
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: snapshots
Reporter: Juan Yu
Assignee: Juan Yu
Priority: Critical
 Attachments: HDFS-6908.001.patch




[jira] [Commented] (HDFS-6908) incorrect snapshot directory diff generated by snapshot deletion

2014-08-21 Thread Juan Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14106507#comment-14106507
 ] 

Juan Yu commented on HDFS-6908:
---

[~jingzhao] Thanks for the new unit test and for explaining the difference.
I assumed that when a directory is deleted recursively, all of its children are 
added to the diff list, but that is not how it is implemented: the snapshot diff 
only records the directory deletion. So the fix you suggested is better.
One more question: I think what is really needed is to call 
{{cleanSubtreeRecursively}} before {{destroyCreatedList}}, isn't it?
{code}
+  counts.add(currentINode.cleanSubtreeRecursively(snapshot, prior,
+  collectedBlocks, removedINodes, priorDeleted, countDiffChange));
  // delete everything in created list
  DirectoryDiff lastDiff = diffs.getLast();
  if (lastDiff != null) {
 counts.add(lastDiff.diff.destroyCreatedList(currentINode,
 collectedBlocks, removedINodes));
   }
 } else {
   // update prior
   prior = getDiffs().updatePrior(snapshot, prior);
@@ -739,7 +741,10 @@ boolean computeDiffBetweenSnapshots(Snapshot fromSnapshot,
   
   counts.add(getDiffs().deleteSnapshotDiff(snapshot, prior,
   currentINode, collectedBlocks, removedINodes, countDiffChange));
-  
+
+  counts.add(currentINode.cleanSubtreeRecursively(snapshot, prior,
+  collectedBlocks, removedINodes, priorDeleted, countDiffChange));
+
   // check priorDiff again since it may be created during the diff deletion
   if (prior != Snapshot.NO_SNAPSHOT_ID) {
 DirectoryDiff priorDiff = this.getDiffs().getDiffById(prior);
@@ -778,9 +783,7 @@ boolean computeDiffBetweenSnapshots(Snapshot fromSnapshot,
 }
   }
 }
-counts.add(currentINode.cleanSubtreeRecursively(snapshot, prior,
-collectedBlocks, removedINodes, priorDeleted, countDiffChange));
-
+
 if (currentINode.isQuotaSet()) {
   currentINode.getDirectoryWithQuotaFeature().addSpaceConsumed2Cache(
   -counts.get(Quota.NAMESPACE), -counts.get(Quota.DISKSPACE));
{code}

 incorrect snapshot directory diff generated by snapshot deletion
 

 Key: HDFS-6908
 URL: https://issues.apache.org/jira/browse/HDFS-6908
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: snapshots
Reporter: Juan Yu
Assignee: Juan Yu
Priority: Critical
 Attachments: HDFS-6908.001.patch

