Shashikant Banerjee created HDFS-16145:
------------------------------------------

             Summary: CopyListing fails with FNF exception with snapshot diff
                 Key: HDFS-16145
                 URL: https://issues.apache.org/jira/browse/HDFS-16145
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: distcp
            Reporter: Shashikant Banerjee
            Assignee: Shashikant Banerjee


When DistCp runs with snapshot diff and filters, it marks a rename as a delete 
operation on the target if the rename destination is a directory that is 
excluded by the filter. However, files and subdirectories that were created or 
modified under the renamed directory after the old snapshot was taken, but 
before the rename, are still present as created/modified entries in the final 
copy list. Since the parent directory is marked for deletion, these 
create/modify entries should be ignored while building the final copy list.

In such cases, when the final copy list is built, DistCp tries to look up each 
created/modified file in the newer snapshot. The lookup fails because the 
parent directory has already been moved to a new location in the later snapshot.
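
The intended behaviour can be illustrated with a minimal sketch. This is not 
the DistCp implementation: DiffEntry, is_strict_descendant and build_copy_list 
are hypothetical names, and the sketch works on plain path strings only. It 
assumes a rename into a filtered directory has already been converted into a 
DELETE entry, and it does not model every case DistCp has to handle.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DiffEntry:
    """One simplified snapshot-diff entry (hypothetical type, not a DistCp class)."""
    kind: str  # "CREATE", "MODIFY" or "DELETE"
    path: str  # path relative to the snapshot root, e.g. "dir1/hosts"


def is_strict_descendant(path: str, ancestor: str) -> bool:
    """True if `path` lies below `ancestor`; matches whole path components only."""
    return path.startswith(ancestor.rstrip("/") + "/")


def build_copy_list(entries: List[DiffEntry]) -> List[DiffEntry]:
    """Keep create/modify entries unless their parent is marked for deletion."""
    # Assumption: renames into a filtered directory already appear as DELETE.
    deleted_dirs = [e.path for e in entries if e.kind == "DELETE"]
    return [
        e
        for e in entries
        if e.kind in ("CREATE", "MODIFY")
        and not any(is_strict_descendant(e.path, d) for d in deleted_dirs)
    ]


if __name__ == "__main__":
    diff = [
        DiffEntry("DELETE", "dir1"),        # dir1 was renamed into an excluded dir
        DiffEntry("CREATE", "dir1/hosts"),  # created before the rename; must be skipped
        DiffEntry("MODIFY", "dir2/a"),
        DiffEntry("CREATE", "dir10/x"),     # shares a prefix with dir1 but is unrelated
    ]
    print([e.path for e in build_copy_list(diff)])  # ['dir2/a', 'dir10/x']
```

With the dir1/hosts entry dropped, no lookup is attempted for a path that no 
longer exists under dir1 in the newer snapshot.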

 
{code:java}
sudo -u kms hadoop key create testkey
hadoop fs -mkdir -p /data/gcgdlknnasg/
hdfs crypto -createZone -keyName testkey -path /data/gcgdlknnasg/
hadoop fs -mkdir -p /dest/gcgdlknnasg
hdfs crypto -createZone -keyName testkey -path /dest/gcgdlknnasg
hdfs dfs -mkdir /data/gcgdlknnasg/dir1
hdfs dfsadmin -allowSnapshot /data/gcgdlknnasg/ 
hdfs dfsadmin -allowSnapshot /dest/gcgdlknnasg/ 

[root@nightly62x-1 logs]# hdfs dfs -ls -R /data/gcgdlknnasg/
drwxrwxrwt   - hdfs supergroup          0 2021-07-16 14:05 /data/gcgdlknnasg/.Trash
drwxr-xr-x   - hdfs supergroup          0 2021-07-16 13:07 /data/gcgdlknnasg/dir1
[root@nightly62x-1 logs]# hdfs dfs -ls -R /dest/gcgdlknnasg/
[root@nightly62x-1 logs]#

hdfs dfs -put /etc/hosts /data/gcgdlknnasg/dir1/
hdfs dfs -rm -r /data/gcgdlknnasg/dir1/
hdfs dfs -mkdir /data/gcgdlknnasg/dir1/

===> Run BDR with “Abort on Snapshot Diff Failures” CHECKED in the 
replication schedule. The BDR job fails with the error below.

21/07/16 15:02:30 INFO distcp.DistCp: Failed to use snapshot diff - java.io.FileNotFoundException: File does not exist: /data/gcgdlknnasg/.snapshot/distcp-5-46485360-new/dir1/hosts
        at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1494)
        at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1487)
……..
{code}



