[jira] [Commented] (HDFS-7535) Utilize Snapshot diff report for distcp

2017-04-25 Thread Benjamin Huo (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15982590#comment-15982590
 ] 

Benjamin Huo commented on HDFS-7535:


I have one question about the following comment:
"This snapshot diff report represents the delta that should be applied to the 
backup cluster. For changes like deletion and rename we can directly apply the 
same operations (following some specific order based on their dependency) in 
the backup cluster. For changes like creation, append, and other metadata 
modification we keep using the functionality of the current distcp."

I'm not clear on what "we keep using the functionality of the current 
distcp" means.

After the HDFS-7535 fix, is the list of file changes for creation and 
modification generated from snapshots s1 and s2 on the source cluster, or 
from the differences between the source cluster and the destination cluster?

Thanks
Ben
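
For readers following the quoted passage, here is a minimal sketch of the split it describes: deletions and renames are applied directly on the backup, while creations and modifications are left to the regular copy. It is written in Python with made-up types; DiffEntry and split_diff are illustrative names only and are not part of Hadoop, and the sketch does not settle which comparison the real patch uses to build the copy list.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


# Hypothetical stand-in for one snapshot diff entry; this is NOT the Hadoop API.
@dataclass(frozen=True)
class DiffEntry:
    kind: str                     # "CREATE", "DELETE", "RENAME" or "MODIFY"
    path: str                     # path relative to the snapshottable directory
    target: Optional[str] = None  # new path, only set for RENAME


def split_diff(entries: List[DiffEntry]) -> Tuple[List[DiffEntry], List[DiffEntry]]:
    """Split a diff into (apply directly on the backup, leave to the regular copy)."""
    direct = [e for e in entries if e.kind in ("DELETE", "RENAME")]
    copied = [e for e in entries if e.kind in ("CREATE", "MODIFY")]
    return direct, copied


diff = [
    DiffEntry("RENAME", "logs/2015", "archive/2015"),
    DiffEntry("DELETE", "tmp/scratch"),
    DiffEntry("CREATE", "logs/2016/part-0"),
    DiffEntry("MODIFY", "conf/site.xml"),
]
direct, copied = split_diff(diff)
print([e.kind for e in direct])  # ['RENAME', 'DELETE']
print([e.kind for e in copied])  # ['CREATE', 'MODIFY']
```

The point of the split is that a rename of a large directory becomes a single metadata operation on the backup instead of a full data copy.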



> Utilize Snapshot diff report for distcp
> ---
>
> Key: HDFS-7535
> URL: https://issues.apache.org/jira/browse/HDFS-7535
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: distcp, snapshots
>Reporter: Jing Zhao
>Assignee: Jing Zhao
> Fix For: 2.7.0
>
> Attachments: HDFS-7535.000.patch, HDFS-7535.001.patch, 
> HDFS-7535.002.patch, HDFS-7535.003.patch, HDFS-7535.004.patch
>
>
> Currently HDFS snapshot diff report can identify file/directory creation, 
> deletion, rename and modification under a snapshottable directory. We can use 
> the diff report for distcp between the primary cluster and a backup cluster 
> to avoid unnecessary data copy. This is especially useful when there is a big 
> directory rename happening in the primary cluster: the current distcp cannot 
> detect the rename op thus this rename usually leads to large amounts of real 
> data copy.
> More details of the approach will come in the first comment.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)




[jira] [Commented] (HDFS-7535) Utilize Snapshot diff report for distcp

2015-03-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14348887#comment-14348887
 ] 

Hudson commented on HDFS-7535:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #123 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/123/])
HDFS-7535. Utilize Snapshot diff report for distcp. Contributed by Jing Zhao. 
(jing9: rev ed70fa142cabdbc1065e4dbbc95e99c8850c4751)
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpOptions.java
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DiffInfo.java
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/OptionsParser.java
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpConstants.java
* hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestDistCpSync.java
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/mapred/CopyCommitter.java
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpSync.java
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpOptionSwitch.java
* hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestOptionsParser.java
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/CopyListing.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCp.java




[jira] [Commented] (HDFS-7535) Utilize Snapshot diff report for distcp

2015-03-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14348962#comment-14348962
 ] 

Hudson commented on HDFS-7535:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2073 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2073/])
HDFS-7535. Utilize Snapshot diff report for distcp. Contributed by Jing Zhao. 
(jing9: rev ed70fa142cabdbc1065e4dbbc95e99c8850c4751)
* hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestOptionsParser.java
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpOptionSwitch.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestDistCpSync.java
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/mapred/CopyCommitter.java
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DiffInfo.java
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/OptionsParser.java
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpOptions.java
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpSync.java
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpConstants.java
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/CopyListing.java
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCp.java




[jira] [Commented] (HDFS-7535) Utilize Snapshot diff report for distcp

2015-03-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14348564#comment-14348564
 ] 

Hudson commented on HDFS-7535:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #123 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/123/])
HDFS-7535. Utilize Snapshot diff report for distcp. Contributed by Jing Zhao. 
(jing9: rev ed70fa142cabdbc1065e4dbbc95e99c8850c4751)
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpOptionSwitch.java
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DiffInfo.java
* hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestDistCpSync.java
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/mapred/CopyCommitter.java
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpConstants.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestOptionsParser.java
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/OptionsParser.java
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpSync.java
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpOptions.java
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/CopyListing.java
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCp.java




[jira] [Commented] (HDFS-7535) Utilize Snapshot diff report for distcp

2015-03-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14348580#comment-14348580
 ] 

Hudson commented on HDFS-7535:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #857 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/857/])
HDFS-7535. Utilize Snapshot diff report for distcp. Contributed by Jing Zhao. 
(jing9: rev ed70fa142cabdbc1065e4dbbc95e99c8850c4751)
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpSync.java
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/OptionsParser.java
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpOptionSwitch.java
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DiffInfo.java
* hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestOptionsParser.java
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/CopyListing.java
* hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestDistCpSync.java
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/mapred/CopyCommitter.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCp.java
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpConstants.java
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpOptions.java




[jira] [Commented] (HDFS-7535) Utilize Snapshot diff report for distcp

2015-03-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14348788#comment-14348788
 ] 

Hudson commented on HDFS-7535:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2055 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2055/])
HDFS-7535. Utilize Snapshot diff report for distcp. Contributed by Jing Zhao. 
(jing9: rev ed70fa142cabdbc1065e4dbbc95e99c8850c4751)
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/CopyListing.java
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/mapred/CopyCommitter.java
* hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestOptionsParser.java
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpConstants.java
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpOptions.java
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpOptionSwitch.java
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DiffInfo.java
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/OptionsParser.java
* hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestDistCpSync.java
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpSync.java
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCp.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt




[jira] [Commented] (HDFS-7535) Utilize Snapshot diff report for distcp

2015-03-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14348820#comment-14348820
 ] 

Hudson commented on HDFS-7535:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #114 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/114/])
HDFS-7535. Utilize Snapshot diff report for distcp. Contributed by Jing Zhao. 
(jing9: rev ed70fa142cabdbc1065e4dbbc95e99c8850c4751)
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpConstants.java
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCp.java
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/CopyListing.java
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/mapred/CopyCommitter.java
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/OptionsParser.java
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DiffInfo.java
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpSync.java
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpOptions.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestOptionsParser.java
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpOptionSwitch.java
* hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestDistCpSync.java




[jira] [Commented] (HDFS-7535) Utilize Snapshot diff report for distcp

2015-03-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347349#comment-14347349
 ] 

Hudson commented on HDFS-7535:
--

FAILURE: Integrated in Hadoop-trunk-Commit #7256 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7256/])
HDFS-7535. Utilize Snapshot diff report for distcp. Contributed by Jing Zhao. 
(jing9: rev ed70fa142cabdbc1065e4dbbc95e99c8850c4751)
* hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestDistCpSync.java
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/mapred/CopyCommitter.java
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpSync.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpOptionSwitch.java
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DiffInfo.java
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCp.java
* hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestOptionsParser.java
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/CopyListing.java
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/OptionsParser.java
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpConstants.java
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpOptions.java




[jira] [Commented] (HDFS-7535) Utilize Snapshot diff report for distcp

2015-03-04 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14346603#comment-14346603
 ] 

Tsz Wo Nicholas Sze commented on HDFS-7535:
---

+1 the new patch looks good.

Thanks for testing it and adding the new tests.



[jira] [Commented] (HDFS-7535) Utilize Snapshot diff report for distcp

2015-03-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14346550#comment-14346550
 ] 

Hadoop QA commented on HDFS-7535:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12702393/HDFS-7535.004.patch
  against trunk revision 29bb689.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-tools/hadoop-distcp.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/9722//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9722//console

This message is automatically generated.



[jira] [Commented] (HDFS-7535) Utilize Snapshot diff report for distcp

2015-02-26 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14338629#comment-14338629
 ] 

Tsz Wo Nicholas Sze commented on HDFS-7535:
---

Thanks for working on this.  The idea is great!  Some comments on the patch:
- In DistCpSync.moveToTmpDir, why are the paths moved to tmp for the delete 
operations?
- DistCpSync.moveToTarget calls mkdir if the parent directory does not exist.  
Would it be able to preserve other attributes for the -p option?
- In DiffInfo.getDiffs(..), there is no need to create the entries array.  Just 
iterate over report.getDiffList().
- Please add javadoc for DiffInfo.
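
The thread does not spell out why paths pass through a temporary directory. One plausible motivation for the rename case, stated here purely as an assumption, is that renames can collide with each other (for example, two paths swapping names), so each source is first moved aside and only then moved to its final name. The sketch below models that two-phase idea on a flat in-memory namespace in Python; apply_renames and the ".tmp-sync/" prefix are hypothetical names, not taken from the patch, and the sketch does not address the delete case raised above.

```python
from typing import Dict, List, Tuple


def apply_renames(namespace: Dict[str, str],
                  renames: List[Tuple[str, str]],
                  tmp_prefix: str = ".tmp-sync/") -> Dict[str, str]:
    """Apply (source, target) renames in two phases via a temporary area.

    namespace maps a path to its content; a new dict is returned and the
    input is left untouched. tmp_prefix is a hypothetical staging location.
    """
    result = dict(namespace)
    staged = []
    # Phase 1: move every rename source out of the way under a unique tmp name.
    for i, (src, dst) in enumerate(renames):
        tmp = f"{tmp_prefix}{i}"
        result[tmp] = result.pop(src)
        staged.append((tmp, dst))
    # Phase 2: move each staged entry to its final name.
    for tmp, dst in staged:
        result[dst] = result.pop(tmp)
    return result


backup = {"a": "data-A", "b": "data-B"}
swapped = apply_renames(backup, [("a", "b"), ("b", "a")])
print(swapped)  # {'b': 'data-A', 'a': 'data-B'}
```

Applying the same swap one rename at a time, without staging, would overwrite "b" before it could be moved.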



[jira] [Commented] (HDFS-7535) Utilize Snapshot diff report for distcp

2015-02-26 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14338607#comment-14338607
 ] 

Tsz Wo Nicholas Sze commented on HDFS-7535:
---

> We verify these assumptions before the sync and we fallback to the default 
> distcp behavior ...

Would it be better to throw an exception instead, since the user may not want 
to fall back?
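
A minimal sketch of the "throw instead of falling back" alternative, in Python. The two checks shown (the target already has the starting snapshot, and the target has not changed since it) are illustrative assumptions, since the quoted text does not list the assumptions being verified; SyncPreconditionError and check_sync_preconditions are hypothetical names.

```python
class SyncPreconditionError(Exception):
    """Raised when a snapshot-diff sync cannot be performed safely."""


def check_sync_preconditions(target_snapshots, from_snapshot,
                             target_changed_since_snapshot):
    """Fail loudly rather than silently reverting to a full copy.

    All three parameters are hypothetical inputs chosen for illustration.
    """
    if from_snapshot not in target_snapshots:
        raise SyncPreconditionError(
            f"target has no snapshot named {from_snapshot!r}")
    if target_changed_since_snapshot:
        raise SyncPreconditionError(
            f"target was modified after snapshot {from_snapshot!r}")


try:
    check_sync_preconditions({"s0"}, "s1", False)
except SyncPreconditionError as err:
    print(f"sync refused: {err}")
```

Raising makes the failed precondition visible to the caller, who can then decide whether a full copy is acceptable.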



[jira] [Commented] (HDFS-7535) Utilize Snapshot diff report for distcp

2015-02-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14339274#comment-14339274
 ] 

Hadoop QA commented on HDFS-7535:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12701171/HDFS-7535.003.patch
  against trunk revision 2214dab.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-tools/hadoop-distcp.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/9672//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9672//console

This message is automatically generated.



[jira] [Commented] (HDFS-7535) Utilize Snapshot diff report for distcp

2015-02-18 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14326466#comment-14326466
 ] 

Hadoop QA commented on HDFS-7535:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12699520/HDFS-7535.002.patch
  against trunk revision 1714609.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-tools/hadoop-distcp.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/9610//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9610//console

This message is automatically generated.



[jira] [Commented] (HDFS-7535) Utilize Snapshot diff report for distcp

2015-02-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14325295#comment-14325295
 ] 

Hadoop QA commented on HDFS-7535:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12699392/HDFS-7535.001.patch
  against trunk revision 685af8a.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

  {color:red}-1 javac{color}.  The applied patch generated 1156 javac 
compiler warnings (more than the trunk's current 1155 warnings).

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 1 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-tools/hadoop-distcp.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/9606//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/9606//artifact/patchprocess/newPatchFindbugsWarningshadoop-distcp.html
Javac warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/9606//artifact/patchprocess/diffJavacWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9606//console

This message is automatically generated.

 Utilize Snapshot diff report for distcp
 ---

 Key: HDFS-7535
 URL: https://issues.apache.org/jira/browse/HDFS-7535
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: distcp, snapshots
Reporter: Jing Zhao
Assignee: Jing Zhao
 Attachments: HDFS-7535.000.patch, HDFS-7535.001.patch




[jira] [Commented] (HDFS-7535) Utilize Snapshot diff report for distcp

2014-12-16 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14248896#comment-14248896
 ] 

Jing Zhao commented on HDFS-7535:
-

A typical scenario using snapshot for distcp can be like this: every time we 
start distcp between the primary cluster and the backup cluster, a snapshot is 
first created in the primary cluster. Then the snapshot diff report is computed 
between the latest snapshot and the snapshot created for the last distcp. This 
snapshot diff report represents the delta that should be applied to the backup 
cluster. For changes like deletion and rename we can directly apply the same 
operations (following some specific order based on their dependency) in the 
backup cluster. For changes like creation, append, and other metadata 
modification we keep using the functionality of the current distcp. In this 
approach, we can avoid unnecessary data copy and also guarantee the source data 
is immutable since our snapshot is read-only.
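
As a rough illustration of the split described above, the following sketch 
partitions the entries of a diff report into operations that can be replayed 
directly on the backup cluster (delete, rename) and paths that are handed to 
the regular distcp copy (create, modify). The names SnapshotDiffSync, 
DiffEntry and SyncPlan are hypothetical stand-ins, not Hadoop classes, and the 
dependency ordering of deletes and renames is deliberately not modeled here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative only: hypothetical types, not part of Hadoop or distcp. */
class SnapshotDiffSync {
    enum DiffType { CREATE, MODIFY, DELETE, RENAME }

    /** One entry of a snapshot diff report; target is only set for RENAME. */
    static final class DiffEntry {
        final DiffType type;
        final String source;
        final String target;

        DiffEntry(DiffType type, String source, String target) {
            this.type = type;
            this.source = source;
            this.target = target;
        }
    }

    /** Result of splitting a diff report into direct ops and copy work. */
    static final class SyncPlan {
        final List<String> deletes = new ArrayList<>();    // replay on backup
        final List<String[]> renames = new ArrayList<>();  // {from, to} pairs
        final List<String> copies = new ArrayList<>();     // left to distcp copy
    }

    /** Splits entries by type; does NOT order deletes/renames by dependency. */
    static SyncPlan plan(List<DiffEntry> diff) {
        SyncPlan p = new SyncPlan();
        for (DiffEntry e : diff) {
            switch (e.type) {
                case DELETE:
                    p.deletes.add(e.source);
                    break;
                case RENAME:
                    p.renames.add(new String[] {e.source, e.target});
                    break;
                case CREATE:
                case MODIFY:
                    p.copies.add(e.source);
                    break;
            }
        }
        return p;
    }

    public static void main(String[] args) {
        // Made-up paths, for demonstration only.
        List<DiffEntry> diff = Arrays.asList(
            new DiffEntry(DiffType.RENAME, "/data/old", "/data/new"),
            new DiffEntry(DiffType.DELETE, "/data/tmp", null),
            new DiffEntry(DiffType.CREATE, "/data/new/part-0", null));
        SyncPlan p = plan(diff);
        System.out.println("renames=" + p.renames.size()
            + " deletes=" + p.deletes.size()
            + " copies=" + p.copies.size());
    }
}
```

In this sketch a large directory rename becomes a single entry in the renames 
list instead of a set of files to copy, which is the saving the approach is 
aiming for.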

We plan to use this jira to provide the basic functionalities in the above 
approach. More specifically, we can first add extra options to the current 
distcp tool so that it can compute the delta based on the diff report of two 
given snapshot names. How to manage snapshots in the source/target clusters can 
be done in separate jiras or through separate tools.

 Utilize Snapshot diff report for distcp
 ---

 Key: HDFS-7535
 URL: https://issues.apache.org/jira/browse/HDFS-7535
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Jing Zhao
Assignee: Jing Zhao
