Siyao Meng created HADOOP-16083:
-----------------------------------

             Summary: DistCp shouldn't always overwrite the target file when 
checksums match
                 Key: HADOOP-16083
                 URL: https://issues.apache.org/jira/browse/HADOOP-16083
             Project: Hadoop Common
          Issue Type: Improvement
          Components: tools/distcp
    Affects Versions: 3.1.1, 3.2.0, 3.3.0
            Reporter: Siyao Meng
            Assignee: Siyao Meng


{code:java|title=CopyMapper#setup}
...
    try {
      overWrite = overWrite || targetFS.getFileStatus(targetFinalPath).isFile();
    } catch (FileNotFoundException ignored) {
    }
...
{code}

The code above forces the "overWrite" setting to true whenever the target path 
is an existing file, regardless of what the user configured. As a result, the 
file is transferred again even when the source and target files have the same 
checksum.

My suggestion is to remove the code above. A user who really wants to overwrite 
the target can add -overwrite to the options:
{code:bash|title=DistCp command with -overwrite option}
hadoop distcp -overwrite hdfs://localhost:64464/source/5/6.txt \
    hdfs://localhost:64464/target/5/6.txt
{code}
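
To make the difference concrete, here is a minimal, self-contained sketch of 
the decision logic described above. It is not the actual DistCp code: the class 
OverwriteDecisionSketch and all of its method names are hypothetical, the 
checksums are plain strings, and the skip rule (skip when the target exists 
with the same checksum and overwrite is off) is a simplifying assumption based 
on the description in this issue.

```java
// Hypothetical model of the behaviour discussed in this issue; no Hadoop classes are used.
class OverwriteDecisionSketch {

    // Behaviour described for CopyMapper#setup: an existing target file forces overwrite on.
    static boolean effectiveOverwriteCurrent(boolean overwriteFlag, boolean targetIsFile) {
        return overwriteFlag || targetIsFile;
    }

    // Suggested behaviour: only the user's -overwrite flag decides.
    static boolean effectiveOverwriteProposed(boolean overwriteFlag, boolean targetIsFile) {
        return overwriteFlag;
    }

    // Simplified copy decision (an assumption, not DistCp's real check):
    // copy if overwrite is on, if the target is missing (null checksum),
    // or if the checksums differ.
    static boolean shouldCopy(boolean overwrite, String sourceChecksum, String targetChecksum) {
        if (overwrite) {
            return true;
        }
        if (targetChecksum == null) {
            return true;
        }
        return !sourceChecksum.equals(targetChecksum);
    }

    public static void main(String[] args) {
        boolean userFlag = false;      // user did not pass -overwrite
        boolean targetIsFile = true;   // target path already exists as a file
        String checksum = "abc123";    // same made-up checksum on both sides

        boolean current = shouldCopy(
            effectiveOverwriteCurrent(userFlag, targetIsFile), checksum, checksum);
        boolean proposed = shouldCopy(
            effectiveOverwriteProposed(userFlag, targetIsFile), checksum, checksum);

        System.out.println("current behaviour copies:  " + current);
        System.out.println("proposed behaviour copies: " + proposed);
    }
}
```

In this model, with matching checksums and no -overwrite flag, the current 
logic still copies the file while the proposed logic skips it. Passing 
-overwrite makes both variants copy.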


