Siyao Meng created HADOOP-16083:
-----------------------------------

Summary: DistCp shouldn't always overwrite the target file when checksums match
Key: HADOOP-16083
URL: https://issues.apache.org/jira/browse/HADOOP-16083
Project: Hadoop Common
Issue Type: Improvement
Components: tools/distcp
Affects Versions: 3.1.1, 3.2.0, 3.3.0
Reporter: Siyao Meng
Assignee: Siyao Meng

{code:java|title=CopyMapper#setup}
...
try {
  overWrite = overWrite || targetFS.getFileStatus(targetFinalPath).isFile();
} catch (FileNotFoundException ignored) {
}
...
{code}

The code above forces "overWrite" to "true" whenever the target path is an existing file, regardless of what the user configured. As a result, the file is transferred again even when the source and target files have the same checksum, which is unnecessary.

My suggestion is to remove the code above. A user who does want to overwrite the target can pass -overwrite explicitly:

{code:bash|title=DistCp command with -overwrite option}
hadoop distcp -overwrite hdfs://localhost:64464/source/5/6.txt hdfs://localhost:64464/target/5/6.txt
{code}
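
The intended behavior can be sketched in plain Java. This is only an illustration of the decision described in this issue, not DistCp's actual code: the class `CopyDecisionSketch`, the `FileInfo` holder, and the `shouldCopy` method are hypothetical names, and the real mapper consults further options (for example -update) that are left out here.

```java
import java.util.Arrays;

public class CopyDecisionSketch {

    /** Hypothetical stand-in for the file metadata a copy tool would read. */
    static final class FileInfo {
        final long length;
        final byte[] checksum;

        FileInfo(long length, byte[] checksum) {
            this.length = length;
            this.checksum = checksum;
        }
    }

    /**
     * Decides whether the source file needs to be transferred.
     *
     * @param overwrite true only if the user explicitly asked to overwrite
     * @param source    metadata of the source file
     * @param target    metadata of the target file, or null if it does not exist
     */
    static boolean shouldCopy(boolean overwrite, FileInfo source, FileInfo target) {
        if (target == null) {
            return true; // nothing at the target yet, so a copy is required
        }
        if (overwrite) {
            return true; // the user explicitly requested an overwrite
        }
        // Assumption: equal length and equal checksum mean identical content.
        boolean identical = source.length == target.length
                && Arrays.equals(source.checksum, target.checksum);
        return !identical;
    }

    public static void main(String[] args) {
        FileInfo src = new FileInfo(3, new byte[] {1, 2, 3});
        FileInfo sameAsSrc = new FileInfo(3, new byte[] {1, 2, 3});
        FileInfo different = new FileInfo(3, new byte[] {9, 9, 9});

        System.out.println("matching, no overwrite: " + shouldCopy(false, src, sameAsSrc));
        System.out.println("matching, overwrite:    " + shouldCopy(true, src, sameAsSrc));
        System.out.println("different checksum:     " + shouldCopy(false, src, different));
        System.out.println("missing target:         " + shouldCopy(false, src, null));
    }
}
```

The point of the sketch is that the overwrite flag reaches the decision unchanged: an existing target file alone never turns it on, so matching checksums can short-circuit the transfer.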