[jira] [Commented] (HADOOP-15281) Distcp to add no-rename copy option
[ https://issues.apache.org/jira/browse/HADOOP-15281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16881254#comment-16881254 ]

Andrew Olson commented on HADOOP-15281:
---

Attached a corrected branch-2 patch file.

> Distcp to add no-rename copy option
> ---
>
> Key: HADOOP-15281
> URL: https://issues.apache.org/jira/browse/HADOOP-15281
> Project: Hadoop Common
> Issue Type: Improvement
> Components: tools/distcp
> Affects Versions: 3.0.0
> Reporter: Steve Loughran
> Assignee: Andrew Olson
> Priority: Major
> Fix For: 3.3.0, 3.2.1, 3.1.3
>
> Attachments: HADOOP-15281-001.patch, HADOOP-15281-002.patch,
> HADOOP-15281-003.patch, HADOOP-15281-004.patch,
> HADOOP-15281-branch-2-001.patch, HADOOP-15281-branch-2-002.patch
>
>
> Currently Distcp uploads a file by two strategies
> # append parts
> # copy to temp then rename
> option 2 executes the following sequence in {{promoteTmpToTarget}}
> {code}
> if ((fs.exists(target) && !fs.delete(target, false))
>     || (!fs.exists(target.getParent()) && !fs.mkdirs(target.getParent()))
>     || !fs.rename(tmpTarget, target)) {
>   throw new IOException("Failed to promote tmp-file:" + tmpTarget
>       + " to: " + target);
> }
> {code}
> For any object store, that's a lot of HTTP requests; for S3A you are looking
> at 12+ requests and an O(data) copy call.
> This is not a good upload strategy for any store which manifests its output
> atomically at the end of the write().
> Proposed: add a switch to write directly to the dest path, which can be
> supplied as either a conf option (distcp.direct.write = true) or a CLI option
> (-direct).

--
This message was sent by Atlassian JIRA (v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org
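The difference between "copy to temp then rename" and the proposed direct write may be easier to see side by side. The following is only a rough sketch under stated assumptions: it uses java.nio.file on a local directory as a stand-in for a Hadoop FileSystem, and the class and method names (DirectWriteSketch, copyViaTemp, copyDirect) are hypothetical, not DistCp code. On a local disk both paths are cheap; the point is only the number of separate filesystem operations each strategy issues, which on an object store would each turn into one or more HTTP requests.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical illustration only; this is not DistCp code. */
public class DirectWriteSketch {

  /** Temp-then-rename: write, delete any old target, then move into place. */
  public static void copyViaTemp(byte[] data, Path target) throws IOException {
    Path parent = target.toAbsolutePath().getParent();
    Files.createDirectories(parent);               // ensure the parent exists
    Path tmp = parent.resolve("." + target.getFileName() + ".tmp");
    Files.write(tmp, data);                        // 1. write the temp file
    Files.deleteIfExists(target);                  // 2. remove any old target
    Files.move(tmp, target);                       // 3. rename into place
  }

  /** Direct write: a single write to the final path, with no rename step. */
  public static void copyDirect(byte[] data, Path target) throws IOException {
    Files.createDirectories(target.toAbsolutePath().getParent());
    Files.write(target, data);                     // creates or overwrites
  }

  public static void main(String[] args) throws IOException {
    Path dir = Files.createTempDirectory("distcp-sketch");
    byte[] data = "payload".getBytes(StandardCharsets.UTF_8);
    copyViaTemp(data, dir.resolve("a/out.txt"));
    copyDirect(data, dir.resolve("b/out.txt"));
    System.out.println(Files.exists(dir.resolve("a/out.txt")));
    System.out.println(Files.exists(dir.resolve("b/out.txt")));
  }
}
```

The direct variant gives up the "target only appears when complete" property that rename provides on a real filesystem, which is why the description limits it to stores that already manifest output atomically at the end of the write.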
[jira] [Commented] (HADOOP-15281) Distcp to add no-rename copy option
[ https://issues.apache.org/jira/browse/HADOOP-15281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16820197#comment-16820197 ]

Arun Suresh commented on HADOOP-15281:
--

Thanks for posting the patch for branch-2 [~ste...@apache.org].

Found a minor nit while testing, though: at {{OptionsParser:165}}, *{{option.setDirectWrite(false);}}* needs to be *{{option.setDirectWrite(true);}}*.
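To illustrate why the nit above matters: if the branch taken when the flag is present sets the option to false, supplying -direct has no effect. A minimal sketch follows, assuming a hand-rolled parser; DirectFlagSketch and its nested Options class are hypothetical stand-ins and do not reproduce the real OptionsParser or DistCpOptions code. Only the -direct switch and the setDirectWrite name come from the thread.

```java
import java.util.Arrays;

/** Hypothetical stand-in for option parsing; not the real OptionsParser. */
public class DirectFlagSketch {

  /** Minimal options holder; direct write is off unless requested. */
  public static final class Options {
    private boolean directWrite;                   // defaults to false

    public void setDirectWrite(boolean directWrite) {
      this.directWrite = directWrite;
    }

    public boolean isDirectWrite() {
      return directWrite;
    }
  }

  /** Turns direct write on only when the -direct switch is present. */
  public static Options parse(String... args) {
    Options options = new Options();
    if (Arrays.asList(args).contains("-direct")) {
      // Passing false here would silently ignore the switch.
      options.setDirectWrite(true);
    }
    return options;
  }

  public static void main(String[] args) {
    System.out.println(parse("-direct", "src", "dst").isDirectWrite());
    System.out.println(parse("src", "dst").isDirectWrite());
  }
}
```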
[jira] [Commented] (HADOOP-15281) Distcp to add no-rename copy option
[ https://issues.apache.org/jira/browse/HADOOP-15281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16770147#comment-16770147 ]

Hadoop QA commented on HADOOP-15281:

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 1m 3s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 3 new or modified test files. |
|| || || || branch-2 Compile Tests ||
| 0 | mvndep | 0m 48s | Maven dependency ordering for branch |
| +1 | mvninstall | 10m 25s | branch-2 passed |
| +1 | compile | 1m 57s | branch-2 passed with JDK v1.7.0_95 |
| +1 | compile | 1m 33s | branch-2 passed with JDK v1.8.0_191 |
| +1 | checkstyle | 0m 31s | branch-2 passed |
| +1 | mvnsite | 0m 58s | branch-2 passed |
| +1 | findbugs | 1m 7s | branch-2 passed |
| +1 | javadoc | 0m 41s | branch-2 passed with JDK v1.7.0_95 |
| +1 | javadoc | 0m 35s | branch-2 passed with JDK v1.8.0_191 |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 10s | Maven dependency ordering for patch |
| +1 | mvninstall | 0m 50s | the patch passed |
| +1 | compile | 1m 51s | the patch passed with JDK v1.7.0_95 |
| +1 | javac | 1m 51s | the patch passed |
| +1 | compile | 1m 30s | the patch passed with JDK v1.8.0_191 |
| +1 | javac | 1m 30s | the patch passed |
| -0 | checkstyle | 0m 27s | hadoop-tools: The patch generated 10 new + 131 unchanged - 2 fixed = 141 total (was 133) |
| +1 | mvnsite | 0m 53s | the patch passed |
| -1 | whitespace | 0m 0s | The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply |
| +1 | findbugs | 1m 13s | the patch passed |
| +1 | javadoc | 0m 35s | the patch passed with JDK v1.7.0_95 |
| +1 | javadoc | 0m 30s | the patch passed with JDK v1.8.0_191 |
|| || || || Other Tests ||
| -1 | unit | 13m 9s | hadoop-distcp in the patch failed. |
| +1 | unit | 0m 41s | hadoop-aws in the patch passed. |
| +1 | asflicense | 0m 23s | The patch does not generate ASF License warnings. |
| | | 43m 59s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.tools.TestOptionsParser |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:da67579 |
| JIRA Issue | HADOOP-15281 |
| JIRA Patch URL |
[jira] [Commented] (HADOOP-15281) Distcp to add no-rename copy option
[ https://issues.apache.org/jira/browse/HADOOP-15281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16766363#comment-16766363 ]

Steve Loughran commented on HADOOP-15281:
-

It's in 3.1 via HADOOP-16096.

I'm actually going to leave it at that point for branch-3, but pull it into branch-2, at least as far as a PoC patch. Why so? I have a repackaged version of distcp for branch-2 designed to upload to object stores faster by having the relevant changes (improved delete, for example).

Ultimately, distcp needs replacement. For now, we can tweak the details, carefully.
[jira] [Commented] (HADOOP-15281) Distcp to add no-rename copy option
[ https://issues.apache.org/jira/browse/HADOOP-15281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16766158#comment-16766158 ]

Andrew Olson commented on HADOOP-15281:
---

Updated fix versions. If there is a compelling reason to patch this into 3.1 I can make the necessary logging changes.
[jira] [Commented] (HADOOP-15281) Distcp to add no-rename copy option
[ https://issues.apache.org/jira/browse/HADOOP-15281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16763045#comment-16763045 ]

Eric Payne commented on HADOOP-15281:
-

I'm going to revert this from 3.1 so that further development can continue.
[jira] [Commented] (HADOOP-15281) Distcp to add no-rename copy option
[ https://issues.apache.org/jira/browse/HADOOP-15281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16763038#comment-16763038 ]

Masatake Iwasaki commented on HADOOP-15281:
---

HADOOP-15552 is not in 3.1, and backporting it does not look easy. Changing the relevant code in RetriableFileCopyCommand to use the commons-logging style would be a quick fix.
[jira] [Commented] (HADOOP-15281) Distcp to add no-rename copy option
[ https://issues.apache.org/jira/browse/HADOOP-15281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16762989#comment-16762989 ]

Andrew Olson commented on HADOOP-15281:
---

[~eepayne] Because SLF4J was introduced post-3.1?
[jira] [Commented] (HADOOP-15281) Distcp to add no-rename copy option
[ https://issues.apache.org/jira/browse/HADOOP-15281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16762957#comment-16762957 ]

Eric Payne commented on HADOOP-15281:
-

[~noslowerdna] and [~ste...@apache.org], this commit breaks the branch-3.1 build when building with java 1.8.
{code:title="RetriableFileCopyCommand.java"}
LOG.info("Copying {} to {}", source.getPath(), target);
{code}
Multiple lines have this error:
{panel:title="Build Failure"}
[ERROR] hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/mapred/RetriableFileCopyCommand.java:[121,8] no suitable method found for info(java.lang.String,org.apache.hadoop.fs.Path,org.apache.hadoop.fs.Path)
[ERROR] method org.apache.commons.logging.Log.info(java.lang.Object) is not applicable
[ERROR] (actual and formal argument lists differ in length)
[ERROR] method org.apache.commons.logging.Log.info(java.lang.Object,java.lang.Throwable) is not applicable
[ERROR] (actual and formal argument lists differ in length)
{panel}
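The build failure above comes down to the logger on branch-3.1 only offering info(Object) and info(Object, Throwable), so a call with a format string plus arguments does not compile. A rough sketch of the kind of change that would avoid it is shown below. The ObjectLog interface is a hypothetical stand-in that mirrors the single-argument info(Object) shape from the error output, so the example compiles without any logging library; the s3a paths are made up for illustration.

```java
/** Hypothetical sketch; ObjectLog stands in for a logger with only info(Object). */
public class LoggingFixSketch {

  /** Mirrors the single-argument method named in the build error. */
  public interface ObjectLog {
    void info(Object message);
  }

  /** Builds the message by concatenation instead of {} placeholders. */
  public static void logCopy(ObjectLog log, String source, String target) {
    // The parameterized form that failed to compile on branch-3.1:
    //   LOG.info("Copying {} to {}", source.getPath(), target);
    log.info("Copying " + source + " to " + target);
  }

  public static void main(String[] args) {
    // System.out::println serves as a trivial logger for the demonstration.
    logCopy(System.out::println, "s3a://bucket/src", "s3a://bucket/dst");
  }
}
```

The trade-off is that concatenation builds the string even when the log level is disabled, which is one reason parameterized logging is usually preferred where the logger supports it.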
[jira] [Commented] (HADOOP-15281) Distcp to add no-rename copy option
[ https://issues.apache.org/jira/browse/HADOOP-15281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16762757#comment-16762757 ]

Andrew Olson commented on HADOOP-15281:
---

Thanks for accepting the patch. We don't have a concrete need for it in 2.x at this time, but I think the code changes should merge in mostly cleanly.
[jira] [Commented] (HADOOP-15281) Distcp to add no-rename copy option
[ https://issues.apache.org/jira/browse/HADOOP-15281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16762545#comment-16762545 ]

Hudson commented on HADOOP-15281:
-

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #15905 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/15905/])
HADOOP-15281. Distcp to add no-rename copy option. (stevel: rev de804e53b9d20a2df75a4c7252bf83ed52011488)
* (edit) hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/contract/AbstractContractDistCpTest.java
* (edit) hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/contract/s3a/ITestS3AContractDistCp.java
* (edit) hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/OptionsParser.java
* (edit) hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpOptions.java
* (edit) hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/mapred/CopyMapper.java
* (edit) hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/mapred/RetriableFileCopyCommand.java
* (edit) hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestDistCpOptions.java
* (edit) hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpConstants.java
* (edit) hadoop-tools/hadoop-distcp/src/site/markdown/DistCp.md.vm
* (edit) hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpContext.java
* (edit) hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpOptionSwitch.java
[jira] [Commented] (HADOOP-15281) Distcp to add no-rename copy option
[ https://issues.apache.org/jira/browse/HADOOP-15281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16762035#comment-16762035 ]

Andrew Olson commented on HADOOP-15281:
---

[~ste...@apache.org] Please use 'andrew.ol...@cerner.com'
[jira] [Commented] (HADOOP-15281) Distcp to add no-rename copy option
[ https://issues.apache.org/jira/browse/HADOOP-15281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16762025#comment-16762025 ]

Steve Loughran commented on HADOOP-15281:
-

LGTM: tested the s3a and ABFS distcp. +1

Before I commit this: What email address can I use for the --author tag? I want to make sure GitHub gives you credit for your work, and git blame finds both of us when it doesn't.
[jira] [Commented] (HADOOP-15281) Distcp to add no-rename copy option
[ https://issues.apache.org/jira/browse/HADOOP-15281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16761732#comment-16761732 ]

Steve Loughran commented on HADOOP-15281:
-

LGTM. Running the s3 and abfs tests to make sure all is well; if it is, it'll get my vote.
[jira] [Commented] (HADOOP-15281) Distcp to add no-rename copy option
[ https://issues.apache.org/jira/browse/HADOOP-15281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16761259#comment-16761259 ]

Hadoop QA commented on HADOOP-15281:

| (/) *+1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 19s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 3 new or modified test files. |
|| || || || trunk Compile Tests ||
| 0 | mvndep | 0m 16s | Maven dependency ordering for branch |
| +1 | mvninstall | 17m 51s | trunk passed |
| +1 | compile | 1m 56s | trunk passed |
| +1 | checkstyle | 0m 40s | trunk passed |
| +1 | mvnsite | 1m 6s | trunk passed |
| +1 | shadedclient | 13m 20s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 15s | trunk passed |
| +1 | javadoc | 0m 43s | trunk passed |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 13s | Maven dependency ordering for patch |
| +1 | mvninstall | 0m 53s | the patch passed |
| +1 | compile | 1m 52s | the patch passed |
| +1 | javac | 1m 52s | the patch passed |
| +1 | checkstyle | 0m 37s | hadoop-tools: The patch generated 0 new + 68 unchanged - 2 fixed = 68 total (was 70) |
| +1 | mvnsite | 0m 57s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 12m 57s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 28s | the patch passed |
| +1 | javadoc | 0m 40s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 12m 52s | hadoop-distcp in the patch passed. |
| +1 | unit | 4m 43s | hadoop-aws in the patch passed. |
| +1 | asflicense | 0m 26s | The patch does not generate ASF License warnings. |
| | | 74m 50s | |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f |
| JIRA Issue | HADOOP-15281 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12957693/HADOOP-15281-004.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 0a3292439dd8 4.4.0-138-generic #164~14.04.1-Ubuntu SMP Fri Oct 5 08:56:16 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / fa8cd1b |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_191 |
| findbugs | v3.1.0-RC1 |
| Test Results |
[jira] [Commented] (HADOOP-15281) Distcp to add no-rename copy option
[ https://issues.apache.org/jira/browse/HADOOP-15281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16761199#comment-16761199 ] Andrew Olson commented on HADOOP-15281: --- Attached updated patch addressing the review feedback.
[jira] [Commented] (HADOOP-15281) Distcp to add no-rename copy option
[ https://issues.apache.org/jira/browse/HADOOP-15281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16761032#comment-16761032 ] Andrew Olson commented on HADOOP-15281: --- Thanks [~ste...@apache.org], I'll get those suggested changes made shortly. > Distcp to add no-rename copy option > --- > > Key: HADOOP-15281 > URL: https://issues.apache.org/jira/browse/HADOOP-15281 > Project: Hadoop Common > Issue Type: Improvement > Components: tools/distcp >Affects Versions: 3.0.0 >Reporter: Steve Loughran >Assignee: Andrew Olson >Priority: Major > Attachments: HADOOP-15281-001.patch, HADOOP-15281-002.patch, > HADOOP-15281-003.patch > > > Currently Distcp uploads a file by two strategies > # append parts > # copy to temp then rename > option 2 executes the following sequence in {{promoteTmpToTarget}} > {code} > if ((fs.exists(target) && !fs.delete(target, false)) > || (!fs.exists(target.getParent()) && !fs.mkdirs(target.getParent())) > || !fs.rename(tmpTarget, target)) { > throw new IOException("Failed to promote tmp-file:" + tmpTarget > + " to: " + target); > } > {code} > For any object store, that's a lot of HTTP requests; for S3A you are looking > at 12+ requests and an O(data) copy call. > This is not a good upload strategy for any store which manifests its output > atomically at the end of the write(). > Proposed: add a switch to write direct to the dest path. either a conf option > or a CLI option -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15281) Distcp to add no-rename copy option
[ https://issues.apache.org/jira/browse/HADOOP-15281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16760999#comment-16760999 ]

Steve Loughran commented on HADOOP-15281: -

Right, a really good piece of work, nearly ready to go in. I like the tests in particular. Nicely done, including the s3a extension to verify that the number of renames is zero.

Some minor changes in RetriableFileCopyCommand.doCopy() are covered below. Before that: how about calling the option "direct"? That keeps it easier to remember. The code can stay as is, but the CLI option & docs can be changed. I just like {{-direct -update}} as a set of commands. (I never get -skipCrcCheck right; that's precisely what I don't want to replicate: people mistakenly typing -directWrite.) A short "direct" avoids case confusion.

h3. RetriableFileCopyCommand.doCopy()

L122: rename useTmpTarget to useTempTarget.

L125 (and all other new log statements): we're using the SLF4J API, so you can use
LOG.info("Writing to {} target file path {}", useTempTarget ? "temporary" : "direct", targetPath)
This also means you can skip the LOG.isDebugEnabled() guard, as the string evaluation & concatenation is only done if there's a log event. (Don't bother with the existing logs; just worry about those which you are adding or changing.)

L172: no need to call targetFS.exists(targetPath) before the delete; delete does the checks and is a no-op if the destination doesn't exist. That saves many HTTP requests against a store.

One thing I wondered about is "could we actually log the duration of operations, e.g. the rename()?" But the lib to do that (DurationInfo) is in hadoop-aws; I think moving that would be good, but it's a separate bit of work (HADOOP-16093). So don't worry about it. That said, CopyCommitter.deleteMissing does have some variant of that code: distcp would be the obvious place to add a chunk of this. I don't want that to block this patch or complicate backporting (I will cherry-pick this to branch-2, but not -2.8).

Accordingly:
# fix those bits of RetriableFileCopyCommand
# tell me what you think of having the option "direct", and if you are happy with it, make that change.

Thanks
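For readers following the review: the difference between the temp-then-rename strategy in the issue description and the proposed direct write, together with the L172 suggestion to delete without a prior exists() probe, can be sketched with plain java.nio.file calls. This is only an illustrative analogue on a local filesystem, not the patch itself. The class DirectWriteSketch, the copy() helper and the ".distcp.tmp." prefix are hypothetical names invented for the example; the real code works against org.apache.hadoop.fs.FileSystem.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical illustration only; none of these names come from the Hadoop patch.
final class DirectWriteSketch {

    /**
     * Writes data to target and returns how many rename operations were issued.
     * direct == true  : write straight to the destination, no rename.
     * direct == false : write to a temporary sibling file, then rename it over the target.
     */
    static int copy(byte[] data, Path target, boolean direct) throws IOException {
        Path parent = target.getParent();
        if (parent != null) {
            Files.createDirectories(parent); // no-op if the directory already exists
        }
        if (direct) {
            Files.write(target, data); // creates the file or truncates an existing one
            return 0;
        }
        // Assumed temp-file naming, chosen for the example.
        Path tmp = target.resolveSibling(".distcp.tmp." + target.getFileName());
        Files.write(tmp, data);
        // Mirrors the L172 advice: delete without probing for existence first.
        Files.deleteIfExists(target);
        Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
        return 1;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("distcp-sketch");
        Path target = dir.resolve("out").resolve("part-0000");
        byte[] data = "hello".getBytes(StandardCharsets.UTF_8);
        System.out.println("renames (temp + rename): " + copy(data, target, false));
        System.out.println("renames (direct): " + copy(data, target, true));
    }
}
```

On a local disk the rename is cheap, so the two paths differ little; the point of the issue is that on an object store such as S3A the rename step turns into extra HTTP requests plus an O(data) copy, which the direct path avoids.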
[jira] [Commented] (HADOOP-15281) Distcp to add no-rename copy option
[ https://issues.apache.org/jira/browse/HADOOP-15281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16760095#comment-16760095 ] Andrew Olson commented on HADOOP-15281: --- [~ste...@apache.org] Ok, thank you.
[jira] [Commented] (HADOOP-15281) Distcp to add no-rename copy option
[ https://issues.apache.org/jira/browse/HADOOP-15281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16760083#comment-16760083 ] Steve Loughran commented on HADOOP-15281: - Andrew, I know of this, I'm just on a real backlog of reviews and I'm trying to catch up this week. This is an important one and I do want to take a close look @ it. And as distcp is a large piece of code which I'm scared of, it'll take time
[jira] [Commented] (HADOOP-15281) Distcp to add no-rename copy option
[ https://issues.apache.org/jira/browse/HADOOP-15281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16752319#comment-16752319 ] Andrew Olson commented on HADOOP-15281: --- [~ste...@apache.org] Got a +1 from Jenkins on this. Let me know if any code changes need to be made.
[jira] [Commented] (HADOOP-15281) Distcp to add no-rename copy option
[ https://issues.apache.org/jira/browse/HADOOP-15281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16751761#comment-16751761 ]

Hadoop QA commented on HADOOP-15281:

| (/) *+1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 15s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 3 new or modified test files. |
|| || || || trunk Compile Tests ||
| 0 | mvndep | 0m 14s | Maven dependency ordering for branch |
| +1 | mvninstall | 19m 17s | trunk passed |
| +1 | compile | 2m 1s | trunk passed |
| +1 | checkstyle | 1m 11s | trunk passed |
| +1 | mvnsite | 1m 8s | trunk passed |
| +1 | shadedclient | 14m 4s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 19s | trunk passed |
| +1 | javadoc | 0m 44s | trunk passed |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 12s | Maven dependency ordering for patch |
| +1 | mvninstall | 1m 0s | the patch passed |
| +1 | compile | 1m 51s | the patch passed |
| +1 | javac | 1m 51s | the patch passed |
| +1 | checkstyle | 0m 34s | hadoop-tools: The patch generated 0 new + 68 unchanged - 2 fixed = 68 total (was 70) |
| +1 | mvnsite | 0m 58s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 12m 59s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 21s | the patch passed |
| +1 | javadoc | 0m 34s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 12m 49s | hadoop-distcp in the patch passed. |
| +1 | unit | 4m 26s | hadoop-aws in the patch passed. |
| +1 | asflicense | 0m 24s | The patch does not generate ASF License warnings. |
| | | 76m 30s | |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f |
| JIRA Issue | HADOOP-15281 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12956219/HADOOP-15281-003.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 30bd5cfdc894 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 17:16:02 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 4e0aa2c |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_191 |
| findbugs | v3.1.0-RC1 |
| Test Results | https://builds.apache.org/job/PreCommit-HADOOP-Build/15843/testReport/ |
[jira] [Commented] (HADOOP-15281) Distcp to add no-rename copy option
[ https://issues.apache.org/jira/browse/HADOOP-15281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16751669#comment-16751669 ]

Hadoop QA commented on HADOOP-15281:

| (/) *+1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 12s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 3 new or modified test files. |
|| || || || trunk Compile Tests ||
| 0 | mvndep | 3m 22s | Maven dependency ordering for branch |
| +1 | mvninstall | 23m 38s | trunk passed |
| +1 | compile | 1m 52s | trunk passed |
| +1 | checkstyle | 0m 41s | trunk passed |
| +1 | mvnsite | 1m 4s | trunk passed |
| +1 | shadedclient | 14m 10s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 15s | trunk passed |
| +1 | javadoc | 0m 42s | trunk passed |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 12s | Maven dependency ordering for patch |
| +1 | mvninstall | 1m 0s | the patch passed |
| +1 | compile | 1m 50s | the patch passed |
| +1 | javac | 1m 50s | the patch passed |
| -0 | checkstyle | 0m 35s | hadoop-tools: The patch generated 10 new + 69 unchanged - 1 fixed = 79 total (was 70) |
| +1 | mvnsite | 0m 55s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 13m 45s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 30s | the patch passed |
| +1 | javadoc | 0m 38s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 12m 35s | hadoop-distcp in the patch passed. |
| +1 | unit | 4m 27s | hadoop-aws in the patch passed. |
| +1 | asflicense | 0m 29s | The patch does not generate ASF License warnings. |
| | | 84m 34s | |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f |
| JIRA Issue | HADOOP-15281 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12956200/HADOOP-15281-002.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux a99661bf0d47 4.4.0-139-generic #165~14.04.1-Ubuntu SMP Wed Oct 31 10:55:11 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 4e0aa2c |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_191 |
| findbugs | v3.1.0-RC1 |
| checkstyle |
[jira] [Commented] (HADOOP-15281) Distcp to add no-rename copy option
[ https://issues.apache.org/jira/browse/HADOOP-15281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16751494#comment-16751494 ] Andrew Olson commented on HADOOP-15281: --- I will fix the test failure. > Distcp to add no-rename copy option > --- > > Key: HADOOP-15281 > URL: https://issues.apache.org/jira/browse/HADOOP-15281 > Project: Hadoop Common > Issue Type: Improvement > Components: tools/distcp >Affects Versions: 3.0.0 >Reporter: Steve Loughran >Assignee: Andrew Olson >Priority: Major > Attachments: HADOOP-15281-001.patch > > > Currently Distcp uploads a file by two strategies > # append parts > # copy to temp then rename > option 2 executes the following sequence in {{promoteTmpToTarget}} > {code} > if ((fs.exists(target) && !fs.delete(target, false)) > || (!fs.exists(target.getParent()) && !fs.mkdirs(target.getParent())) > || !fs.rename(tmpTarget, target)) { > throw new IOException("Failed to promote tmp-file:" + tmpTarget > + " to: " + target); > } > {code} > For any object store, that's a lot of HTTP requests; for S3A you are looking > at 12+ requests and an O(data) copy call. > This is not a good upload strategy for any store which manifests its output > atomically at the end of the write(). > Proposed: add a switch to write direct to the dest path. either a conf option > or a CLI option -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15281) Distcp to add no-rename copy option
[ https://issues.apache.org/jira/browse/HADOOP-15281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16751490#comment-16751490 ]

Hadoop QA commented on HADOOP-15281:

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 19s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 2 new or modified test files. |
|| || || || trunk Compile Tests ||
| 0 | mvndep | 0m 15s | Maven dependency ordering for branch |
| +1 | mvninstall | 21m 22s | trunk passed |
| +1 | compile | 1m 56s | trunk passed |
| +1 | checkstyle | 0m 39s | trunk passed |
| +1 | mvnsite | 1m 0s | trunk passed |
| +1 | shadedclient | 13m 36s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 13s | trunk passed |
| +1 | javadoc | 0m 41s | trunk passed |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 11s | Maven dependency ordering for patch |
| +1 | mvninstall | 1m 0s | the patch passed |
| +1 | compile | 1m 50s | the patch passed |
| +1 | javac | 1m 50s | the patch passed |
| -0 | checkstyle | 0m 36s | hadoop-tools: The patch generated 10 new + 69 unchanged - 1 fixed = 79 total (was 70) |
| +1 | mvnsite | 0m 55s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 13m 18s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 30s | the patch passed |
| +1 | javadoc | 0m 36s | the patch passed |
|| || || || Other Tests ||
| -1 | unit | 12m 31s | hadoop-distcp in the patch failed. |
| +1 | unit | 4m 32s | hadoop-aws in the patch passed. |
| +1 | asflicense | 0m 27s | The patch does not generate ASF License warnings. |
| | | 78m 18s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.tools.TestDistCpOptions |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f |
| JIRA Issue | HADOOP-15281 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12956179/HADOOP-15281-001.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 4cd00da04e47 4.4.0-138-generic #164~14.04.1-Ubuntu SMP Fri Oct 5 08:56:16 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 3c7d700 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_191 |
| findbugs | v3.1.0-RC1 |
[jira] [Commented] (HADOOP-15281) Distcp to add no-rename copy option
[ https://issues.apache.org/jira/browse/HADOOP-15281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16751403#comment-16751403 ] Steve Loughran commented on HADOOP-15281: - I see it. Hit the "submit patch" button and jenkins will build it
[jira] [Commented] (HADOOP-15281) Distcp to add no-rename copy option
[ https://issues.apache.org/jira/browse/HADOOP-15281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16751375#comment-16751375 ]

Andrew Olson commented on HADOOP-15281:

[~ste...@apache.org] thanks, I've attached the patch.
[jira] [Commented] (HADOOP-15281) Distcp to add no-rename copy option
[ https://issues.apache.org/jira/browse/HADOOP-15281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16751318#comment-16751318 ]

Steve Loughran commented on HADOOP-15281:

try now
[jira] [Commented] (HADOOP-15281) Distcp to add no-rename copy option
[ https://issues.apache.org/jira/browse/HADOOP-15281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16751310#comment-16751310 ]

Steve Loughran commented on HADOOP-15281:

oh, let me give you the permission; restrictions are there to keep out spam not code
[jira] [Commented] (HADOOP-15281) Distcp to add no-rename copy option
[ https://issues.apache.org/jira/browse/HADOOP-15281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16749237#comment-16749237 ]

Andrew Olson commented on HADOOP-15281:

[~ste...@apache.org] I don't seem to have permission to attach a patch to this issue. Here's a pull request: https://github.com/apache/hadoop/pull/469
[jira] [Commented] (HADOOP-15281) Distcp to add no-rename copy option
[ https://issues.apache.org/jira/browse/HADOOP-15281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16749222#comment-16749222 ]

Andrew Olson commented on HADOOP-15281:

Attaching patch file
[jira] [Commented] (HADOOP-15281) Distcp to add no-rename copy option
[ https://issues.apache.org/jira/browse/HADOOP-15281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16742433#comment-16742433 ]

Andrew Olson commented on HADOOP-15281:

Ok, sounds good from here - thanks for the input.
[jira] [Commented] (HADOOP-15281) Distcp to add no-rename copy option
[ https://issues.apache.org/jira/browse/HADOOP-15281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16742359#comment-16742359 ]

Steve Loughran commented on HADOOP-15281:

I think I'd prefer it as an option (painful as it is to pass down), rather than something trying to be clever with the scheme of the dest FS.
* supports many other object stores where rename may be mimicked expensively (even the Azure ones can be slow sometimes)
* easy to test on the local FS
* lets people who bind "s3://" to the S3A implementation use this (it's done as a way to move off EMR)
* avoids having some special S3A-only secret hidden in distcp

Sorry, I know this is harder work, but it's for the best.
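A minimal sketch of the option-driven approach argued for in the comment above, written against plain Java types so it runs without Hadoop on the classpath. It is not the DistCp implementation: the class {{DirectWriteSketch}}, its methods, the use of a {{Map}} as a stand-in for a configuration object, and the placement of the temp file next to the target are all assumptions made for illustration. The key name {{distcp.direct.write}} is the conf option named in the later revision of the issue description.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative only; every name in this class is hypothetical. */
final class DirectWriteSketch {
  /** Conf key taken from the issue description; here it is just a map entry. */
  static final String DIRECT_WRITE_KEY = "distcp.direct.write";

  /** True when the copy should write to a temp file and rename it afterwards. */
  static boolean useTmpTarget(boolean toAppend, Map<String, String> conf) {
    boolean directWrite =
        Boolean.parseBoolean(conf.getOrDefault(DIRECT_WRITE_KEY, "false"));
    // Appends already write in place; direct write skips the temp file too.
    return !toAppend && !directWrite;
  }

  /**
   * Picks the path the bytes are written to. Placing the temp file beside the
   * target is an assumption of this sketch, not a statement about DistCp.
   */
  static String writePath(String target, String tmpName,
                          boolean toAppend, Map<String, String> conf) {
    if (!useTmpTarget(toAppend, conf)) {
      return target;
    }
    int slash = target.lastIndexOf('/');
    return target.substring(0, slash + 1) + tmpName;
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    String target = "s3a://bucket/dir/file.txt";
    // Default behaviour: temp file then rename.
    System.out.println(writePath(target, ".tmp.attempt_0", false, conf));
    // Option enabled: write straight to the destination, for any scheme.
    conf.put(DIRECT_WRITE_KEY, "true");
    System.out.println(writePath(target, ".tmp.attempt_0", false, conf));
    System.out.println(writePath("file:///tmp/out/file.txt", ".tmp.attempt_0", false, conf));
  }
}
```

Because the decision depends only on the option, the same code path can be exercised against a local path, which is the testability point made in the list above.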
[jira] [Commented] (HADOOP-15281) Distcp to add no-rename copy option
[ https://issues.apache.org/jira/browse/HADOOP-15281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16742311#comment-16742311 ]

Andrew Olson commented on HADOOP-15281:

We've successfully tested a quick+dirty solution where in the RetriableFileCopyCommand class, renames are specifically avoided for any S3A target destinations in {{doCopy}} - I'm not sure there are cases where it's actually beneficial or desirable when writing to S3. The gist of the logic is simply,
{noformat}
final boolean toAppend = action == FileAction.APPEND;
final boolean directWriteToS3 = target.toUri().getScheme().equals("s3a");
final boolean useTmpTarget = !toAppend && !directWriteToS3;
Path targetPath = useTmpTarget ? getTmpFile(target, context) : target;
{noformat}
(Then replacing a couple instances of {{!toAppend}} with {{useTmpTarget}} later on in the method)

If that's a viable & reasonable solution I can work on a patch with that change and the corresponding test/docs updates. Otherwise design guidance would be appreciated for how and where to add the write-directly no-rename configuration option.
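As a side note on the scheme test in the snippet above: {{java.net.URI#getScheme()}} returns null for a URI with no scheme, so calling {{.equals("s3a")}} on its result can throw a NullPointerException. The sketch below shows a null-safe form of the same check using only {{java.net.URI}}. The class and method names are hypothetical and the code is not taken from the patch.

```java
import java.net.URI;

/** Illustrative only; names are hypothetical and not from the patch. */
final class SchemeCheckSketch {
  /**
   * Null-safe, case-insensitive scheme test. Putting the literal first means
   * a URI without a scheme yields false instead of a NullPointerException.
   */
  static boolean isS3a(URI target) {
    return "s3a".equalsIgnoreCase(target.getScheme());
  }

  /** Mirrors the gist of the quoted logic: a temp file unless appending or S3A. */
  static boolean useTmpTarget(boolean toAppend, URI target) {
    return !toAppend && !isS3a(target);
  }

  public static void main(String[] args) {
    System.out.println(isS3a(URI.create("s3a://bucket/dir/file")));
    // No scheme at all: getScheme() is null, the check simply returns false.
    System.out.println(isS3a(URI.create("/tmp/out/part-0")));
    // A store reached through a different scheme is not matched by this test.
    System.out.println(isS3a(URI.create("s3://bucket/dir/file")));
  }
}
```

A check keyed on the literal scheme only recognises destinations addressed as {{s3a://}}; any other scheme, including one bound to the same implementation, takes the temp-file path.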
[jira] [Commented] (HADOOP-15281) Distcp to add no-rename copy option
[ https://issues.apache.org/jira/browse/HADOOP-15281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16654178#comment-16654178 ]

Dinesh Chitlangia commented on HADOOP-15281:

[~ste...@apache.org]
{quote}for a many-GB rename, the rename can take so long that the worker stops heartbeating; AM fails it, etc. etc. {quote}
This is spot on. I encountered a production cluster where distcp jobs copying files larger than 250G have been failing.
[jira] [Commented] (HADOOP-15281) Distcp to add no-rename copy option
[ https://issues.apache.org/jira/browse/HADOOP-15281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16627186#comment-16627186 ]

Steve Loughran commented on HADOOP-15281:

+related issue: for a many-GB rename, the rename can take so long that the worker stops heartbeating; AM fails it, etc. etc.
[jira] [Commented] (HADOOP-15281) Distcp to add no-rename copy option
[ https://issues.apache.org/jira/browse/HADOOP-15281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16533557#comment-16533557 ]

Steve Loughran commented on HADOOP-15281:

I'm not working on this; will review a patch from anyone who provides one. That's a patch which will need to have
* new distcp option
* test in the Distcp contract test which the object stores all subclass
* line or two in the docs
[jira] [Commented] (HADOOP-15281) Distcp to add no-rename copy option
[ https://issues.apache.org/jira/browse/HADOOP-15281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16383909#comment-16383909 ]

Steve Loughran commented on HADOOP-15281:

debug level log of a distcp of one file to s3a. 20 metadata requests get logged per file.
{code}
16:51:20,555 INFO mapred.CopyMapper (CopyMapper.java:map(154)) - Copying file:/home/s/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/util/bloom/BloomFilterCommonTester.java to s3a://hwdev-steve-ireland-new/distcp/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/util/bloom/BloomFilterCommonTester.java
16:51:20,555 DEBUG s3a.S3AStorageStatistics (S3AStorageStatistics.java:incrementCounter(63)) - op_get_file_status += 1 -> 8987
16:51:20,555 DEBUG s3a.S3AFileSystem (S3AFileSystem.java:innerGetFileStatus(2098)) - Getting path status for s3a://hwdev-steve-ireland-new/distcp/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/util/bloom/BloomFilterCommonTester.java (distcp/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/util/bloom/BloomFilterCommonTester.java)
16:51:20,555 DEBUG s3a.S3AStorageStatistics (S3AStorageStatistics.java:incrementCounter(63)) - object_metadata_requests += 1 -> 19643
16:51:20,591 DEBUG s3a.S3AStorageStatistics (S3AStorageStatistics.java:incrementCounter(63)) - object_metadata_requests += 1 -> 19644
16:51:20,626 DEBUG s3a.S3AStorageStatistics (S3AStorageStatistics.java:incrementCounter(63)) - object_list_requests += 1 -> 7689
16:51:20,664 DEBUG s3a.S3AFileSystem (S3AFileSystem.java:s3GetFileStatus(2245)) - Not Found: s3a://hwdev-steve-ireland-new/distcp/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/util/bloom/BloomFilterCommonTester.java
16:51:20,664 INFO mapred.RetriableFileCopyCommand (RetriableFileCopyCommand.java:getTmpFile(245)) - Creating temp file: s3a://hwdev-steve-ireland-new/distcp/hadoop-common-project/.distcp.tmp.attempt_local9668494_0001_m_00_0
16:51:20,664 DEBUG s3a.S3AStorageStatistics (S3AStorageStatistics.java:incrementCounter(63)) - op_create += 1 -> 851
16:51:20,664 DEBUG s3a.S3AStorageStatistics (S3AStorageStatistics.java:incrementCounter(63)) - op_get_file_status += 1 -> 8988
16:51:20,664 DEBUG s3a.S3AFileSystem (S3AFileSystem.java:innerGetFileStatus(2098)) - Getting path status for s3a://hwdev-steve-ireland-new/distcp/hadoop-common-project/.distcp.tmp.attempt_local9668494_0001_m_00_0 (distcp/hadoop-common-project/.distcp.tmp.attempt_local9668494_0001_m_00_0)
16:51:20,664 DEBUG s3a.S3AStorageStatistics (S3AStorageStatistics.java:incrementCounter(63)) - object_metadata_requests += 1 -> 19645
16:51:20,697 DEBUG s3a.S3AStorageStatistics (S3AStorageStatistics.java:incrementCounter(63)) - object_metadata_requests += 1 -> 19646
16:51:20,732 DEBUG s3a.S3AStorageStatistics (S3AStorageStatistics.java:incrementCounter(63)) - object_list_requests += 1 -> 7690
16:51:20,770 DEBUG s3a.S3AFileSystem (S3AFileSystem.java:s3GetFileStatus(2245)) - Not Found: s3a://hwdev-steve-ireland-new/distcp/hadoop-common-project/.distcp.tmp.attempt_local9668494_0001_m_00_0
16:51:20,770 DEBUG s3a.S3ABlockOutputStream (S3ABlockOutputStream.java:(169)) - Initialized S3ABlockOutputStream for distcp/hadoop-common-project/.distcp.tmp.attempt_local9668494_0001_m_00_0 output to FileBlock{index=1, destFile=/tmp/hadoop-stevel/s3a/s3ablock-0001-1407016756609004936.tmp, state=Writing, dataSize=0, limit=8388608}
16:51:20,772 DEBUG s3a.S3ABlockOutputStream (S3ABlockOutputStream.java:close(349)) - S3ABlockOutputStream{WriteOperationHelper {bucket=hwdev-steve-ireland-new}, blockSize=8388608, activeBlock=FileBlock{index=1, destFile=/tmp/hadoop-stevel/s3a/s3ablock-0001-1407016756609004936.tmp, state=Writing, dataSize=16963, limit=8388608}}: Closing block #1: current block= FileBlock{index=1, destFile=/tmp/hadoop-stevel/s3a/s3ablock-0001-1407016756609004936.tmp, state=Writing, dataSize=16963, limit=8388608}
16:51:20,772 DEBUG s3a.S3ABlockOutputStream (S3ABlockOutputStream.java:putObject(413)) - Executing regular upload for WriteOperationHelper {bucket=hwdev-steve-ireland-new}
16:51:20,772 DEBUG s3a.S3ADataBlocks (S3ADataBlocks.java:startUpload(324)) - Start datablock[1] upload
16:51:20,772 DEBUG s3a.S3ADataBlocks (S3ADataBlocks.java:enterState(231)) - FileBlock{index=1, destFile=/tmp/hadoop-stevel/s3a/s3ablock-0001-1407016756609004936.tmp, state=Writing, dataSize=16963, limit=8388608}: entering state Upload
16:51:20,772 DEBUG s3a.S3ABlockOutputStream (S3ABlockOutputStream.java:clearActiveBlock(216)) - Clearing active block
16:51:20,772 [s3a-transfer-shared-pool1-t1] DEBUG s3a.S3AFileSystem (S3AFileSystem.java:putObjectDirect(1520)) - PUT 16963 bytes to distcp/hadoop-common-project/.distcp.tmp.attempt_local9668494_0001_m_00_0
16:51:20,772 [s3a-transfer-shared-pool1-t1] DEBUG