[jira] [Commented] (HADOOP-15281) Distcp to add no-rename copy option

2019-07-09 Thread Andrew Olson (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16881254#comment-16881254
 ] 

Andrew Olson commented on HADOOP-15281:
---

Attached a corrected branch-2 patch file.

> Distcp to add no-rename copy option
> ---
>
> Key: HADOOP-15281
> URL: https://issues.apache.org/jira/browse/HADOOP-15281
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: tools/distcp
>Affects Versions: 3.0.0
>Reporter: Steve Loughran
>Assignee: Andrew Olson
>Priority: Major
> Fix For: 3.3.0, 3.2.1, 3.1.3
>
> Attachments: HADOOP-15281-001.patch, HADOOP-15281-002.patch, 
> HADOOP-15281-003.patch, HADOOP-15281-004.patch, 
> HADOOP-15281-branch-2-001.patch, HADOOP-15281-branch-2-002.patch
>
>
> Currently Distcp uploads a file by two strategies
> # append parts
> # copy to temp then rename
> option 2 executes the following sequence in {{promoteTmpToTarget}}
> {code}
> if ((fs.exists(target) && !fs.delete(target, false))
> || (!fs.exists(target.getParent()) && !fs.mkdirs(target.getParent()))
> || !fs.rename(tmpTarget, target)) {
>   throw new IOException("Failed to promote tmp-file:" + tmpTarget
>   + " to: " + target);
> }
> {code}
> For any object store, that's a lot of HTTP requests; for S3A you are looking 
> at 12+ requests and an O(data) copy call. 
> This is not a good upload strategy for any store which manifests its output 
> atomically at the end of the write().
> Proposed: add a switch to write directly to the dest path, which can be 
> supplied as either a conf option (distcp.direct.write = true) or a CLI option 
> (-direct).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15281) Distcp to add no-rename copy option

2019-04-17 Thread Arun Suresh (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16820197#comment-16820197
 ] 

Arun Suresh commented on HADOOP-15281:
--

Thanks for posting the patch for branch-2 [~ste...@apache.org]. Found a minor 
nit while testing though:
{{OptionsParser:165}} we need *{{option.setDirectWrite(false);}}* to be 
*{{option.setDirectWrite(true);}}*

> Distcp to add no-rename copy option
> ---
>
> Key: HADOOP-15281
> URL: https://issues.apache.org/jira/browse/HADOOP-15281
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: tools/distcp
>Affects Versions: 3.0.0
>Reporter: Steve Loughran
>Assignee: Andrew Olson
>Priority: Major
> Fix For: 3.3.0, 3.2.1, 3.1.3
>
> Attachments: HADOOP-15281-001.patch, HADOOP-15281-002.patch, 
> HADOOP-15281-003.patch, HADOOP-15281-004.patch, 
> HADOOP-15281-branch-2-001.patch
>
>
> Currently Distcp uploads a file by two strategies
> # append parts
> # copy to temp then rename
> option 2 executes the following sequence in {{promoteTmpToTarget}}
> {code}
> if ((fs.exists(target) && !fs.delete(target, false))
> || (!fs.exists(target.getParent()) && !fs.mkdirs(target.getParent()))
> || !fs.rename(tmpTarget, target)) {
>   throw new IOException("Failed to promote tmp-file:" + tmpTarget
>   + " to: " + target);
> }
> {code}
> For any object store, that's a lot of HTTP requests; for S3A you are looking 
> at 12+ requests and an O(data) copy call. 
> This is not a good upload strategy for any store which manifests its output 
> atomically at the end of the write().
> Proposed: add a switch to write directly to the dest path, which can be 
> supplied as either a conf option (distcp.direct.write = true) or a CLI option 
> (-direct).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15281) Distcp to add no-rename copy option

2019-02-16 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16770147#comment-16770147
 ] 

Hadoop QA commented on HADOOP-15281:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  1m  
3s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
|| || || || {color:brown} branch-2 Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
48s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 10m 
25s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
57s{color} | {color:green} branch-2 passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
33s{color} | {color:green} branch-2 passed with JDK v1.8.0_191 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
31s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
58s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
7s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
41s{color} | {color:green} branch-2 passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
35s{color} | {color:green} branch-2 passed with JDK v1.8.0_191 {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
10s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
50s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
51s{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
51s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
30s{color} | {color:green} the patch passed with JDK v1.8.0_191 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
30s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 27s{color} | {color:orange} hadoop-tools: The patch generated 10 new + 131 
unchanged - 2 fixed = 141 total (was 133) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
53s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git 
apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply 
{color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
35s{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
30s{color} | {color:green} the patch passed with JDK v1.8.0_191 {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 13m  9s{color} 
| {color:red} hadoop-distcp in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
41s{color} | {color:green} hadoop-aws in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
23s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 43m 59s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.tools.TestOptionsParser |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:da67579 |
| JIRA Issue | HADOOP-15281 |
| JIRA Patch URL | 

[jira] [Commented] (HADOOP-15281) Distcp to add no-rename copy option

2019-02-12 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16766363#comment-16766363
 ] 

Steve Loughran commented on HADOOP-15281:
-

It's in 3.1 via HADOOP-16096. 

I'm actually going to leave it at that point for branch-3, but pull it into 
branch-2, at least as far as a PoC patch. Why so? I have a repackaged version 
of distcp for branch-2 designed to upload to object stores faster by having the 
relevant changes (improved delete, for example). 

Ultimately, distcp needs replacement. For now, we can tweak the details, 
carefully

 

> Distcp to add no-rename copy option
> ---
>
> Key: HADOOP-15281
> URL: https://issues.apache.org/jira/browse/HADOOP-15281
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: tools/distcp
>Affects Versions: 3.0.0
>Reporter: Steve Loughran
>Assignee: Andrew Olson
>Priority: Major
> Fix For: 3.3.0, 3.2.1
>
> Attachments: HADOOP-15281-001.patch, HADOOP-15281-002.patch, 
> HADOOP-15281-003.patch, HADOOP-15281-004.patch
>
>
> Currently Distcp uploads a file by two strategies
> # append parts
> # copy to temp then rename
> option 2 executes the following sequence in {{promoteTmpToTarget}}
> {code}
> if ((fs.exists(target) && !fs.delete(target, false))
> || (!fs.exists(target.getParent()) && !fs.mkdirs(target.getParent()))
> || !fs.rename(tmpTarget, target)) {
>   throw new IOException("Failed to promote tmp-file:" + tmpTarget
>   + " to: " + target);
> }
> {code}
> For any object store, that's a lot of HTTP requests; for S3A you are looking 
> at 12+ requests and an O(data) copy call. 
> This is not a good upload strategy for any store which manifests its output 
> atomically at the end of the write().
> Proposed: add a switch to write directly to the dest path, which can be 
> supplied as either a conf option (distcp.direct.write) or a CLI option 
> (-direct).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15281) Distcp to add no-rename copy option

2019-02-12 Thread Andrew Olson (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16766158#comment-16766158
 ] 

Andrew Olson commented on HADOOP-15281:
---

Updated fix versions. If there is a compelling reason to patch this into 3.1 I 
can make the necessary logging changes.

> Distcp to add no-rename copy option
> ---
>
> Key: HADOOP-15281
> URL: https://issues.apache.org/jira/browse/HADOOP-15281
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: tools/distcp
>Affects Versions: 3.0.0
>Reporter: Steve Loughran
>Assignee: Andrew Olson
>Priority: Major
> Fix For: 3.3.0, 3.2.1
>
> Attachments: HADOOP-15281-001.patch, HADOOP-15281-002.patch, 
> HADOOP-15281-003.patch, HADOOP-15281-004.patch
>
>
> Currently Distcp uploads a file by two strategies
> # append parts
> # copy to temp then rename
> option 2 executes the following sequence in {{promoteTmpToTarget}}
> {code}
> if ((fs.exists(target) && !fs.delete(target, false))
> || (!fs.exists(target.getParent()) && !fs.mkdirs(target.getParent()))
> || !fs.rename(tmpTarget, target)) {
>   throw new IOException("Failed to promote tmp-file:" + tmpTarget
>   + " to: " + target);
> }
> {code}
> For any object store, that's a lot of HTTP requests; for S3A you are looking 
> at 12+ requests and an O(data) copy call. 
> This is not a good upload strategy for any store which manifests its output 
> atomically at the end of the write().
> Proposed: add a switch to write directly to the dest path, which can be 
> supplied as either a conf option (distcp.direct.write) or a CLI option 
> (-direct).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15281) Distcp to add no-rename copy option

2019-02-07 Thread Eric Payne (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16763045#comment-16763045
 ] 

Eric Payne commented on HADOOP-15281:
-

I'm going to revert this from 3.1 so that further development can continue.

> Distcp to add no-rename copy option
> ---
>
> Key: HADOOP-15281
> URL: https://issues.apache.org/jira/browse/HADOOP-15281
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: tools/distcp
>Affects Versions: 3.0.0
>Reporter: Steve Loughran
>Assignee: Andrew Olson
>Priority: Major
> Fix For: 3.2.1, 3.1.3
>
> Attachments: HADOOP-15281-001.patch, HADOOP-15281-002.patch, 
> HADOOP-15281-003.patch, HADOOP-15281-004.patch
>
>
> Currently Distcp uploads a file by two strategies
> # append parts
> # copy to temp then rename
> option 2 executes the following sequence in {{promoteTmpToTarget}}
> {code}
> if ((fs.exists(target) && !fs.delete(target, false))
> || (!fs.exists(target.getParent()) && !fs.mkdirs(target.getParent()))
> || !fs.rename(tmpTarget, target)) {
>   throw new IOException("Failed to promote tmp-file:" + tmpTarget
>   + " to: " + target);
> }
> {code}
> For any object store, that's a lot of HTTP requests; for S3A you are looking 
> at 12+ requests and an O(data) copy call. 
> This is not a good upload strategy for any store which manifests its output 
> atomically at the end of the write().
> Proposed: add a switch to write directly to the dest path, which can be 
> supplied as either a conf option (distcp.direct.write) or a CLI option 
> (-direct).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15281) Distcp to add no-rename copy option

2019-02-07 Thread Masatake Iwasaki (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16763038#comment-16763038
 ] 

Masatake Iwasaki commented on HADOOP-15281:
---

HADOOP-15552 is not in 3.1. Backporting looks not so easy. Fixing the relevant 
code in RetriableFileCopyCommand to use commons-logging way would be quick fix.

> Distcp to add no-rename copy option
> ---
>
> Key: HADOOP-15281
> URL: https://issues.apache.org/jira/browse/HADOOP-15281
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: tools/distcp
>Affects Versions: 3.0.0
>Reporter: Steve Loughran
>Assignee: Andrew Olson
>Priority: Major
> Fix For: 3.2.1, 3.1.3
>
> Attachments: HADOOP-15281-001.patch, HADOOP-15281-002.patch, 
> HADOOP-15281-003.patch, HADOOP-15281-004.patch
>
>
> Currently Distcp uploads a file by two strategies
> # append parts
> # copy to temp then rename
> option 2 executes the following sequence in {{promoteTmpToTarget}}
> {code}
> if ((fs.exists(target) && !fs.delete(target, false))
> || (!fs.exists(target.getParent()) && !fs.mkdirs(target.getParent()))
> || !fs.rename(tmpTarget, target)) {
>   throw new IOException("Failed to promote tmp-file:" + tmpTarget
>   + " to: " + target);
> }
> {code}
> For any object store, that's a lot of HTTP requests; for S3A you are looking 
> at 12+ requests and an O(data) copy call. 
> This is not a good upload strategy for any store which manifests its output 
> atomically at the end of the write().
> Proposed: add a switch to write directly to the dest path, which can be 
> supplied as either a conf option (distcp.direct.write) or a CLI option 
> (-direct).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15281) Distcp to add no-rename copy option

2019-02-07 Thread Andrew Olson (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16762989#comment-16762989
 ] 

Andrew Olson commented on HADOOP-15281:
---

[~eepayne] Because SLF4J was introduced post-3.1?

> Distcp to add no-rename copy option
> ---
>
> Key: HADOOP-15281
> URL: https://issues.apache.org/jira/browse/HADOOP-15281
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: tools/distcp
>Affects Versions: 3.0.0
>Reporter: Steve Loughran
>Assignee: Andrew Olson
>Priority: Major
> Fix For: 3.2.1, 3.1.3
>
> Attachments: HADOOP-15281-001.patch, HADOOP-15281-002.patch, 
> HADOOP-15281-003.patch, HADOOP-15281-004.patch
>
>
> Currently Distcp uploads a file by two strategies
> # append parts
> # copy to temp then rename
> option 2 executes the following sequence in {{promoteTmpToTarget}}
> {code}
> if ((fs.exists(target) && !fs.delete(target, false))
> || (!fs.exists(target.getParent()) && !fs.mkdirs(target.getParent()))
> || !fs.rename(tmpTarget, target)) {
>   throw new IOException("Failed to promote tmp-file:" + tmpTarget
>   + " to: " + target);
> }
> {code}
> For any object store, that's a lot of HTTP requests; for S3A you are looking 
> at 12+ requests and an O(data) copy call. 
> This is not a good upload strategy for any store which manifests its output 
> atomically at the end of the write().
> Proposed: add a switch to write directly to the dest path, which can be 
> supplied as either a conf option (distcp.direct.write) or a CLI option 
> (-direct).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15281) Distcp to add no-rename copy option

2019-02-07 Thread Eric Payne (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16762957#comment-16762957
 ] 

Eric Payne commented on HADOOP-15281:
-

[~noslowerdna] and [~ste...@apache.org], this commit breaks the branch-3.1 
build when building with java 1.8.
{code:title="RetriableFileCopyCommand.java"}
LOG.info("Copying {} to {}", source.getPath(), target);
{code}
Multiple lines have this error:
{panel:title="Build Failure"}
[ERROR] 
hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/mapred/RetriableFileCopyCommand.java:[121,8]
 no suitable method found for 
info(java.lang.String,org.apache.hadoop.fs.Path,org.apache.hadoop.fs.Path)
[ERROR] method org.apache.commons.logging.Log.info(java.lang.Object) is not 
applicable
[ERROR]   (actual and formal argument lists differ in length)
[ERROR] method 
org.apache.commons.logging.Log.info(java.lang.Object,java.lang.Throwable) is 
not applicable
[ERROR]   (actual and formal argument lists differ in length)
{panel}

> Distcp to add no-rename copy option
> ---
>
> Key: HADOOP-15281
> URL: https://issues.apache.org/jira/browse/HADOOP-15281
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: tools/distcp
>Affects Versions: 3.0.0
>Reporter: Steve Loughran
>Assignee: Andrew Olson
>Priority: Major
> Fix For: 3.2.1, 3.1.3
>
> Attachments: HADOOP-15281-001.patch, HADOOP-15281-002.patch, 
> HADOOP-15281-003.patch, HADOOP-15281-004.patch
>
>
> Currently Distcp uploads a file by two strategies
> # append parts
> # copy to temp then rename
> option 2 executes the following sequence in {{promoteTmpToTarget}}
> {code}
> if ((fs.exists(target) && !fs.delete(target, false))
> || (!fs.exists(target.getParent()) && !fs.mkdirs(target.getParent()))
> || !fs.rename(tmpTarget, target)) {
>   throw new IOException("Failed to promote tmp-file:" + tmpTarget
>   + " to: " + target);
> }
> {code}
> For any object store, that's a lot of HTTP requests; for S3A you are looking 
> at 12+ requests and an O(data) copy call. 
> This is not a good upload strategy for any store which manifests its output 
> atomically at the end of the write().
> Proposed: add a switch to write directly to the dest path, which can be 
> supplied as either a conf option (distcp.direct.write) or a CLI option 
> (-direct).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15281) Distcp to add no-rename copy option

2019-02-07 Thread Andrew Olson (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16762757#comment-16762757
 ] 

Andrew Olson commented on HADOOP-15281:
---

Thanks for accepting the patch. We don't have a concrete need for it in 2.x at 
this time, but I think the code changes should merge in mostly cleanly.

> Distcp to add no-rename copy option
> ---
>
> Key: HADOOP-15281
> URL: https://issues.apache.org/jira/browse/HADOOP-15281
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: tools/distcp
>Affects Versions: 3.0.0
>Reporter: Steve Loughran
>Assignee: Andrew Olson
>Priority: Major
> Fix For: 3.2.1, 3.1.3
>
> Attachments: HADOOP-15281-001.patch, HADOOP-15281-002.patch, 
> HADOOP-15281-003.patch, HADOOP-15281-004.patch
>
>
> Currently Distcp uploads a file by two strategies
> # append parts
> # copy to temp then rename
> option 2 executes the following sequence in {{promoteTmpToTarget}}
> {code}
> if ((fs.exists(target) && !fs.delete(target, false))
> || (!fs.exists(target.getParent()) && !fs.mkdirs(target.getParent()))
> || !fs.rename(tmpTarget, target)) {
>   throw new IOException("Failed to promote tmp-file:" + tmpTarget
>   + " to: " + target);
> }
> {code}
> For any object store, that's a lot of HTTP requests; for S3A you are looking 
> at 12+ requests and an O(data) copy call. 
> This is not a good upload strategy for any store which manifests its output 
> atomically at the end of the write().
> Proposed: add a switch to write directly to the dest path, which can be 
> supplied as either a conf option (distcp.direct.write) or a CLI option 
> (-direct).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15281) Distcp to add no-rename copy option

2019-02-07 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16762545#comment-16762545
 ] 

Hudson commented on HADOOP-15281:
-

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #15905 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/15905/])
HADOOP-15281. Distcp to add no-rename copy option. (stevel: rev 
de804e53b9d20a2df75a4c7252bf83ed52011488)
* (edit) 
hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/contract/AbstractContractDistCpTest.java
* (edit) 
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/contract/s3a/ITestS3AContractDistCp.java
* (edit) 
hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/OptionsParser.java
* (edit) 
hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpOptions.java
* (edit) 
hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/mapred/CopyMapper.java
* (edit) 
hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/mapred/RetriableFileCopyCommand.java
* (edit) 
hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestDistCpOptions.java
* (edit) 
hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpConstants.java
* (edit) hadoop-tools/hadoop-distcp/src/site/markdown/DistCp.md.vm
* (edit) 
hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpContext.java
* (edit) 
hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpOptionSwitch.java


> Distcp to add no-rename copy option
> ---
>
> Key: HADOOP-15281
> URL: https://issues.apache.org/jira/browse/HADOOP-15281
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: tools/distcp
>Affects Versions: 3.0.0
>Reporter: Steve Loughran
>Assignee: Andrew Olson
>Priority: Major
> Fix For: 3.2.1, 3.1.3
>
> Attachments: HADOOP-15281-001.patch, HADOOP-15281-002.patch, 
> HADOOP-15281-003.patch, HADOOP-15281-004.patch
>
>
> Currently Distcp uploads a file by two strategies
> # append parts
> # copy to temp then rename
> option 2 executes the following sequence in {{promoteTmpToTarget}}
> {code}
> if ((fs.exists(target) && !fs.delete(target, false))
> || (!fs.exists(target.getParent()) && !fs.mkdirs(target.getParent()))
> || !fs.rename(tmpTarget, target)) {
>   throw new IOException("Failed to promote tmp-file:" + tmpTarget
>   + " to: " + target);
> }
> {code}
> For any object store, that's a lot of HTTP requests; for S3A you are looking 
> at 12+ requests and an O(data) copy call. 
> This is not a good upload strategy for any store which manifests its output 
> atomically at the end of the write().
> Proposed: add a switch to write directly to the dest path, which can be 
> supplied as either a conf option (distcp.direct.write) or a CLI option 
> (-direct).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15281) Distcp to add no-rename copy option

2019-02-06 Thread Andrew Olson (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16762035#comment-16762035
 ] 

Andrew Olson commented on HADOOP-15281:
---

[~ste...@apache.org] Please use 'andrew.ol...@cerner.com'

> Distcp to add no-rename copy option
> ---
>
> Key: HADOOP-15281
> URL: https://issues.apache.org/jira/browse/HADOOP-15281
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: tools/distcp
>Affects Versions: 3.0.0
>Reporter: Steve Loughran
>Assignee: Andrew Olson
>Priority: Major
> Attachments: HADOOP-15281-001.patch, HADOOP-15281-002.patch, 
> HADOOP-15281-003.patch, HADOOP-15281-004.patch
>
>
> Currently Distcp uploads a file by two strategies
> # append parts
> # copy to temp then rename
> option 2 executes the following sequence in {{promoteTmpToTarget}}
> {code}
> if ((fs.exists(target) && !fs.delete(target, false))
> || (!fs.exists(target.getParent()) && !fs.mkdirs(target.getParent()))
> || !fs.rename(tmpTarget, target)) {
>   throw new IOException("Failed to promote tmp-file:" + tmpTarget
>   + " to: " + target);
> }
> {code}
> For any object store, that's a lot of HTTP requests; for S3A you are looking 
> at 12+ requests and an O(data) copy call. 
> This is not a good upload strategy for any store which manifests its output 
> atomically at the end of the write().
> Proposed: add a switch to write directly to the dest path, which can be 
> supplied as either a conf option (distcp.direct.write) or a CLI option 
> (-direct).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15281) Distcp to add no-rename copy option

2019-02-06 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16762025#comment-16762025
 ] 

Steve Loughran commented on HADOOP-15281:
-

LGTM: tested the s3a and ABFS distcp.

+1

Before I commit this: What email address can I use for the --author tag? I want 
to make sure github gives you credit for your work, and git blame finds both of 
us when it doesnt

> Distcp to add no-rename copy option
> ---
>
> Key: HADOOP-15281
> URL: https://issues.apache.org/jira/browse/HADOOP-15281
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: tools/distcp
>Affects Versions: 3.0.0
>Reporter: Steve Loughran
>Assignee: Andrew Olson
>Priority: Major
> Attachments: HADOOP-15281-001.patch, HADOOP-15281-002.patch, 
> HADOOP-15281-003.patch, HADOOP-15281-004.patch
>
>
> Currently Distcp uploads a file by two strategies
> # append parts
> # copy to temp then rename
> option 2 executes the following sequence in {{promoteTmpToTarget}}
> {code}
> if ((fs.exists(target) && !fs.delete(target, false))
> || (!fs.exists(target.getParent()) && !fs.mkdirs(target.getParent()))
> || !fs.rename(tmpTarget, target)) {
>   throw new IOException("Failed to promote tmp-file:" + tmpTarget
>   + " to: " + target);
> }
> {code}
> For any object store, that's a lot of HTTP requests; for S3A you are looking 
> at 12+ requests and an O(data) copy call. 
> This is not a good upload strategy for any store which manifests its output 
> atomically at the end of the write().
> Proposed: add a switch to write directly to the dest path, which can be 
> supplied as either a conf option (distcp.direct.write) or a CLI option 
> (-direct).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15281) Distcp to add no-rename copy option

2019-02-06 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16761732#comment-16761732
 ] 

Steve Loughran commented on HADOOP-15281:
-

LGTM. Running the s3 and abfs tests to make sure all is well; if they are it'll 
get my vote

> Distcp to add no-rename copy option
> ---
>
> Key: HADOOP-15281
> URL: https://issues.apache.org/jira/browse/HADOOP-15281
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: tools/distcp
>Affects Versions: 3.0.0
>Reporter: Steve Loughran
>Assignee: Andrew Olson
>Priority: Major
> Attachments: HADOOP-15281-001.patch, HADOOP-15281-002.patch, 
> HADOOP-15281-003.patch, HADOOP-15281-004.patch
>
>
> Currently Distcp uploads a file by two strategies
> # append parts
> # copy to temp then rename
> option 2 executes the following sequence in {{promoteTmpToTarget}}
> {code}
> if ((fs.exists(target) && !fs.delete(target, false))
> || (!fs.exists(target.getParent()) && !fs.mkdirs(target.getParent()))
> || !fs.rename(tmpTarget, target)) {
>   throw new IOException("Failed to promote tmp-file:" + tmpTarget
>   + " to: " + target);
> }
> {code}
> For any object store, that's a lot of HTTP requests; for S3A you are looking 
> at 12+ requests and an O(data) copy call. 
> This is not a good upload strategy for any store which manifests its output 
> atomically at the end of the write().
> Proposed: add a switch to write directly to the dest path, which can be 
> supplied as either a conf option (distcp.direct.write) or a CLI option 
> (-direct).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15281) Distcp to add no-rename copy option

2019-02-05 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16761259#comment-16761259
 ] 

Hadoop QA commented on HADOOP-15281:


| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
19s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
16s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 
51s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
56s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
40s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
6s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 20s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
15s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
43s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
13s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
53s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
52s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
52s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
37s{color} | {color:green} hadoop-tools: The patch generated 0 new + 68 
unchanged - 2 fixed = 68 total (was 70) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
57s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 57s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
28s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
40s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 12m 
52s{color} | {color:green} hadoop-distcp in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  4m 
43s{color} | {color:green} hadoop-aws in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
26s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 74m 50s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f |
| JIRA Issue | HADOOP-15281 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12957693/HADOOP-15281-004.patch
 |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 0a3292439dd8 4.4.0-138-generic #164~14.04.1-Ubuntu SMP Fri Oct 
5 08:56:16 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / fa8cd1b |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_191 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 

[jira] [Commented] (HADOOP-15281) Distcp to add no-rename copy option

2019-02-05 Thread Andrew Olson (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16761199#comment-16761199
 ] 

Andrew Olson commented on HADOOP-15281:
---

Attached updated patch addressing the review feedback.

> Distcp to add no-rename copy option
> ---
>
> Key: HADOOP-15281
> URL: https://issues.apache.org/jira/browse/HADOOP-15281
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: tools/distcp
>Affects Versions: 3.0.0
>Reporter: Steve Loughran
>Assignee: Andrew Olson
>Priority: Major
> Attachments: HADOOP-15281-001.patch, HADOOP-15281-002.patch, 
> HADOOP-15281-003.patch, HADOOP-15281-004.patch
>
>
> Currently Distcp uploads a file by two strategies
> # append parts
> # copy to temp then rename
> option 2 executes the following sequence in {{promoteTmpToTarget}}
> {code}
> if ((fs.exists(target) && !fs.delete(target, false))
> || (!fs.exists(target.getParent()) && !fs.mkdirs(target.getParent()))
> || !fs.rename(tmpTarget, target)) {
>   throw new IOException("Failed to promote tmp-file:" + tmpTarget
>   + " to: " + target);
> }
> {code}
> For any object store, that's a lot of HTTP requests; for S3A you are looking 
> at 12+ requests and an O(data) copy call. 
> This is not a good upload strategy for any store which manifests its output 
> atomically at the end of the write().
> Proposed: add a switch to write directly to the dest path, which can be 
> supplied as either a conf option (distcp.direct.write) or a CLI option 
> (-direct).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15281) Distcp to add no-rename copy option

2019-02-05 Thread Andrew Olson (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16761032#comment-16761032
 ] 

Andrew Olson commented on HADOOP-15281:
---

Thanks [~ste...@apache.org], I'll get those suggested changes made shortly.

> Distcp to add no-rename copy option
> ---
>
> Key: HADOOP-15281
> URL: https://issues.apache.org/jira/browse/HADOOP-15281
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: tools/distcp
>Affects Versions: 3.0.0
>Reporter: Steve Loughran
>Assignee: Andrew Olson
>Priority: Major
> Attachments: HADOOP-15281-001.patch, HADOOP-15281-002.patch, 
> HADOOP-15281-003.patch
>
>
> Currently Distcp uploads a file by two strategies
> # append parts
> # copy to temp then rename
> option 2 executes the following sequence in {{promoteTmpToTarget}}
> {code}
> if ((fs.exists(target) && !fs.delete(target, false))
> || (!fs.exists(target.getParent()) && !fs.mkdirs(target.getParent()))
> || !fs.rename(tmpTarget, target)) {
>   throw new IOException("Failed to promote tmp-file:" + tmpTarget
>   + " to: " + target);
> }
> {code}
> For any object store, that's a lot of HTTP requests; for S3A you are looking 
> at 12+ requests and an O(data) copy call. 
> This is not a good upload strategy for any store which manifests its output 
> atomically at the end of the write().
> Proposed: add a switch to write direct to the dest path. either a conf option 
> or a CLI option



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15281) Distcp to add no-rename copy option

2019-02-05 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16760999#comment-16760999
 ] 

Steve Loughran commented on HADOOP-15281:
-

right, really good piece fo work, nearly ready to go in. 

I like the tests in particular..nicely done, including the s3a extension to 
verify that the #of renames is zero

Some minor change in RetriableFileCopyCommand.doCopy() covered below.

Before that, How about the option is called "direct"? keeps it easier to 
remember. Code can stay as is, but the CLI option & docs can be changed. 

I just like a {{-direct -update}} as a set of commands (I never get 
-skipCrcCheck right; that's precisely what I don't want to replicate -people 
mistakenly going -directWrite. A short "direct" avoids case confusion

h3. RetriableFileCopyCommand.doCopy()

L122 rename useTmpTarget to useTempTarget

L125 (and all other new log statements) We're using the SLF4J API, so can use 
LOG.info("Writing to {} target file path {}, 
   useTempTarget ? "temporary" ? "direct", targetPath)

This also means you can skip the LOG.isDebugEnabled() as the string eval & 
concat is only done if there's a log event. (Don't bother with the existing 
logs, just worry about those which you are adding/changing)

L172: no need to call targetFS.exists(targetPath) before the delete; delete 
does the checks and is a no-op if the destination doesn't exist. Saves many 
HTTP requests against a store.

One thing I wondered about is "could we actually log the duration of 
operations, e.g. the rename()". But the lib to do that (DurationInfo) is in 
hadoop-aws; I think moving that would be good, but it's a separate bit of work 
(HADOOP-16093). So don't worry about it. That said, CopyCommitter.deleteMissing 
does have some variant of that code: distcp would be the obvious place to add a 
chunk of this.

I don't want to have that block this patch, or complicate backporting (I will 
cherrypick this to branch-2, but not -2.8).

Accordingly:

# fix those bits of RetriableFileCopyCommand
# tell me what you think of having the option "direct", and if you are happy 
with it, do that change.

thanks


> Distcp to add no-rename copy option
> ---
>
> Key: HADOOP-15281
> URL: https://issues.apache.org/jira/browse/HADOOP-15281
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: tools/distcp
>Affects Versions: 3.0.0
>Reporter: Steve Loughran
>Assignee: Andrew Olson
>Priority: Major
> Attachments: HADOOP-15281-001.patch, HADOOP-15281-002.patch, 
> HADOOP-15281-003.patch
>
>
> Currently Distcp uploads a file by two strategies
> # append parts
> # copy to temp then rename
> option 2 executes the following sequence in {{promoteTmpToTarget}}
> {code}
> if ((fs.exists(target) && !fs.delete(target, false))
> || (!fs.exists(target.getParent()) && !fs.mkdirs(target.getParent()))
> || !fs.rename(tmpTarget, target)) {
>   throw new IOException("Failed to promote tmp-file:" + tmpTarget
>   + " to: " + target);
> }
> {code}
> For any object store, that's a lot of HTTP requests; for S3A you are looking 
> at 12+ requests and an O(data) copy call. 
> This is not a good upload strategy for any store which manifests its output 
> atomically at the end of the write().
> Proposed: add a switch to write direct to the dest path. either a conf option 
> or a CLI option



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15281) Distcp to add no-rename copy option

2019-02-04 Thread Andrew Olson (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16760095#comment-16760095
 ] 

Andrew Olson commented on HADOOP-15281:
---

[~ste...@apache.org] Ok, thank you.

> Distcp to add no-rename copy option
> ---
>
> Key: HADOOP-15281
> URL: https://issues.apache.org/jira/browse/HADOOP-15281
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: tools/distcp
>Affects Versions: 3.0.0
>Reporter: Steve Loughran
>Assignee: Andrew Olson
>Priority: Major
> Attachments: HADOOP-15281-001.patch, HADOOP-15281-002.patch, 
> HADOOP-15281-003.patch
>
>
> Currently Distcp uploads a file by two strategies
> # append parts
> # copy to temp then rename
> option 2 executes the following sequence in {{promoteTmpToTarget}}
> {code}
> if ((fs.exists(target) && !fs.delete(target, false))
> || (!fs.exists(target.getParent()) && !fs.mkdirs(target.getParent()))
> || !fs.rename(tmpTarget, target)) {
>   throw new IOException("Failed to promote tmp-file:" + tmpTarget
>   + " to: " + target);
> }
> {code}
> For any object store, that's a lot of HTTP requests; for S3A you are looking 
> at 12+ requests and an O(data) copy call. 
> This is not a good upload strategy for any store which manifests its output 
> atomically at the end of the write().
> Proposed: add a switch to write direct to the dest path. either a conf option 
> or a CLI option



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15281) Distcp to add no-rename copy option

2019-02-04 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16760083#comment-16760083
 ] 

Steve Loughran commented on HADOOP-15281:
-

Andrew, I know of this, I'm just on a real backlog of reviews and I'm trying to 
catch up this week. This is an important one and I do want to take a close look 
@ it. And as distcp is a large piece of code which I'm scared of, it'll take 
time

> Distcp to add no-rename copy option
> ---
>
> Key: HADOOP-15281
> URL: https://issues.apache.org/jira/browse/HADOOP-15281
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: tools/distcp
>Affects Versions: 3.0.0
>Reporter: Steve Loughran
>Assignee: Andrew Olson
>Priority: Major
> Attachments: HADOOP-15281-001.patch, HADOOP-15281-002.patch, 
> HADOOP-15281-003.patch
>
>
> Currently Distcp uploads a file by two strategies
> # append parts
> # copy to temp then rename
> option 2 executes the following sequence in {{promoteTmpToTarget}}
> {code}
> if ((fs.exists(target) && !fs.delete(target, false))
> || (!fs.exists(target.getParent()) && !fs.mkdirs(target.getParent()))
> || !fs.rename(tmpTarget, target)) {
>   throw new IOException("Failed to promote tmp-file:" + tmpTarget
>   + " to: " + target);
> }
> {code}
> For any object store, that's a lot of HTTP requests; for S3A you are looking 
> at 12+ requests and an O(data) copy call. 
> This is not a good upload strategy for any store which manifests its output 
> atomically at the end of the write().
> Proposed: add a switch to write direct to the dest path. either a conf option 
> or a CLI option



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15281) Distcp to add no-rename copy option

2019-01-25 Thread Andrew Olson (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16752319#comment-16752319
 ] 

Andrew Olson commented on HADOOP-15281:
---

[~ste...@apache.org] Got a +1 from Jenkins on this. Let me know if any code 
changes need to be made.

> Distcp to add no-rename copy option
> ---
>
> Key: HADOOP-15281
> URL: https://issues.apache.org/jira/browse/HADOOP-15281
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: tools/distcp
>Affects Versions: 3.0.0
>Reporter: Steve Loughran
>Assignee: Andrew Olson
>Priority: Major
> Attachments: HADOOP-15281-001.patch, HADOOP-15281-002.patch, 
> HADOOP-15281-003.patch
>
>
> Currently Distcp uploads a file by two strategies
> # append parts
> # copy to temp then rename
> option 2 executes the following sequence in {{promoteTmpToTarget}}
> {code}
> if ((fs.exists(target) && !fs.delete(target, false))
> || (!fs.exists(target.getParent()) && !fs.mkdirs(target.getParent()))
> || !fs.rename(tmpTarget, target)) {
>   throw new IOException("Failed to promote tmp-file:" + tmpTarget
>   + " to: " + target);
> }
> {code}
> For any object store, that's a lot of HTTP requests; for S3A you are looking 
> at 12+ requests and an O(data) copy call. 
> This is not a good upload strategy for any store which manifests its output 
> atomically at the end of the write().
> Proposed: add a switch to write direct to the dest path. either a conf option 
> or a CLI option



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15281) Distcp to add no-rename copy option

2019-01-24 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16751761#comment-16751761
 ] 

Hadoop QA commented on HADOOP-15281:


| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
15s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
14s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 
17s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m  
1s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
11s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
8s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m  4s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
19s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
44s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
12s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
 0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
51s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
51s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
34s{color} | {color:green} hadoop-tools: The patch generated 0 new + 68 
unchanged - 2 fixed = 68 total (was 70) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
58s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 59s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
34s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 12m 
49s{color} | {color:green} hadoop-distcp in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  4m 
26s{color} | {color:green} hadoop-aws in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
24s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 76m 30s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f |
| JIRA Issue | HADOOP-15281 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12956219/HADOOP-15281-003.patch
 |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 30bd5cfdc894 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 
17:16:02 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 4e0aa2c |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_191 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/15843/testReport/ |
| 

[jira] [Commented] (HADOOP-15281) Distcp to add no-rename copy option

2019-01-24 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16751669#comment-16751669
 ] 

Hadoop QA commented on HADOOP-15281:


| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
12s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  3m 
22s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 
38s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
52s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
41s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
4s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 10s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
15s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
42s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
12s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
 0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
50s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
50s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 35s{color} | {color:orange} hadoop-tools: The patch generated 10 new + 69 
unchanged - 1 fixed = 79 total (was 70) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
55s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 45s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
38s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 12m 
35s{color} | {color:green} hadoop-distcp in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  4m 
27s{color} | {color:green} hadoop-aws in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
29s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 84m 34s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f |
| JIRA Issue | HADOOP-15281 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12956200/HADOOP-15281-002.patch
 |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux a99661bf0d47 4.4.0-139-generic #165~14.04.1-Ubuntu SMP Wed Oct 
31 10:55:11 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 4e0aa2c |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_191 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 

[jira] [Commented] (HADOOP-15281) Distcp to add no-rename copy option

2019-01-24 Thread Andrew Olson (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16751494#comment-16751494
 ] 

Andrew Olson commented on HADOOP-15281:
---

I will fix the test failure.

> Distcp to add no-rename copy option
> ---
>
> Key: HADOOP-15281
> URL: https://issues.apache.org/jira/browse/HADOOP-15281
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: tools/distcp
>Affects Versions: 3.0.0
>Reporter: Steve Loughran
>Assignee: Andrew Olson
>Priority: Major
> Attachments: HADOOP-15281-001.patch
>
>
> Currently Distcp uploads a file by two strategies
> # append parts
> # copy to temp then rename
> option 2 executes the following sequence in {{promoteTmpToTarget}}
> {code}
> if ((fs.exists(target) && !fs.delete(target, false))
> || (!fs.exists(target.getParent()) && !fs.mkdirs(target.getParent()))
> || !fs.rename(tmpTarget, target)) {
>   throw new IOException("Failed to promote tmp-file:" + tmpTarget
>   + " to: " + target);
> }
> {code}
> For any object store, that's a lot of HTTP requests; for S3A you are looking 
> at 12+ requests and an O(data) copy call. 
> This is not a good upload strategy for any store which manifests its output 
> atomically at the end of the write().
> Proposed: add a switch to write direct to the dest path. either a conf option 
> or a CLI option



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15281) Distcp to add no-rename copy option

2019-01-24 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16751490#comment-16751490
 ] 

Hadoop QA commented on HADOOP-15281:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
19s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
15s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 21m 
22s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
56s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
39s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
0s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 36s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
41s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
11s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
 0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
50s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
50s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 36s{color} | {color:orange} hadoop-tools: The patch generated 10 new + 69 
unchanged - 1 fixed = 79 total (was 70) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
55s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 18s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
36s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 12m 31s{color} 
| {color:red} hadoop-distcp in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  4m 
32s{color} | {color:green} hadoop-aws in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
27s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 78m 18s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.tools.TestDistCpOptions |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f |
| JIRA Issue | HADOOP-15281 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12956179/HADOOP-15281-001.patch
 |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 4cd00da04e47 4.4.0-138-generic #164~14.04.1-Ubuntu SMP Fri Oct 
5 08:56:16 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 3c7d700 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_191 |
| findbugs | v3.1.0-RC1 |
| 

[jira] [Commented] (HADOOP-15281) Distcp to add no-rename copy option

2019-01-24 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16751403#comment-16751403
 ] 

Steve Loughran commented on HADOOP-15281:
-

I see it. Hit the "submit patch" button and jenkins will build it

> Distcp to add no-rename copy option
> ---
>
> Key: HADOOP-15281
> URL: https://issues.apache.org/jira/browse/HADOOP-15281
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: tools/distcp
>Affects Versions: 3.0.0
>Reporter: Steve Loughran
>Assignee: Andrew Olson
>Priority: Major
> Attachments: HADOOP-15281-001.patch
>
>
> Currently Distcp uploads a file by two strategies
> # append parts
> # copy to temp then rename
> option 2 executes the following sequence in {{promoteTmpToTarget}}
> {code}
> if ((fs.exists(target) && !fs.delete(target, false))
> || (!fs.exists(target.getParent()) && !fs.mkdirs(target.getParent()))
> || !fs.rename(tmpTarget, target)) {
>   throw new IOException("Failed to promote tmp-file:" + tmpTarget
>   + " to: " + target);
> }
> {code}
> For any object store, that's a lot of HTTP requests; for S3A you are looking 
> at 12+ requests and an O(data) copy call. 
> This is not a good upload strategy for any store which manifests its output 
> atomically at the end of the write().
> Proposed: add a switch to write direct to the dest path. either a conf option 
> or a CLI option



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15281) Distcp to add no-rename copy option

2019-01-24 Thread Andrew Olson (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16751375#comment-16751375
 ] 

Andrew Olson commented on HADOOP-15281:
---

[~ste...@apache.org] thanks, I've attached the patch.

> Distcp to add no-rename copy option
> ---
>
> Key: HADOOP-15281
> URL: https://issues.apache.org/jira/browse/HADOOP-15281
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: tools/distcp
>Affects Versions: 3.0.0
>Reporter: Steve Loughran
>Assignee: Andrew Olson
>Priority: Major
> Attachments: HADOOP-15281-001.patch
>
>
> Currently Distcp uploads a file by two strategies
> # append parts
> # copy to temp then rename
> option 2 executes the following sequence in {{promoteTmpToTarget}}
> {code}
> if ((fs.exists(target) && !fs.delete(target, false))
> || (!fs.exists(target.getParent()) && !fs.mkdirs(target.getParent()))
> || !fs.rename(tmpTarget, target)) {
>   throw new IOException("Failed to promote tmp-file:" + tmpTarget
>   + " to: " + target);
> }
> {code}
> For any object store, that's a lot of HTTP requests; for S3A you are looking 
> at 12+ requests and an O(data) copy call. 
> This is not a good upload strategy for any store which manifests its output 
> atomically at the end of the write().
> Proposed: add a switch to write direct to the dest path. either a conf option 
> or a CLI option



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15281) Distcp to add no-rename copy option

2019-01-24 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16751318#comment-16751318
 ] 

Steve Loughran commented on HADOOP-15281:
-

try now

> Distcp to add no-rename copy option
> ---
>
> Key: HADOOP-15281
> URL: https://issues.apache.org/jira/browse/HADOOP-15281
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: tools/distcp
>Affects Versions: 3.0.0
>Reporter: Steve Loughran
>Priority: Major
>
> Currently Distcp uploads a file by two strategies
> # append parts
> # copy to temp then rename
> option 2 executes the following sequence in {{promoteTmpToTarget}}
> {code}
> if ((fs.exists(target) && !fs.delete(target, false))
> || (!fs.exists(target.getParent()) && !fs.mkdirs(target.getParent()))
> || !fs.rename(tmpTarget, target)) {
>   throw new IOException("Failed to promote tmp-file:" + tmpTarget
>   + " to: " + target);
> }
> {code}
> For any object store, that's a lot of HTTP requests; for S3A you are looking 
> at 12+ requests and an O(data) copy call. 
> This is not a good upload strategy for any store which manifests its output 
> atomically at the end of the write().
> Proposed: add a switch to write direct to the dest path. either a conf option 
> or a CLI option



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15281) Distcp to add no-rename copy option

2019-01-24 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16751310#comment-16751310
 ] 

Steve Loughran commented on HADOOP-15281:
-

oh, let me give you the permission; restrictions are there to keep out spam not 
code

> Distcp to add no-rename copy option
> ---
>
> Key: HADOOP-15281
> URL: https://issues.apache.org/jira/browse/HADOOP-15281
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: tools/distcp
>Affects Versions: 3.0.0
>Reporter: Steve Loughran
>Priority: Major
>
> Currently Distcp uploads a file by two strategies
> # append parts
> # copy to temp then rename
> option 2 executes the following sequence in {{promoteTmpToTarget}}
> {code}
> if ((fs.exists(target) && !fs.delete(target, false))
> || (!fs.exists(target.getParent()) && !fs.mkdirs(target.getParent()))
> || !fs.rename(tmpTarget, target)) {
>   throw new IOException("Failed to promote tmp-file:" + tmpTarget
>   + " to: " + target);
> }
> {code}
> For any object store, that's a lot of HTTP requests; for S3A you are looking 
> at 12+ requests and an O(data) copy call. 
> This is not a good upload strategy for any store which manifests its output 
> atomically at the end of the write().
> Proposed: add a switch to write direct to the dest path. either a conf option 
> or a CLI option



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15281) Distcp to add no-rename copy option

2019-01-22 Thread Andrew Olson (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16749237#comment-16749237
 ] 

Andrew Olson commented on HADOOP-15281:
---

[~ste...@apache.org] I don't seem to have permission to attach a patch to this 
issue. Here's a pull request: https://github.com/apache/hadoop/pull/469

> Distcp to add no-rename copy option
> ---
>
> Key: HADOOP-15281
> URL: https://issues.apache.org/jira/browse/HADOOP-15281
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: tools/distcp
>Affects Versions: 3.0.0
>Reporter: Steve Loughran
>Priority: Major
>
> Currently Distcp uploads a file by two strategies
> # append parts
> # copy to temp then rename
> option 2 executes the following sequence in {{promoteTmpToTarget}}
> {code}
> if ((fs.exists(target) && !fs.delete(target, false))
> || (!fs.exists(target.getParent()) && !fs.mkdirs(target.getParent()))
> || !fs.rename(tmpTarget, target)) {
>   throw new IOException("Failed to promote tmp-file:" + tmpTarget
>   + " to: " + target);
> }
> {code}
> For any object store, that's a lot of HTTP requests; for S3A you are looking 
> at 12+ requests and an O(data) copy call. 
> This is not a good upload strategy for any store which manifests its output 
> atomically at the end of the write().
> Proposed: add a switch to write direct to the dest path. either a conf option 
> or a CLI option



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15281) Distcp to add no-rename copy option

2019-01-22 Thread Andrew Olson (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16749222#comment-16749222
 ] 

Andrew Olson commented on HADOOP-15281:
---

Attaching patch file

> Distcp to add no-rename copy option
> ---
>
> Key: HADOOP-15281
> URL: https://issues.apache.org/jira/browse/HADOOP-15281
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: tools/distcp
>Affects Versions: 3.0.0
>Reporter: Steve Loughran
>Priority: Major
>
> Currently Distcp uploads a file by two strategies
> # append parts
> # copy to temp then rename
> option 2 executes the following sequence in {{promoteTmpToTarget}}
> {code}
> if ((fs.exists(target) && !fs.delete(target, false))
> || (!fs.exists(target.getParent()) && !fs.mkdirs(target.getParent()))
> || !fs.rename(tmpTarget, target)) {
>   throw new IOException("Failed to promote tmp-file:" + tmpTarget
>   + " to: " + target);
> }
> {code}
> For any object store, that's a lot of HTTP requests; for S3A you are looking 
> at 12+ requests and an O(data) copy call. 
> This is not a good upload strategy for any store which manifests its output 
> atomically at the end of the write().
> Proposed: add a switch to write direct to the dest path. either a conf option 
> or a CLI option



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15281) Distcp to add no-rename copy option

2019-01-14 Thread Andrew Olson (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16742433#comment-16742433
 ] 

Andrew Olson commented on HADOOP-15281:
---

Ok, sounds good from here - thanks for the input.

> Distcp to add no-rename copy option
> ---
>
> Key: HADOOP-15281
> URL: https://issues.apache.org/jira/browse/HADOOP-15281
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: tools/distcp
>Affects Versions: 3.0.0
>Reporter: Steve Loughran
>Priority: Major
>
> Currently Distcp uploads a file by two strategies
> # append parts
> # copy to temp then rename
> option 2 executes the following sequence in {{promoteTmpToTarget}}
> {code}
> if ((fs.exists(target) && !fs.delete(target, false))
> || (!fs.exists(target.getParent()) && !fs.mkdirs(target.getParent()))
> || !fs.rename(tmpTarget, target)) {
>   throw new IOException("Failed to promote tmp-file:" + tmpTarget
>   + " to: " + target);
> }
> {code}
> For any object store, that's a lot of HTTP requests; for S3A you are looking 
> at 12+ requests and an O(data) copy call. 
> This is not a good upload strategy for any store which manifests its output 
> atomically at the end of the write().
> Proposed: add a switch to write direct to the dest path. either a conf option 
> or a CLI option



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15281) Distcp to add no-rename copy option

2019-01-14 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16742359#comment-16742359
 ] 

Steve Loughran commented on HADOOP-15281:
-

I think I'd prefer it as an option (painful as it is to pass down), rather than 
something trying be clever with the schema of the dest FS. 

* supports many other object stores where rename may be mimicked expensively 
(even the Azure ones can be slow sometimes)
* easy to test on the local FS
* lets people who bind "s3://" to the S3A implementation to use this (its done 
as a way to move off EMR)
* avoids having some special S3A-only secret hidden in distcp

Sorry, I know this is harder work, but its for the best

> Distcp to add no-rename copy option
> ---
>
> Key: HADOOP-15281
> URL: https://issues.apache.org/jira/browse/HADOOP-15281
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: tools/distcp
>Affects Versions: 3.0.0
>Reporter: Steve Loughran
>Priority: Major
>
> Currently Distcp uploads a file by two strategies
> # append parts
> # copy to temp then rename
> option 2 executes the following sequence in {{promoteTmpToTarget}}
> {code}
> if ((fs.exists(target) && !fs.delete(target, false))
> || (!fs.exists(target.getParent()) && !fs.mkdirs(target.getParent()))
> || !fs.rename(tmpTarget, target)) {
>   throw new IOException("Failed to promote tmp-file:" + tmpTarget
>   + " to: " + target);
> }
> {code}
> For any object store, that's a lot of HTTP requests; for S3A you are looking 
> at 12+ requests and an O(data) copy call. 
> This is not a good upload strategy for any store which manifests its output 
> atomically at the end of the write().
> Proposed: add a switch to write direct to the dest path. either a conf option 
> or a CLI option



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15281) Distcp to add no-rename copy option

2019-01-14 Thread Andrew Olson (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16742311#comment-16742311
 ] 

Andrew Olson commented on HADOOP-15281:
---

We've successfully tested a quick+dirty solution where in the 
RetriableFileCopyCommand class, renames are specifically avoided for any S3A 
target destinations in {{doCopy}} - I'm not sure there are cases where it's 
actually beneficial or desirable when writing to S3. The gist of the logic is 
simply,

{noformat}
final boolean toAppend = action == FileAction.APPEND;
final boolean directWriteToS3 = 
target.toUri().getScheme().equals("s3a");
final boolean useTmpTarget = !toAppend && !directWriteToS3;
Path targetPath = useTmpTarget ? getTmpFile(target, context) : target;
{noformat}

(Then replacing a couple instances of {{!toAppend}} with {{useTmpTarget}} later 
on in the method)

If that's a viable & reasonable solution I can work on a patch with that change 
and the corresponding test/docs updates. Otherwise design guidance would be 
appreciated for how and where to add the write-directly no-rename configuration 
option.

> Distcp to add no-rename copy option
> ---
>
> Key: HADOOP-15281
> URL: https://issues.apache.org/jira/browse/HADOOP-15281
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: tools/distcp
>Affects Versions: 3.0.0
>Reporter: Steve Loughran
>Priority: Major
>
> Currently Distcp uploads a file by two strategies
> # append parts
> # copy to temp then rename
> option 2 executes the following sequence in {{promoteTmpToTarget}}
> {code}
> if ((fs.exists(target) && !fs.delete(target, false))
> || (!fs.exists(target.getParent()) && !fs.mkdirs(target.getParent()))
> || !fs.rename(tmpTarget, target)) {
>   throw new IOException("Failed to promote tmp-file:" + tmpTarget
>   + " to: " + target);
> }
> {code}
> For any object store, that's a lot of HTTP requests; for S3A you are looking 
> at 12+ requests and an O(data) copy call. 
> This is not a good upload strategy for any store which manifests its output 
> atomically at the end of the write().
> Proposed: add a switch to write direct to the dest path. either a conf option 
> or a CLI option



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15281) Distcp to add no-rename copy option

2018-10-17 Thread Dinesh Chitlangia (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16654178#comment-16654178
 ] 

Dinesh Chitlangia commented on HADOOP-15281:


[~ste...@apache.org]
{quote}for a many-GB rename, the rename can take so long that the worker stops 
heartbeating; AM fails it, etc. etc.
{quote}
This is spot on. I encountered a production cluster where distcp with file 
sizes greater than 250G have been encountering failures.

> Distcp to add no-rename copy option
> ---
>
> Key: HADOOP-15281
> URL: https://issues.apache.org/jira/browse/HADOOP-15281
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: tools/distcp
>Affects Versions: 3.0.0
>Reporter: Steve Loughran
>Priority: Major
>
> Currently Distcp uploads a file by two strategies
> # append parts
> # copy to temp then rename
> option 2 executes the following sequence in {{promoteTmpToTarget}}
> {code}
> if ((fs.exists(target) && !fs.delete(target, false))
> || (!fs.exists(target.getParent()) && !fs.mkdirs(target.getParent()))
> || !fs.rename(tmpTarget, target)) {
>   throw new IOException("Failed to promote tmp-file:" + tmpTarget
>   + " to: " + target);
> }
> {code}
> For any object store, that's a lot of HTTP requests; for S3A you are looking 
> at 12+ requests and an O(data) copy call. 
> This is not a good upload strategy for any store which manifests its output 
> atomically at the end of the write().
> Proposed: add a switch to write direct to the dest path. either a conf option 
> or a CLI option



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15281) Distcp to add no-rename copy option

2018-09-25 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16627186#comment-16627186
 ] 

Steve Loughran commented on HADOOP-15281:
-

+related issue: for a many-GB rename, the rename can take so long that the 
worker stops heartbeating; AM fails it, etc. etc.

> Distcp to add no-rename copy option
> ---
>
> Key: HADOOP-15281
> URL: https://issues.apache.org/jira/browse/HADOOP-15281
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: tools/distcp
>Affects Versions: 3.0.0
>Reporter: Steve Loughran
>Priority: Major
>
> Currently Distcp uploads a file by two strategies
> # append parts
> # copy to temp then rename
> option 2 executes the following sequence in {{promoteTmpToTarget}}
> {code}
> if ((fs.exists(target) && !fs.delete(target, false))
> || (!fs.exists(target.getParent()) && !fs.mkdirs(target.getParent()))
> || !fs.rename(tmpTarget, target)) {
>   throw new IOException("Failed to promote tmp-file:" + tmpTarget
>   + " to: " + target);
> }
> {code}
> For any object store, that's a lot of HTTP requests; for S3A you are looking 
> at 12+ requests and an O(data) copy call. 
> This is not a good upload strategy for any store which manifests its output 
> atomically at the end of the write().
> Proposed: add a switch to write direct to the dest path. either a conf option 
> or a CLI option



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15281) Distcp to add no-rename copy option

2018-07-05 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16533557#comment-16533557
 ] 

Steve Loughran commented on HADOOP-15281:
-

I'm not working on this; will review anyone who provides the patch.

That's a patch which will need to have
* new distcp option
* test in the Distcp contract test which the object stores all subclass
* line or two in the docs

> Distcp to add no-rename copy option
> ---
>
> Key: HADOOP-15281
> URL: https://issues.apache.org/jira/browse/HADOOP-15281
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: tools/distcp
>Affects Versions: 3.0.0
>Reporter: Steve Loughran
>Priority: Major
>
> Currently Distcp uploads a file by two strategies
> # append parts
> # copy to temp then rename
> option 2 executes the following sequence in {{promoteTmpToTarget}}
> {code}
> if ((fs.exists(target) && !fs.delete(target, false))
> || (!fs.exists(target.getParent()) && !fs.mkdirs(target.getParent()))
> || !fs.rename(tmpTarget, target)) {
>   throw new IOException("Failed to promote tmp-file:" + tmpTarget
>   + " to: " + target);
> }
> {code}
> For any object store, that's a lot of HTTP requests; for S3A you are looking 
> at 12+ requests and an O(data) copy call. 
> This is not a good upload strategy for any store which manifests its output 
> atomically at the end of the write().
> Proposed: add a switch to write direct to the dest path. either a conf option 
> or a CLI option



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15281) Distcp to add no-rename copy option

2018-03-02 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16383909#comment-16383909
 ] 

Steve Loughran commented on HADOOP-15281:
-

debug level log of a distcp of one file to s3a. 20 metadata requests get logged 
per file.
{code}
16:51:20,555 INFO  mapred.CopyMapper (CopyMapper.java:map(154)) - Copying 
file:/home/s/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/util/bloom/BloomFilterCommonTester.java
 to 
s3a://hwdev-steve-ireland-new/distcp/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/util/bloom/BloomFilterCommonTester.java
16:51:20,555 DEBUG s3a.S3AStorageStatistics 
(S3AStorageStatistics.java:incrementCounter(63)) - op_get_file_status += 1  ->  
8987
16:51:20,555 DEBUG s3a.S3AFileSystem 
(S3AFileSystem.java:innerGetFileStatus(2098)) - Getting path status for 
s3a://hwdev-steve-ireland-new/distcp/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/util/bloom/BloomFilterCommonTester.java
  
(distcp/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/util/bloom/BloomFilterCommonTester.java)
16:51:20,555 DEBUG s3a.S3AStorageStatistics 
(S3AStorageStatistics.java:incrementCounter(63)) - object_metadata_requests += 
1  ->  19643
16:51:20,591 DEBUG s3a.S3AStorageStatistics 
(S3AStorageStatistics.java:incrementCounter(63)) - object_metadata_requests += 
1  ->  19644
16:51:20,626 DEBUG s3a.S3AStorageStatistics 
(S3AStorageStatistics.java:incrementCounter(63)) - object_list_requests += 1  
->  7689
16:51:20,664 DEBUG s3a.S3AFileSystem (S3AFileSystem.java:s3GetFileStatus(2245)) 
- Not Found: 
s3a://hwdev-steve-ireland-new/distcp/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/util/bloom/BloomFilterCommonTester.java
16:51:20,664 INFO  mapred.RetriableFileCopyCommand 
(RetriableFileCopyCommand.java:getTmpFile(245)) - Creating temp file: 
s3a://hwdev-steve-ireland-new/distcp/hadoop-common-project/.distcp.tmp.attempt_local9668494_0001_m_00_0
16:51:20,664 DEBUG s3a.S3AStorageStatistics 
(S3AStorageStatistics.java:incrementCounter(63)) - op_create += 1  ->  851
16:51:20,664 DEBUG s3a.S3AStorageStatistics 
(S3AStorageStatistics.java:incrementCounter(63)) - op_get_file_status += 1  ->  
8988
16:51:20,664 DEBUG s3a.S3AFileSystem 
(S3AFileSystem.java:innerGetFileStatus(2098)) - Getting path status for 
s3a://hwdev-steve-ireland-new/distcp/hadoop-common-project/.distcp.tmp.attempt_local9668494_0001_m_00_0
  
(distcp/hadoop-common-project/.distcp.tmp.attempt_local9668494_0001_m_00_0)
16:51:20,664 DEBUG s3a.S3AStorageStatistics 
(S3AStorageStatistics.java:incrementCounter(63)) - object_metadata_requests += 
1  ->  19645
16:51:20,697 DEBUG s3a.S3AStorageStatistics 
(S3AStorageStatistics.java:incrementCounter(63)) - object_metadata_requests += 
1  ->  19646
16:51:20,732 DEBUG s3a.S3AStorageStatistics 
(S3AStorageStatistics.java:incrementCounter(63)) - object_list_requests += 1  
->  7690
16:51:20,770 DEBUG s3a.S3AFileSystem (S3AFileSystem.java:s3GetFileStatus(2245)) 
- Not Found: 
s3a://hwdev-steve-ireland-new/distcp/hadoop-common-project/.distcp.tmp.attempt_local9668494_0001_m_00_0
16:51:20,770 DEBUG s3a.S3ABlockOutputStream 
(S3ABlockOutputStream.java:(169)) - Initialized S3ABlockOutputStream for 
distcp/hadoop-common-project/.distcp.tmp.attempt_local9668494_0001_m_00_0 
output to FileBlock{index=1, 
destFile=/tmp/hadoop-stevel/s3a/s3ablock-0001-1407016756609004936.tmp, 
state=Writing, dataSize=0, limit=8388608}
16:51:20,772 DEBUG s3a.S3ABlockOutputStream 
(S3ABlockOutputStream.java:close(349)) - 
S3ABlockOutputStream{WriteOperationHelper {bucket=hwdev-steve-ireland-new}, 
blockSize=8388608, activeBlock=FileBlock{index=1, 
destFile=/tmp/hadoop-stevel/s3a/s3ablock-0001-1407016756609004936.tmp, 
state=Writing, dataSize=16963, limit=8388608}}: Closing block #1: current 
block= FileBlock{index=1, 
destFile=/tmp/hadoop-stevel/s3a/s3ablock-0001-1407016756609004936.tmp, 
state=Writing, dataSize=16963, limit=8388608}
16:51:20,772 DEBUG s3a.S3ABlockOutputStream 
(S3ABlockOutputStream.java:putObject(413)) - Executing regular upload for 
WriteOperationHelper {bucket=hwdev-steve-ireland-new}
16:51:20,772 DEBUG s3a.S3ADataBlocks (S3ADataBlocks.java:startUpload(324)) - 
Start datablock[1] upload
16:51:20,772 DEBUG s3a.S3ADataBlocks (S3ADataBlocks.java:enterState(231)) - 
FileBlock{index=1, 
destFile=/tmp/hadoop-stevel/s3a/s3ablock-0001-1407016756609004936.tmp, 
state=Writing, dataSize=16963, limit=8388608}: entering state Upload
16:51:20,772 DEBUG s3a.S3ABlockOutputStream 
(S3ABlockOutputStream.java:clearActiveBlock(216)) - Clearing active block
16:51:20,772 [s3a-transfer-shared-pool1-t1] DEBUG s3a.S3AFileSystem 
(S3AFileSystem.java:putObjectDirect(1520)) - PUT 16963 bytes to 
distcp/hadoop-common-project/.distcp.tmp.attempt_local9668494_0001_m_00_0
16:51:20,772 [s3a-transfer-shared-pool1-t1] DEBUG