[jira] [Commented] (HADOOP-13868) New defaults for S3A multi-part configuration

2019-07-19 Thread Hudson (JIRA)


[ https://issues.apache.org/jira/browse/HADOOP-13868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16889149#comment-16889149 ]

Hudson commented on HADOOP-13868:
---------------------------------

FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #16957 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/16957/])
HADOOP-13868. [s3a] New default for S3A multi-part configuration (#1125) (github: rev 7f1b76ca3598acb0a0a843b2364c8963c70edf4d)
* (edit) hadoop-common-project/hadoop-common/src/main/resources/core-default.xml
* (edit) hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/Constants.java
* (edit) hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/index.md


> New defaults for S3A multi-part configuration
> ---------------------------------------------
>
> Key: HADOOP-13868
> URL: https://issues.apache.org/jira/browse/HADOOP-13868
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/s3
>Affects Versions: 2.7.0, 3.0.0-alpha1
>Reporter: Sean Mackrory
>Assignee: Sean Mackrory
>Priority: Major
> Attachments: HADOOP-13868.001.patch, HADOOP-13868.002.patch, 
> optimizing-multipart-s3a.sh
>
>
> I've been looking at a big performance regression when writing to S3 from
> Spark that appears to have been introduced with HADOOP-12891.
> In the Amazon SDK, the default threshold for multi-part copies is 320x the
> threshold for multi-part uploads (and the block size is 20x bigger), so I
> don't think it's necessarily wise for us to have them be the same.
> I did some quick tests, and the sweet spot where multi-part copies start
> being faster seems to be around 512 MB. It wasn't as significant, but using
> 104857600 (Amazon's default) for the block size was also slightly better.
> I propose we do the following, although they're independent decisions:
> (1) Split the configuration. Ideally, I'd like to have
> fs.s3a.multipart.copy.threshold and fs.s3a.multipart.upload.threshold (and
> corresponding properties for the block size). But then there's the question
> of what to do with the existing fs.s3a.multipart.* properties: deprecate
> them, or leave them as a short-hand for configuring both, overridden by the
> more specific properties? (A sketch of this split follows the quote.)
> (2) Consider increasing the default values. In my tests, 256 MB seemed to be
> where multipart uploads came into their own, and 512 MB was where multipart
> copies started outperforming the alternative. I'd be interested to hear
> what other people have seen.
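
To make proposal (1) concrete, here is a minimal core-site.xml sketch of the
split configuration. The key names are the ones proposed in the quoted
description and the values come from the measurements reported there; none of
this is shipped S3A configuration.

{code:xml}
<!-- Illustrative only: key names follow the proposal, not released S3A keys. -->
<property>
  <name>fs.s3a.multipart.upload.threshold</name>
  <value>256M</value>  <!-- uploads started to benefit around 256 MB -->
</property>
<property>
  <name>fs.s3a.multipart.copy.threshold</name>
  <value>512M</value>  <!-- copies only won above roughly 512 MB -->
</property>
{code}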






[jira] [Commented] (HADOOP-13868) New defaults for S3A multi-part configuration

2019-07-18 Thread Sean Mackrory (JIRA)


[ https://issues.apache.org/jira/browse/HADOOP-13868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16888395#comment-16888395 ]

Sean Mackrory commented on HADOOP-13868:


Bit-rot after only 2 1/2 years? Imagine that! Actually the only part that 
doesn't apply cleanly is the documentation, and that's just because the hunk 
is offset about 100 lines from where it should apply. Resubmitted as a pull 
request to verify a clean Yetus run, but as the patch is virtually identical 
I'll assume your +1 still applies unless I hear otherwise.







[jira] [Commented] (HADOOP-13868) New defaults for S3A multi-part configuration

2019-07-18 Thread Hadoop QA (JIRA)


[ https://issues.apache.org/jira/browse/HADOOP-13868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16888392#comment-16888392 ]

Hadoop QA commented on HADOOP-13868:


| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 0s | Docker mode activated. |
| -1 | patch | 0m 5s | HADOOP-13868 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. |

|| Subsystem || Report/Notes ||
| JIRA Issue | HADOOP-13868 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12842566/HADOOP-13868.002.patch |
| Console output | https://builds.apache.org/job/PreCommit-HADOOP-Build/16388/console |
| Powered by | Apache Yetus 0.8.0 http://yetus.apache.org |


This message was automatically generated.









[jira] [Commented] (HADOOP-13868) New defaults for S3A multi-part configuration

2019-07-18 Thread Steve Loughran (JIRA)


[ https://issues.apache.org/jira/browse/HADOOP-13868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16888382#comment-16888382 ]

Steve Loughran commented on HADOOP-13868:
-----------------------------------------

LGTM +1









[jira] [Commented] (HADOOP-13868) New defaults for S3A multi-part configuration

2017-02-14 Thread Sean Mackrory (JIRA)

[ https://issues.apache.org/jira/browse/HADOOP-13868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15866128#comment-15866128 ]

Sean Mackrory commented on HADOOP-13868:


Just pinging on this - I'd like to resolve it soon.







[jira] [Commented] (HADOOP-13868) New defaults for S3A multi-part configuration

2016-12-09 Thread Hadoop QA (JIRA)

[ https://issues.apache.org/jira/browse/HADOOP-13868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15735690#comment-15735690 ]

Hadoop QA commented on HADOOP-13868:


| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 12s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| 0 | mvndep | 0m 15s | Maven dependency ordering for branch |
| +1 | mvninstall | 6m 47s | trunk passed |
| +1 | compile | 9m 36s | trunk passed |
| +1 | checkstyle | 1m 34s | trunk passed |
| +1 | mvnsite | 1m 35s | trunk passed |
| +1 | mvneclipse | 0m 40s | trunk passed |
| +1 | findbugs | 2m 19s | trunk passed |
| +1 | javadoc | 1m 13s | trunk passed |
| 0 | mvndep | 0m 18s | Maven dependency ordering for patch |
| +1 | mvninstall | 1m 14s | the patch passed |
| +1 | compile | 10m 53s | the patch passed |
| +1 | javac | 10m 53s | the patch passed |
| +1 | checkstyle | 1m 40s | the patch passed |
| +1 | mvnsite | 1m 43s | the patch passed |
| +1 | mvneclipse | 0m 41s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | xml | 0m 1s | The patch has no ill-formed XML file. |
| +1 | findbugs | 2m 19s | the patch passed |
| +1 | javadoc | 1m 18s | the patch passed |
| +1 | unit | 8m 26s | hadoop-common in the patch passed. |
| +1 | unit | 0m 34s | hadoop-aws in the patch passed. |
| +1 | asflicense | 0m 37s | The patch does not generate ASF License warnings. |
| | | 77m 9s | |

|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:a9ad5d6 |
| JIRA Issue | HADOOP-13868 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12842566/HADOOP-13868.002.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit xml findbugs checkstyle |
| uname | Linux dfd8afad53d8 3.13.0-103-generic #150-Ubuntu SMP Thu Nov 24 10:34:17 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 80b8023 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| Test Results | https://builds.apache.org/job/PreCommit-HADOOP-Build/11236/testReport/ |
| modules | C: hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws U: . |
| Console output | https://builds.apache.org/job/PreCommit-HADOOP-Build/11236/console |
| Powered by | Apache Yetus |
[jira] [Commented] (HADOOP-13868) New defaults for S3A multi-part configuration

2016-12-09 Thread Sean Mackrory (JIRA)

[ https://issues.apache.org/jira/browse/HADOOP-13868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15735529#comment-15735529 ]

Sean Mackrory commented on HADOOP-13868:


{quote}128MB seems a reasonable increase{quote}

Just to be clear, it's a decrease. I was mistaken about what the previous 
defaults were in trunk. But the current value is also significantly sub-optimal 
(at least in all the US regions I tested, despite significantly varying raw 
performance between them).







[jira] [Commented] (HADOOP-13868) New defaults for S3A multi-part configuration

2016-12-09 Thread Steve Loughran (JIRA)

[ https://issues.apache.org/jira/browse/HADOOP-13868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15735019#comment-15735019 ]

Steve Loughran commented on HADOOP-13868:
-----------------------------------------

128MB seems a reasonable increase. But could the patched values be of the form 
128M, rather than the multiplied-out number? That way it's easier for people 
reading them to see what the actual number means.
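
For instance, assuming the size-suffix parsing that Hadoop's
Configuration.getLongBytes() provides, the two forms below set the same value,
but the suffixed one is much easier to read:

{code:xml}
<property>
  <name>fs.s3a.multipart.size</name>
  <value>128M</value>  <!-- equivalent to spelling out 134217728 -->
</property>
{code}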




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org