[jira] [Commented] (HADOOP-14999) AliyunOSS: provide one asynchronous multi-part based uploading mechanism

2018-01-13 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16325456#comment-16325456
 ] 

genericqa commented on HADOOP-14999:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 16s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m  0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 16m 30s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 18s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 20s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 26s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 26s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 15s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 18s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 15s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 15s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m  9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 21s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m 34s{color} | {color:red} hadoop-tools/hadoop-aliyun generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 15s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 17s{color} | {color:green} hadoop-aliyun in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 20s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 42m 54s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-tools/hadoop-aliyun |
|  |  Should org.apache.hadoop.fs.aliyun.oss.AliyunOSSFileSystemStore$AscendPartNumber be a _static_ inner class?  At AliyunOSSFileSystemStore.java:[lines 665-671] |
\\
\\
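For readers unfamiliar with this FindBugs warning, the usual remedy is to declare the nested class {{static}} when it never uses the enclosing instance. The sketch below is only an illustration: it assumes, from the class name alone, that {{AscendPartNumber}} is a comparator ordering upload parts by part number. {{PartStoreSketch}}, {{Part}} and {{sortedPartNumbers}} are hypothetical names and are not taken from the patch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for the store class; only the nesting pattern matters.
class PartStoreSketch {

  // Hypothetical minimal part record (not the OSS SDK's type).
  static final class Part {
    final int partNumber;

    Part(int partNumber) {
      this.partNumber = partNumber;
    }
  }

  // Declared static: the comparator never touches the enclosing instance, so
  // it does not need the hidden outer reference a non-static inner class holds.
  static final class AscendPartNumber implements Comparator<Part> {
    @Override
    public int compare(Part a, Part b) {
      return Integer.compare(a.partNumber, b.partNumber);
    }
  }

  // Returns the part numbers in ascending order without mutating the input.
  static List<Integer> sortedPartNumbers(List<Part> parts) {
    List<Part> copy = new ArrayList<>(parts);
    Collections.sort(copy, new AscendPartNumber());
    List<Integer> numbers = new ArrayList<>();
    for (Part p : copy) {
      numbers.add(p.partNumber);
    }
    return numbers;
  }
}
```

A static nested class can also be instantiated without an instance of the outer class, which is why the helper above can itself be static.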
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | HADOOP-14999 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12906017/HADOOP-14999.003.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 2b3c4f0faf43 3.13.0-135-generic #184-Ubuntu SMP Wed Oct 18 11:55:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 7016dd4 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
| findbugs | https://builds.apache.org/job/PreCommit-HADOOP-Build/13970/artifact/out/new-findbugs-hadoop-tools_hadoop-aliyun.html |
|  Test Results | https://builds.apache.org/job/PreCommit-HADOOP-Build/13970/testReport/ |
| Max. process+thread count 

[jira] [Updated] (HADOOP-14999) AliyunOSS: provide one asynchronous multi-part based uploading mechanism

2018-01-13 Thread Genmao Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Genmao Yu updated HADOOP-14999:
---
Description: 
This mechanism is designed for uploading files in parallel and asynchronously:

- Improve the performance of uploading files to the OSS server. First, the
mechanism splits the result into multiple small blocks and uploads them in
parallel. Second, producing the result and uploading blocks happen
asynchronously.
- Avoid buffering an overly large result on local disk. To cite an extreme
example, a task that outputs 100GB or more may need to write all 100GB to
local disk and then upload it. This can be inefficient and is limited by disk
space.

This patch reuses {{SemaphoredDelegatingExecutor}} as the executor service and
depends on HADOOP-15039.

The attached {{asynchronous_file_uploading.pdf}} illustrates the difference
between the previous {{AliyunOSSOutputStream}} and
{{AliyunOSSBlockOutputStream}}, i.e. this asynchronous multi-part based
uploading mechanism.

1. {{AliyunOSSOutputStream}}: the whole result must be written to local disk
before it can be uploaded to OSS. This poses two problems:
- if the output file is too large, it will exhaust the local disk.
- if the output file is too large, the task waits a long time for the upload
to OSS to finish, wasting compute resources.

2. {{AliyunOSSBlockOutputStream}}: the task output is cut into small blocks,
i.e. small local files, and each block is packaged into an upload task. These
tasks are submitted to {{SemaphoredDelegatingExecutor}}, which uploads the
blocks in parallel and so improves upload performance.
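
As a rough illustration of the block-based approach described above, the following sketch cuts a byte array into blocks and uploads them through a thread pool whose in-flight work is capped by a semaphore. It is loosely modeled on the idea behind {{SemaphoredDelegatingExecutor}} but uses only {{java.util.concurrent}}; it is not the patch's implementation. {{BlockUploadSketch}}, {{PartUploader}}, {{uploadPart}} and the parameter names are hypothetical, and a real stream would spill blocks to local files rather than hold them in memory.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.Semaphore;

class BlockUploadSketch {

  // Hypothetical upload hook; a real store would call the OSS SDK here.
  interface PartUploader {
    String uploadPart(int partNumber, byte[] block) throws Exception;
  }

  // Splits data into blocks, uploads them in parallel, and returns the
  // per-part results in part-number order.
  static List<String> upload(byte[] data, int blockSize, int maxInFlight,
      PartUploader uploader) {
    if (blockSize <= 0) {
      throw new IllegalArgumentException("blockSize must be positive");
    }
    ExecutorService pool = Executors.newFixedThreadPool(maxInFlight);
    Semaphore permits = new Semaphore(maxInFlight);
    List<Future<String>> pending = new ArrayList<>();
    try {
      int partNumber = 1;
      for (int off = 0; off < data.length; off += blockSize) {
        byte[] block =
            Arrays.copyOfRange(data, off, Math.min(off + blockSize, data.length));
        int n = partNumber++;
        // Blocks the producer when too many uploads are already in flight.
        permits.acquire();
        Callable<String> task = () -> {
          try {
            return uploader.uploadPart(n, block);
          } finally {
            permits.release();
          }
        };
        pending.add(pool.submit(task));
      }
      // Futures are read in submission order, so results follow part order.
      List<String> results = new ArrayList<>();
      for (Future<String> f : pending) {
        results.add(f.get());
      }
      return results;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while uploading", e);
    } catch (ExecutionException e) {
      throw new IllegalStateException("a part upload failed", e.getCause());
    } finally {
      pool.shutdown();
    }
  }
}
```

Because the producer only waits when all permits are taken, generating output and uploading earlier blocks can overlap, which is the asynchronous behaviour the description refers to.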

  was:
This mechanism is designed for uploading file in parallel and asynchronously: 

- improve the performance of uploading file to OSS server. Firstly, this 
mechanism splits result to multiple small blocks and upload them in parallel. 
Then, getting result and uploading blocks are asynchronous.
- avoid buffering too large result into local disk. To cite an extreme example, 
there is a task which will output 100GB or even larger, we may need to output 
this 100GB to local disk and then upload it. Sometimes, it is inefficient and 
limited to disk space.

This patch reuse {{SemaphoredDelegatingExecutor}} as executor service and 
depends on HADOOP-15039. 

Attached {{asynchronous_file_uploading.pdf}} illustrated the difference between 
previous {{AliyunOSSOutputStream}} and {{AliyunOSSBlockOutputStream}}, i.e. 
this asynchronous multi-part based uploading mechanism.

1. {{AliyunOSSOutputStream}}: we need to output the whole result to local disk 
before we can upload it to OSS. This will poses two problems:
- if the output file is too large, it will run out of the local disk.
- if the output file is too large, task will wait long time to upload 
result to OSS before finish, wasting much compute resource.
2. {{AliyunOSSBlockOutputStream}}: we cut the task output into small blocks, 
i.e. some small local file, and each block will be packaged into a uploading 
task. These tasks will be submitted into {{SemaphoredDelegatingExecutor}}.  
{{SemaphoredDelegatingExecutor}} will upload this blocks in in parallel, this 
will improve performance greatly.


> AliyunOSS: provide one asynchronous multi-part based uploading mechanism
> 
>
> Key: HADOOP-14999
> URL: https://issues.apache.org/jira/browse/HADOOP-14999
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0-beta1
>Reporter: Genmao Yu
>Assignee: Genmao Yu
> Attachments: HADOOP-14999.001.patch, HADOOP-14999.002.patch, 
> HADOOP-14999.003.patch, asynchronous_file_uploading.pdf
>
>
> This mechanism is designed for uploading file in parallel and asynchronously: 
> - improve the performance of uploading file to OSS server. Firstly, this 
> mechanism splits result to multiple small blocks and upload them in parallel. 
> Then, getting result and uploading blocks are asynchronous.
> - avoid buffering too large result into local disk. To cite an extreme 
> example, there is a task which will output 100GB or even larger, we may need 
> to output this 100GB to local disk and then upload it. Sometimes, it is 
> inefficient and limited to disk space.
> This patch reuse {{SemaphoredDelegatingExecutor}} as executor service and 
> depends on HADOOP-15039. 
> Attached {{asynchronous_file_uploading.pdf}} illustrated the difference 
> between previous {{AliyunOSSOutputStream}} and 
> {{AliyunOSSBlockOutputStream}}, i.e. this asynchronous multi-part based 
> uploading mechanism.
> 1. {{AliyunOSSOutputStream}}: we need to output the whole result to local 
> disk before we can upload it to OSS. This will poses two 

[jira] [Updated] (HADOOP-14999) AliyunOSS: provide one asynchronous multi-part based uploading mechanism

2018-01-13 Thread Genmao Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Genmao Yu updated HADOOP-14999:
---
Description: 
This mechanism is designed for uploading file in parallel and asynchronously: 

- improve the performance of uploading file to OSS server. Firstly, this 
mechanism splits result to multiple small blocks and upload them in parallel. 
Then, getting result and uploading blocks are asynchronous.
- avoid buffering too large result into local disk. To cite an extreme example, 
there is a task which will output 100GB or even larger, we may need to output 
this 100GB to local disk and then upload it. Sometimes, it is inefficient and 
limited to disk space.

This patch reuse {{SemaphoredDelegatingExecutor}} as executor service and 
depends on HADOOP-15039. 

Attached {{asynchronous_file_uploading.pdf}} illustrated the difference between 
previous {{AliyunOSSOutputStream}} and {{AliyunOSSBlockOutputStream}}, i.e. 
this asynchronous multi-part based uploading mechanism.

1. {{AliyunOSSOutputStream}}: we need to output the whole result to local disk 
before we can upload it to OSS. This will poses two problems:
- if the output file is too large, it will run out of the local disk.
- if the output file is too large, task will wait long time to upload 
result to OSS before finish, wasting much compute resource.
2. {{AliyunOSSBlockOutputStream}}: we cut the task output into small blocks, 
i.e. some small local file, and each block will be packaged into a uploading 
task. These tasks will be submitted into {{SemaphoredDelegatingExecutor}}.  
{{SemaphoredDelegatingExecutor}} will upload this blocks in in parallel, this 
will improve performance greatly.

  was:
This mechanism is designed for uploading file in parallel and asynchronously: 

- improve the performance of uploading file to OSS server. Firstly, this 
mechanism splits result to multiple small blocks and upload them in parallel. 
Then, getting result and uploading blocks are asynchronous.
- avoid buffering too large result into local disk. To cite an extreme example, 
there is a task which will output 100GB or even larger, we may need to output 
this 100GB to local disk and then upload it. Sometimes, it is inefficient and 
limited to disk space.

This patch reuse {{SemaphoredDelegatingExecutor}} as executor service and 
depends on HADOOP-15039. 

Attached {{asynchronous_file_uploading.pdf}} illustrated the difference between 
previous {{AliyunOSSOutputStream}} and {{AliyunOSSBlockOutputStream}}, i.e. 
this asynchronous multi-part based uploading mechanism.

- {{AliyunOSSOutputStream}}: we need to output the whole result to local disk 
before we can upload it to OSS. This will poses two problems:
- if the output file is too large, it will run out of the local disk.
- if the output file is too large, task will wait long time to upload 
result to OSS before finish, wasting much compute resource.
- {{AliyunOSSBlockOutputStream}}: we cut the task output into small blocks, 
i.e. some small local file, and each block will be packaged into a uploading 
task. These tasks will be submitted into {{SemaphoredDelegatingExecutor}}.  
{{SemaphoredDelegatingExecutor}} will upload this blocks in in parallel, this 
will improve performance greatly.



[jira] [Updated] (HADOOP-14999) AliyunOSS: provide one asynchronous multi-part based uploading mechanism

2018-01-13 Thread Genmao Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Genmao Yu updated HADOOP-14999:
---
Description: 
This mechanism is designed for uploading file in parallel and asynchronously: 

- improve the performance of uploading file to OSS server. Firstly, this 
mechanism splits result to multiple small blocks and upload them in parallel. 
Then, getting result and uploading blocks are asynchronous.
- avoid buffering too large result into local disk. To cite an extreme example, 
there is a task which will output 100GB or even larger, we may need to output 
this 100GB to local disk and then upload it. Sometimes, it is inefficient and 
limited to disk space.

This patch reuse {{SemaphoredDelegatingExecutor}} as executor service and 
depends on HADOOP-15039. 

Attached {{asynchronous_file_uploading.pdf}} illustrated the difference between 
previous {{AliyunOSSOutputStream}} and {{AliyunOSSBlockOutputStream}}, i.e. 
this asynchronous multi-part based uploading mechanism.

- {{AliyunOSSOutputStream}}: we need to output the whole result to local disk 
before we can upload it to OSS. This will poses two problems:
- if the output file is too large, it will run out of the local disk.
- if the output file is too large, task will wait long time to upload 
result to OSS before finish, wasting much compute resource.
- {{AliyunOSSBlockOutputStream}}: we cut the task output into small blocks, 
i.e. some small local file, and each block will be packaged into a uploading 
task. These tasks will be submitted into {{SemaphoredDelegatingExecutor}}.  
{{SemaphoredDelegatingExecutor}} will upload this blocks in in parallel, this 
will improve performance greatly.

  was:
This mechanism is designed for uploading file in parallel and asynchronously: 

- improve the performance of uploading file to OSS server. Firstly, this 
mechanism splits result to multiple small blocks and upload them in parallel. 
Then, getting result and uploading blocks are asynchronous.
- avoid buffering too large result into local disk. To cite an extreme example, 
there is a task which will output 100GB or even larger, we may need to output 
this 100GB to local disk and then upload it. Sometimes, it is inefficient and 
limited to disk space.

This patch reuse {{SemaphoredDelegatingExecutor}} as executor service and 
depends on HADOOP-15039. 

Attached {{asynchronous_file_uploading.pdf}} illustrated the difference between 
previous {{AliyunOSSOutputStream}} and {{AliyunOSSBlockOutputStream}}, i.e. 
this asynchronous multi-part based uploading mechanism.


> AliyunOSS: provide one asynchronous multi-part based uploading mechanism
> 
>
> Key: HADOOP-14999
> URL: https://issues.apache.org/jira/browse/HADOOP-14999
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0-beta1
>Reporter: Genmao Yu
>Assignee: Genmao Yu
> Attachments: HADOOP-14999.001.patch, HADOOP-14999.002.patch, 
> HADOOP-14999.003.patch, asynchronous_file_uploading.pdf
>
>
> This mechanism is designed for uploading file in parallel and asynchronously: 
> - improve the performance of uploading file to OSS server. Firstly, this 
> mechanism splits result to multiple small blocks and upload them in parallel. 
> Then, getting result and uploading blocks are asynchronous.
> - avoid buffering too large result into local disk. To cite an extreme 
> example, there is a task which will output 100GB or even larger, we may need 
> to output this 100GB to local disk and then upload it. Sometimes, it is 
> inefficient and limited to disk space.
> This patch reuse {{SemaphoredDelegatingExecutor}} as executor service and 
> depends on HADOOP-15039. 
> Attached {{asynchronous_file_uploading.pdf}} illustrated the difference 
> between previous {{AliyunOSSOutputStream}} and 
> {{AliyunOSSBlockOutputStream}}, i.e. this asynchronous multi-part based 
> uploading mechanism.
> - {{AliyunOSSOutputStream}}: we need to output the whole result to local disk 
> before we can upload it to OSS. This will poses two problems:
> - if the output file is too large, it will run out of the local disk.
> - if the output file is too large, task will wait long time to upload 
> result to OSS before finish, wasting much compute resource.
> - {{AliyunOSSBlockOutputStream}}: we cut the task output into small blocks, 
> i.e. some small local file, and each block will be packaged into a uploading 
> task. These tasks will be submitted into {{SemaphoredDelegatingExecutor}}.  
> {{SemaphoredDelegatingExecutor}} will upload this blocks in in parallel, this 
> will improve performance greatly.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HADOOP-14999) AliyunOSS: provide one asynchronous multi-part based uploading mechanism

2018-01-13 Thread Genmao Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Genmao Yu updated HADOOP-14999:
---
Description: 
This mechanism is designed for uploading file in parallel and asynchronously: 

- improve the performance of uploading file to OSS server. Firstly, this 
mechanism splits result to multiple small blocks and upload them in parallel. 
Then, getting result and uploading blocks are asynchronous.
- avoid buffering too large result into local disk. To cite an extreme example, 
there is a task which will output 100GB or even larger, we may need to output 
this 100GB to local disk and then upload it. Sometimes, it is inefficient and 
limited to disk space.

This patch reuse {{SemaphoredDelegatingExecutor}} as executor service and 
depends on HADOOP-15039. 

Attached {{asynchronous_file_uploading.pdf}} illustrated the difference between 
previous {{AliyunOSSOutputStream}} and {{AliyunOSSBlockOutputStream}}, i.e. 
this asynchronous multi-part based uploading mechanism.

  was:
This mechanism is designed for uploading file in parallel and asynchronously: 

- improve the performance of uploading file to OSS server. Firstly, this 
mechanism splits result to multiple small blocks and upload them in parallel. 
Then, getting result and uploading blocks are asynchronous.
- avoid buffering too large result into local disk. To cite an extreme example, 
there is a task which will output 100GB or even larger, we may need to output 
this 100GB to local disk and then upload it. Sometimes, it is inefficient and 
limited to disk space.

This patch reuse {{SemaphoredDelegatingExecutor}} as executor service and 
depends on HADOOP-15039. 

Attached {{asynchronous_file_uploading.pdf}} illustrated the difference between 
previous {{AliyunOSSOutputStream}} and this 


> AliyunOSS: provide one asynchronous multi-part based uploading mechanism
> 
>
> Key: HADOOP-14999
> URL: https://issues.apache.org/jira/browse/HADOOP-14999
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0-beta1
>Reporter: Genmao Yu
>Assignee: Genmao Yu
> Attachments: HADOOP-14999.001.patch, HADOOP-14999.002.patch, 
> HADOOP-14999.003.patch, asynchronous_file_uploading.pdf
>
>
> This mechanism is designed for uploading file in parallel and asynchronously: 
> - improve the performance of uploading file to OSS server. Firstly, this 
> mechanism splits result to multiple small blocks and upload them in parallel. 
> Then, getting result and uploading blocks are asynchronous.
> - avoid buffering too large result into local disk. To cite an extreme 
> example, there is a task which will output 100GB or even larger, we may need 
> to output this 100GB to local disk and then upload it. Sometimes, it is 
> inefficient and limited to disk space.
> This patch reuse {{SemaphoredDelegatingExecutor}} as executor service and 
> depends on HADOOP-15039. 
> Attached {{asynchronous_file_uploading.pdf}} illustrated the difference 
> between previous {{AliyunOSSOutputStream}} and 
> {{AliyunOSSBlockOutputStream}}, i.e. this asynchronous multi-part based 
> uploading mechanism.




-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-14999) AliyunOSS: provide one asynchronous multi-part based uploading mechanism

2018-01-13 Thread Genmao Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Genmao Yu updated HADOOP-14999:
---
Description: 
This mechanism is designed for uploading file in parallel and asynchronously: 

- improve the performance of uploading file to OSS server. Firstly, this 
mechanism splits result to multiple small blocks and upload them in parallel. 
Then, getting result and uploading blocks are asynchronous.
- avoid buffering too large result into local disk. To cite an extreme example, 
there is a task which will output 100GB or even larger, we may need to output 
this 100GB to local disk and then upload it. Sometimes, it is inefficient and 
limited to disk space.

This patch reuse {{SemaphoredDelegatingExecutor}} as executor service and 
depends on HADOOP-15039. 

Attached {{asynchronous_file_uploading.pdf}} illustrated the difference between 
previous {{AliyunOSSOutputStream}} and this 

  was:
This mechanism is designed for uploading file in parallel and asynchronously: 

- improve the performance of uploading file to OSS server. Firstly, this 
mechanism splits result to multiple small blocks and upload them in parallel. 
Then, getting result and uploading blocks are asynchronous.
- avoid buffering too large result into local disk. To cite an extreme example, 
there is a task which will output 100GB or even larger, we may need to output 
this 100GB to local disk and then upload it. Sometimes, it is inefficient and 
limited to disk space.

This patch reuse {{SemaphoredDelegatingExecutor}} as executor service and 
depends on HADOOP-15039. 


> AliyunOSS: provide one asynchronous multi-part based uploading mechanism
> 
>
> Key: HADOOP-14999
> URL: https://issues.apache.org/jira/browse/HADOOP-14999
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0-beta1
>Reporter: Genmao Yu
>Assignee: Genmao Yu
> Attachments: HADOOP-14999.001.patch, HADOOP-14999.002.patch, 
> HADOOP-14999.003.patch, asynchronous_file_uploading.pdf
>
>
> This mechanism is designed for uploading file in parallel and asynchronously: 
> - improve the performance of uploading file to OSS server. Firstly, this 
> mechanism splits result to multiple small blocks and upload them in parallel. 
> Then, getting result and uploading blocks are asynchronous.
> - avoid buffering too large result into local disk. To cite an extreme 
> example, there is a task which will output 100GB or even larger, we may need 
> to output this 100GB to local disk and then upload it. Sometimes, it is 
> inefficient and limited to disk space.
> This patch reuse {{SemaphoredDelegatingExecutor}} as executor service and 
> depends on HADOOP-15039. 
> Attached {{asynchronous_file_uploading.pdf}} illustrated the difference 
> between previous {{AliyunOSSOutputStream}} and this 






[jira] [Updated] (HADOOP-14999) AliyunOSS: provide one asynchronous multi-part based uploading mechanism

2018-01-13 Thread Genmao Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Genmao Yu updated HADOOP-14999:
---
Attachment: HADOOP-14999.003.patch

> AliyunOSS: provide one asynchronous multi-part based uploading mechanism
> 
>
> Key: HADOOP-14999
> URL: https://issues.apache.org/jira/browse/HADOOP-14999
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0-beta1
>Reporter: Genmao Yu
>Assignee: Genmao Yu
> Attachments: HADOOP-14999.001.patch, HADOOP-14999.002.patch, 
> HADOOP-14999.003.patch, asynchronous_file_uploading.pdf
>
>
> This mechanism is designed for uploading file in parallel and asynchronously: 
> - improve the performance of uploading file to OSS server. Firstly, this 
> mechanism splits result to multiple small blocks and upload them in parallel. 
> Then, getting result and uploading blocks are asynchronous.
> - avoid buffering too large result into local disk. To cite an extreme 
> example, there is a task which will output 100GB or even larger, we may need 
> to output this 100GB to local disk and then upload it. Sometimes, it is 
> inefficient and limited to disk space.
> This patch reuse {{SemaphoredDelegatingExecutor}} as executor service and 
> depends on HADOOP-15039. 






[jira] [Updated] (HADOOP-14999) AliyunOSS: provide one asynchronous multi-part based uploading mechanism

2018-01-13 Thread Genmao Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Genmao Yu updated HADOOP-14999:
---
Description: 
This mechanism is designed for uploading file in parallel and asynchronously: 

- improve the performance of uploading file to OSS server. Firstly, this 
mechanism splits result to multiple small blocks and upload them in parallel. 
Then, getting result and uploading blocks are asynchronous.
- avoid buffering too large result into local disk. To cite an extreme example, 
there is a task which will output 100GB or even larger, we may need to output 
this 100GB to local disk and then upload it. Sometimes, it is inefficient and 
limited to disk space.

This patch reuse {{SemaphoredDelegatingExecutor}} as executor service and 
depends on HADOOP-15039. 

  was:
This mechanism is designed for uploading file in parallel and asynchronously: 

- improve the performance of uploading file to OSS server. Firstly, this 
mechanism splits result to multiple small blocks and upload them in parallel. 
Then, getting result and uploading blocks are asynchronous.
- avoid buffering too large result into local disk. To cite an extreme example, 
there is a task which will output 100GB or even larger, we may need to output 
this 100GB to local disk and then upload it. Sometimes, it is inefficient and 
limited to disk space.

This patch reuse {{SemaphoredDelegatingExecutor}} as executor service


> AliyunOSS: provide one asynchronous multi-part based uploading mechanism
> 
>
> Key: HADOOP-14999
> URL: https://issues.apache.org/jira/browse/HADOOP-14999
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0-beta1
>Reporter: Genmao Yu
>Assignee: Genmao Yu
> Attachments: HADOOP-14999.001.patch, HADOOP-14999.002.patch, 
> asynchronous_file_uploading.pdf
>
>
> This mechanism is designed for uploading file in parallel and asynchronously: 
> - improve the performance of uploading file to OSS server. Firstly, this 
> mechanism splits result to multiple small blocks and upload them in parallel. 
> Then, getting result and uploading blocks are asynchronous.
> - avoid buffering too large result into local disk. To cite an extreme 
> example, there is a task which will output 100GB or even larger, we may need 
> to output this 100GB to local disk and then upload it. Sometimes, it is 
> inefficient and limited to disk space.
> This patch reuse {{SemaphoredDelegatingExecutor}} as executor service and 
> depends on HADOOP-15039. 






[jira] [Updated] (HADOOP-14999) AliyunOSS: provide one asynchronous multi-part based uploading mechanism

2018-01-13 Thread Genmao Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Genmao Yu updated HADOOP-14999:
---
Description: 
This mechanism is designed for uploading file in parallel and asynchronously: 

- improve the performance of uploading file to OSS server. Firstly, this 
mechanism splits result to multiple small blocks and upload them in parallel. 
Then, getting result and uploading blocks are asynchronous.
- avoid buffering too large result into local disk. To cite an extreme example, 
there is a task which will output 100GB or even larger, we may need to output 
this 100GB to local disk and then upload it. Sometimes, it is inefficient and 
limited to disk space.

This patch reuse {{SemaphoredDelegatingExecutor}} as executor service

  was:
This mechanism is designed to upload files in parallel and asynchronously: 

- Improve the performance of uploading files to the OSS server. First, the 
mechanism splits the output into multiple small blocks and uploads them in 
parallel. Second, producing the output and uploading the blocks happen 
asynchronously.
- Avoid buffering overly large output on local disk. As an extreme example, 
a task may output 100GB or more; we may need to write all 100GB to local 
disk and then upload it. This can be inefficient and is limited by 
available disk space.


> AliyunOSS: provide one asynchronous multi-part based uploading mechanism
> 
>
> Key: HADOOP-14999
> URL: https://issues.apache.org/jira/browse/HADOOP-14999
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/oss
> Affects Versions: 3.0.0-beta1
> Reporter: Genmao Yu
> Assignee: Genmao Yu
> Attachments: HADOOP-14999.001.patch, HADOOP-14999.002.patch, asynchronous_file_uploading.pdf
>
>
> This mechanism is designed to upload files in parallel and asynchronously: 
> - Improve the performance of uploading files to the OSS server. First, the 
> mechanism splits the output into multiple small blocks and uploads them in 
> parallel. Second, producing the output and uploading the blocks happen 
> asynchronously.
> - Avoid buffering overly large output on local disk. As an extreme example, 
> a task may output 100GB or more; we may need to write all 100GB to local 
> disk and then upload it. This can be inefficient and is limited by 
> available disk space.
> This patch reuses {{SemaphoredDelegatingExecutor}} as the executor service
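
The block splitting and bounded parallel upload described in this issue can be sketched roughly as follows. This is an illustration only, not the patch's code: `BlockStore`, `upload`, and the returned part tags are hypothetical names, and a plain `Semaphore` plus a fixed thread pool stands in for {{SemaphoredDelegatingExecutor}}.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.Semaphore;

/** Sketch only: every name here is hypothetical, not the AliyunOSS API. */
public class AsyncBlockUploadSketch {

  /** Hypothetical stand-in for a multipart-capable object store client. */
  public interface BlockStore {
    String uploadPart(int partNumber, byte[] data);
  }

  /**
   * Splits data into fixed-size blocks and uploads them in parallel.
   * The semaphore bounds the number of blocks in flight, so the producer
   * waits instead of buffering an unbounded amount of output.
   */
  public static List<String> upload(byte[] data, int blockSize,
      int maxInFlight, BlockStore store) {
    ExecutorService pool = Executors.newFixedThreadPool(maxInFlight);
    Semaphore permits = new Semaphore(maxInFlight);
    List<Future<String>> pending = new ArrayList<>();
    try {
      int partNumber = 1;
      for (int off = 0; off < data.length; off += blockSize) {
        int end = Math.min(data.length, off + blockSize);
        final byte[] block = Arrays.copyOfRange(data, off, end);
        final int part = partNumber++;
        permits.acquire();               // blocks while maxInFlight parts are uploading
        pending.add(pool.submit(() -> {
          try {
            return store.uploadPart(part, block);
          } finally {
            permits.release();           // free the slot even if the upload fails
          }
        }));
      }
      // Collect results in part order, as a multipart completion step would need.
      List<String> tags = new ArrayList<>();
      for (Future<String> f : pending) {
        tags.add(f.get());
      }
      return tags;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    } catch (ExecutionException e) {
      throw new IllegalStateException(e.getCause());
    } finally {
      pool.shutdown();
    }
  }
}
```

With a 10-byte input, a block size of 4, and at most 2 uploads in flight, the sketch submits three parts of 4, 4, and 2 bytes and returns their tags in part order.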






[jira] [Updated] (HADOOP-14999) AliyunOSS: provide one asynchronous multi-part based uploading mechanism

2018-01-13 Thread Genmao Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Genmao Yu updated HADOOP-14999:
---
Attachment: asynchronous_file_uploading.pdf

> AliyunOSS: provide one asynchronous multi-part based uploading mechanism
> 
>
> Key: HADOOP-14999
> URL: https://issues.apache.org/jira/browse/HADOOP-14999
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/oss
> Affects Versions: 3.0.0-beta1
> Reporter: Genmao Yu
> Assignee: Genmao Yu
> Attachments: HADOOP-14999.001.patch, HADOOP-14999.002.patch, asynchronous_file_uploading.pdf
>
>
> This mechanism is designed to upload files in parallel and asynchronously: 
> - Improve the performance of uploading files to the OSS server. First, the 
> mechanism splits the output into multiple small blocks and uploads them in 
> parallel. Second, producing the output and uploading the blocks happen 
> asynchronously.
> - Avoid buffering overly large output on local disk. As an extreme example, 
> a task may output 100GB or more; we may need to write all 100GB to local 
> disk and then upload it. This can be inefficient and is limited by 
> available disk space.






[jira] [Commented] (HADOOP-15079) ITestS3AFileOperationCost#testFakeDirectoryDeletion failing after OutputCommitter patch

2018-01-13 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16325156#comment-16325156
 ] 

genericqa commented on HADOOP-15079:


| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 10m  4s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m  0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 16m 33s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 27s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 16s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 30s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 34s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 33s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 19s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 23s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  0m 13s{color} | {color:orange} hadoop-tools/hadoop-aws: The patch generated 1 new + 7 unchanged - 1 fixed = 8 total (was 8) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m  9s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 17s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  4m 31s{color} | {color:green} hadoop-aws in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 20s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 57m 53s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | HADOOP-15079 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12905995/HADOOP-15079-003.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux f035d7c5a2c9 3.13.0-135-generic #184-Ubuntu SMP Wed Oct 18 11:55:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 9afb802 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
| checkstyle | https://builds.apache.org/job/PreCommit-HADOOP-Build/13968/artifact/out/diff-checkstyle-hadoop-tools_hadoop-aws.txt |
|  Test Results | https://builds.apache.org/job/PreCommit-HADOOP-Build/13968/testReport/ |
| Max. process+thread count | 324 (vs. ulimit of 5000) |
| modules | C: hadoop-tools/hadoop-aws U: hadoop-tools/hadoop-aws |
| Console output | https://builds.apache.org/job/PreCommit-HADOOP-Build/13968/console |
| Powered by | Apache Yetus 0.7.0-SNAPSHOT   http://yetus.apache.org |



[jira] [Updated] (HADOOP-15079) ITestS3AFileOperationCost#testFakeDirectoryDeletion failing after OutputCommitter patch

2018-01-13 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-15079:

Status: Patch Available  (was: Open)

> ITestS3AFileOperationCost#testFakeDirectoryDeletion failing after 
> OutputCommitter patch
> ---
>
> Key: HADOOP-15079
> URL: https://issues.apache.org/jira/browse/HADOOP-15079
> Project: Hadoop Common
> Issue Type: Sub-task
> Affects Versions: 3.1.0
> Reporter: Sean Mackrory
> Assignee: Steve Loughran
> Priority: Critical
> Attachments: HADOOP-15079-001.patch, HADOOP-15079-002.patch, HADOOP-15079-003.patch
>
>
> I see this test failing with "object_delete_requests expected:<1> but 
> was:<2>". I printed stack traces whenever this metric was incremented, and 
> found the root cause to be that innerMkdirs is now causing two calls to 
> delete fake directories when it previously caused only one. It is called once 
> inside createFakeDirectory, and once directly inside innerMkdirs later:
> {code}
> at org.apache.hadoop.fs.s3a.S3AInstrumentation.incrementCounter(S3AInstrumentation.java:454)
> at org.apache.hadoop.fs.s3a.S3AFileSystem.incrementStatistic(S3AFileSystem.java:1108)
> at org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$deleteObjects$8(S3AFileSystem.java:1369)
> at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:313)
> at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:279)
> at org.apache.hadoop.fs.s3a.S3AFileSystem.deleteObjects(S3AFileSystem.java:1366)
> at org.apache.hadoop.fs.s3a.S3AFileSystem.removeKeys(S3AFileSystem.java:1625)
> at org.apache.hadoop.fs.s3a.S3AFileSystem.deleteUnnecessaryFakeDirectories(S3AFileSystem.java:2634)
> at org.apache.hadoop.fs.s3a.S3AFileSystem.finishedWrite(S3AFileSystem.java:2599)
> at org.apache.hadoop.fs.s3a.S3AFileSystem.putObjectDirect(S3AFileSystem.java:1498)
> at org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$createEmptyObject$11(S3AFileSystem.java:2684)
> at org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:108)
> at org.apache.hadoop.fs.s3a.Invoker.lambda$retry$3(Invoker.java:259)
> at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:313)
> at org.apache.hadoop.fs.s3a.Invoker.retry(Invoker.java:255)
> at org.apache.hadoop.fs.s3a.Invoker.retry(Invoker.java:230)
> at org.apache.hadoop.fs.s3a.S3AFileSystem.createEmptyObject(S3AFileSystem.java:2682)
> at org.apache.hadoop.fs.s3a.S3AFileSystem.createFakeDirectory(S3AFileSystem.java:2657)
> at org.apache.hadoop.fs.s3a.S3AFileSystem.innerMkdirs(S3AFileSystem.java:2021)
> at org.apache.hadoop.fs.s3a.S3AFileSystem.mkdirs(S3AFileSystem.java:1956)
> at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:2305)
> at org.apache.hadoop.fs.contract.AbstractFSContractTestBase.mkdirs(AbstractFSContractTestBase.java:338)
> at org.apache.hadoop.fs.s3a.ITestS3AFileOperationCost.testFakeDirectoryDeletion(ITestS3AFileOperationCost.java:209)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
> at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
> at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
> at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
> at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
> at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74
> {code}
> {code}
> at org.apache.hadoop.fs.s3a.S3AInstrumentation.incrementCounter(S3AInstrumentation.java:454)
> at org.apache.hadoop.fs.s3a.S3AFileSystem.incrementStatistic(S3AFileSystem.java:1108)
> at org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$deleteObjects$8(S3AFileSystem.java:1369)
> at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:313)
> at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:279)
> at 
> 

[jira] [Updated] (HADOOP-15079) ITestS3AFileOperationCost#testFakeDirectoryDeletion failing after OutputCommitter patch

2018-01-13 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-15079:

Attachment: HADOOP-15079-003.patch

HADOOP-15079 patch 003; fix up checkstyle warnings: indentation and line length

If checkstyle is happy, this is what I'll commit

> ITestS3AFileOperationCost#testFakeDirectoryDeletion failing after 
> OutputCommitter patch
> ---
>
> Key: HADOOP-15079
> URL: https://issues.apache.org/jira/browse/HADOOP-15079
> Project: Hadoop Common
> Issue Type: Sub-task
> Affects Versions: 3.1.0
> Reporter: Sean Mackrory
> Assignee: Steve Loughran
> Priority: Critical
> Attachments: HADOOP-15079-001.patch, HADOOP-15079-002.patch, HADOOP-15079-003.patch
>
>
> I see this test failing with "object_delete_requests expected:<1> but 
> was:<2>". I printed stack traces whenever this metric was incremented, and 
> found the root cause to be that innerMkdirs is now causing two calls to 
> delete fake directories when it previously caused only one. It is called once 
> inside createFakeDirectory, and once directly inside innerMkdirs later:
> {code}
> at org.apache.hadoop.fs.s3a.S3AInstrumentation.incrementCounter(S3AInstrumentation.java:454)
> at org.apache.hadoop.fs.s3a.S3AFileSystem.incrementStatistic(S3AFileSystem.java:1108)
> at org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$deleteObjects$8(S3AFileSystem.java:1369)
> at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:313)
> at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:279)
> at org.apache.hadoop.fs.s3a.S3AFileSystem.deleteObjects(S3AFileSystem.java:1366)
> at org.apache.hadoop.fs.s3a.S3AFileSystem.removeKeys(S3AFileSystem.java:1625)
> at org.apache.hadoop.fs.s3a.S3AFileSystem.deleteUnnecessaryFakeDirectories(S3AFileSystem.java:2634)
> at org.apache.hadoop.fs.s3a.S3AFileSystem.finishedWrite(S3AFileSystem.java:2599)
> at org.apache.hadoop.fs.s3a.S3AFileSystem.putObjectDirect(S3AFileSystem.java:1498)
> at org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$createEmptyObject$11(S3AFileSystem.java:2684)
> at org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:108)
> at org.apache.hadoop.fs.s3a.Invoker.lambda$retry$3(Invoker.java:259)
> at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:313)
> at org.apache.hadoop.fs.s3a.Invoker.retry(Invoker.java:255)
> at org.apache.hadoop.fs.s3a.Invoker.retry(Invoker.java:230)
> at org.apache.hadoop.fs.s3a.S3AFileSystem.createEmptyObject(S3AFileSystem.java:2682)
> at org.apache.hadoop.fs.s3a.S3AFileSystem.createFakeDirectory(S3AFileSystem.java:2657)
> at org.apache.hadoop.fs.s3a.S3AFileSystem.innerMkdirs(S3AFileSystem.java:2021)
> at org.apache.hadoop.fs.s3a.S3AFileSystem.mkdirs(S3AFileSystem.java:1956)
> at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:2305)
> at org.apache.hadoop.fs.contract.AbstractFSContractTestBase.mkdirs(AbstractFSContractTestBase.java:338)
> at org.apache.hadoop.fs.s3a.ITestS3AFileOperationCost.testFakeDirectoryDeletion(ITestS3AFileOperationCost.java:209)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
> at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
> at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
> at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
> at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
> at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74
> {code}
> {code}
> at org.apache.hadoop.fs.s3a.S3AInstrumentation.incrementCounter(S3AInstrumentation.java:454)
> at org.apache.hadoop.fs.s3a.S3AFileSystem.incrementStatistic(S3AFileSystem.java:1108)
> at org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$deleteObjects$8(S3AFileSystem.java:1369)
> at 
> 

[jira] [Updated] (HADOOP-15079) ITestS3AFileOperationCost#testFakeDirectoryDeletion failing after OutputCommitter patch

2018-01-13 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-15079:

Status: Open  (was: Patch Available)

> ITestS3AFileOperationCost#testFakeDirectoryDeletion failing after 
> OutputCommitter patch
> ---
>
> Key: HADOOP-15079
> URL: https://issues.apache.org/jira/browse/HADOOP-15079
> Project: Hadoop Common
> Issue Type: Sub-task
> Affects Versions: 3.1.0
> Reporter: Sean Mackrory
> Assignee: Steve Loughran
> Priority: Critical
> Attachments: HADOOP-15079-001.patch, HADOOP-15079-002.patch
>
>
> I see this test failing with "object_delete_requests expected:<1> but 
> was:<2>". I printed stack traces whenever this metric was incremented, and 
> found the root cause to be that innerMkdirs is now causing two calls to 
> delete fake directories when it previously caused only one. It is called once 
> inside createFakeDirectory, and once directly inside innerMkdirs later:
> {code}
> at org.apache.hadoop.fs.s3a.S3AInstrumentation.incrementCounter(S3AInstrumentation.java:454)
> at org.apache.hadoop.fs.s3a.S3AFileSystem.incrementStatistic(S3AFileSystem.java:1108)
> at org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$deleteObjects$8(S3AFileSystem.java:1369)
> at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:313)
> at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:279)
> at org.apache.hadoop.fs.s3a.S3AFileSystem.deleteObjects(S3AFileSystem.java:1366)
> at org.apache.hadoop.fs.s3a.S3AFileSystem.removeKeys(S3AFileSystem.java:1625)
> at org.apache.hadoop.fs.s3a.S3AFileSystem.deleteUnnecessaryFakeDirectories(S3AFileSystem.java:2634)
> at org.apache.hadoop.fs.s3a.S3AFileSystem.finishedWrite(S3AFileSystem.java:2599)
> at org.apache.hadoop.fs.s3a.S3AFileSystem.putObjectDirect(S3AFileSystem.java:1498)
> at org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$createEmptyObject$11(S3AFileSystem.java:2684)
> at org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:108)
> at org.apache.hadoop.fs.s3a.Invoker.lambda$retry$3(Invoker.java:259)
> at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:313)
> at org.apache.hadoop.fs.s3a.Invoker.retry(Invoker.java:255)
> at org.apache.hadoop.fs.s3a.Invoker.retry(Invoker.java:230)
> at org.apache.hadoop.fs.s3a.S3AFileSystem.createEmptyObject(S3AFileSystem.java:2682)
> at org.apache.hadoop.fs.s3a.S3AFileSystem.createFakeDirectory(S3AFileSystem.java:2657)
> at org.apache.hadoop.fs.s3a.S3AFileSystem.innerMkdirs(S3AFileSystem.java:2021)
> at org.apache.hadoop.fs.s3a.S3AFileSystem.mkdirs(S3AFileSystem.java:1956)
> at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:2305)
> at org.apache.hadoop.fs.contract.AbstractFSContractTestBase.mkdirs(AbstractFSContractTestBase.java:338)
> at org.apache.hadoop.fs.s3a.ITestS3AFileOperationCost.testFakeDirectoryDeletion(ITestS3AFileOperationCost.java:209)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
> at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
> at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
> at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
> at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
> at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74
> {code}
> {code}
> at org.apache.hadoop.fs.s3a.S3AInstrumentation.incrementCounter(S3AInstrumentation.java:454)
> at org.apache.hadoop.fs.s3a.S3AFileSystem.incrementStatistic(S3AFileSystem.java:1108)
> at org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$deleteObjects$8(S3AFileSystem.java:1369)
> at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:313)
> at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:279)
> at 
> 

[jira] [Commented] (HADOOP-14927) ITestS3GuardTool failures in testDestroyNoBucket()

2018-01-13 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16325139#comment-16325139
 ] 

Steve Loughran commented on HADOOP-14927:
-

I haven't seen this, BTW. Maybe it's a race condition in the test... we have 
something similar related to changing bucket capacity.

> ITestS3GuardTool failures in testDestroyNoBucket()
> --
>
> Key: HADOOP-14927
> URL: https://issues.apache.org/jira/browse/HADOOP-14927
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Affects Versions: 3.0.0-beta1, 3.0.0-alpha3
> Reporter: Aaron Fabbri
> Assignee: Aaron Fabbri
> Priority: Minor
>
> Hit this when testing for the Hadoop 3.0.0-beta1 RC0.
> {noformat}
> hadoop-3.0.0-beta1-src/hadoop-tools/hadoop-aws$ mvn clean verify -Dit.test="ITestS3GuardTool*" -Dtest=none -Ds3guard -Ddynamo
> ...
> Failed tests: 
>   ITestS3GuardToolDynamoDB>AbstractS3GuardToolTestBase.testDestroyNoBucket:228 Expected an exception, got 0
>   ITestS3GuardToolLocal>AbstractS3GuardToolTestBase.testDestroyNoBucket:228 Expected an exception, got 0
> {noformat}






[jira] [Commented] (HADOOP-13761) S3Guard: implement retries for DDB failures and throttling; translate exceptions

2018-01-13 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16325113#comment-16325113
 ] 

Steve Loughran commented on HADOOP-13761:
-

Aaron: you got time to look @ this?

> S3Guard: implement retries for DDB failures and throttling; translate 
> exceptions
> 
>
> Key: HADOOP-13761
> URL: https://issues.apache.org/jira/browse/HADOOP-13761
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Affects Versions: 3.0.0-beta1
> Reporter: Aaron Fabbri
> Assignee: Steve Loughran
> Priority: Blocker
>
> Following the S3AFileSystem integration patch in HADOOP-13651, we need to add 
> retry logic.
> In HADOOP-13651, I added TODO comments in most of the places retry loops are 
> needed, including:
> - open(path).  If MetadataStore reflects recent create/move of file path, but 
> we fail to read it from S3, retry.
> - delete(path).  If deleteObject() on S3 fails, but MetadataStore shows the 
> file exists, retry.
> - rename(src,dest).  If source path is not visible in S3 yet, retry.
> - listFiles(). Skip for now. Not currently implemented in S3Guard. I will 
> create a separate JIRA for this as it will likely require interface changes 
> (i.e. prefix or subtree scan).
> We may miss some cases initially and we should do failure injection testing 
> to make sure we're covered.  Failure injection tests can be a separate JIRA 
> to make this easier to review.
> We also need basic configuration parameters around retry policy.  There 
> should be a way to specify a maximum retry duration, as some applications 
> would prefer to eventually receive an error rather than wait indefinitely.  
> We should also keep statistics on when inconsistency is detected and we 
> enter a retry loop.
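
A retry loop with a maximum retry duration and a retry counter, as asked for in this issue, might look roughly like the sketch below. It is an assumption-laden illustration, not S3Guard's retry code: `BoundedRetrySketch`, `retry`, `RETRIES`, and the fixed sleep interval are all hypothetical.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

/** Sketch only: names are hypothetical, not the S3A/S3Guard retry API. */
public class BoundedRetrySketch {

  /** Statistic: how many times an operation had to be retried. */
  public static final AtomicLong RETRIES = new AtomicLong();

  /**
   * Runs op until it succeeds or the retry budget is spent.
   * The caller gets an error eventually instead of waiting indefinitely.
   */
  public static <T> T retry(Callable<T> op, long maxDurationMillis,
      long sleepMillis) {
    long deadline = System.nanoTime()
        + TimeUnit.MILLISECONDS.toNanos(maxDurationMillis);
    while (true) {
      try {
        return op.call();
      } catch (Exception e) {
        long afterSleep = System.nanoTime()
            + TimeUnit.MILLISECONDS.toNanos(sleepMillis);
        if (afterSleep - deadline > 0) {
          // Another attempt would exceed the budget: surface the failure.
          throw new IllegalStateException("retry budget exhausted", e);
        }
        RETRIES.incrementAndGet();
        try {
          Thread.sleep(sleepMillis);
        } catch (InterruptedException ie) {
          Thread.currentThread().interrupt();
          throw new IllegalStateException(ie);
        }
      }
    }
  }
}
```

A real policy would likely distinguish retryable failures (throttling, not-yet-visible objects) from permanent ones and use backoff; the sketch retries on any exception to keep the duration bound and the statistic easy to see.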






[jira] [Updated] (HADOOP-13884) s3a create(overwrite=true) to only look for dir/ and list entries, not file

2018-01-13 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-13884:

Description: 
Before doing a create(), s3a does a getFileStatus() to make sure there isn't a 
directory there and, if overwrite=false, that there isn't a file.

Because S3 caches negative HEAD/GET requests, if there isn't a file, then even 
after the PUT, a later GET/HEAD may return 404; we are generating a create 
inconsistency where none need exist. 

When overwrite=true we don't care whether the file exists or not, only that the 
path isn't a directory. So we can just do the {{HEAD path + "/"}} and the LIST 
calls, skipping the {{HEAD path}}. This will save an HTTP round trip of a few 
hundred milliseconds and ensure that there's no 404 cached in the S3 front end 
for later callers.

  was:
before doing a create(), s3a does a getFileStatus() to make sure there isn't a 
directory there, and, if overwrite=false, that there isn't a file.

Because S3 caches negative HEAD/GET requests, if there isn't a file, then even 
after the PUT, a later GET/HEAD may return 404; we are generating create 
consistency where none need exist. 

when overwrite=true we don't care whether the file exists or not, only that the 
path isn't a directory. So we can just to the HEAD path +"/' and the LIST 
calls, skipping the {{HEAD path}}. This will save an HTTP round trip of a few 
hundred millis, and ensure that there's no 404 cached in the S3 front end for 
later callers


> s3a create(overwrite=true) to only look for dir/ and list entries, not file
> ---
>
> Key: HADOOP-13884
> URL: https://issues.apache.org/jira/browse/HADOOP-13884
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Affects Versions: 2.9.0
> Reporter: Steve Loughran
> Priority: Minor
>
> Before doing a create(), s3a does a getFileStatus() to make sure there isn't 
> a directory there and, if overwrite=false, that there isn't a file.
> Because S3 caches negative HEAD/GET requests, if there isn't a file, then 
> even after the PUT, a later GET/HEAD may return 404; we are generating a 
> create inconsistency where none need exist. 
> When overwrite=true we don't care whether the file exists or not, only that 
> the path isn't a directory. So we can just do the {{HEAD path + "/"}} and the 
> LIST calls, skipping the {{HEAD path}}. This will save an HTTP round trip of 
> a few hundred milliseconds and ensure that there's no 404 cached in the S3 
> front end for later callers.
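
The proposed probe sequence can be sketched as follows. This is a simplified model under stated assumptions, not S3AFileSystem code: `Store`, `head`, `listHasEntries`, and `checkBeforeCreate` are hypothetical, and the method returns the list of probes it issued purely so the skipped {{HEAD path}} is visible.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch only: a hypothetical model of the pre-create probes, not the S3A API. */
public class CreateProbeSketch {

  /** Hypothetical minimal view of an object store. */
  public interface Store {
    boolean head(String key);               // true if an object exists at key
    boolean listHasEntries(String prefix);  // true if any key starts with prefix
  }

  /**
   * Checks that path can be created and returns the probes that were issued.
   * With overwrite=true the plain "HEAD path" is skipped, so no negative
   * lookup for the file itself is made before the PUT.
   */
  public static List<String> checkBeforeCreate(Store store, String path,
      boolean overwrite) {
    List<String> probes = new ArrayList<>();
    if (!overwrite) {
      probes.add("HEAD " + path);
      if (store.head(path)) {
        throw new IllegalStateException("file already exists: " + path);
      }
    }
    String dirKey = path + "/";
    probes.add("HEAD " + dirKey);
    if (store.head(dirKey)) {
      throw new IllegalStateException("path is a directory: " + path);
    }
    probes.add("LIST " + dirKey);
    if (store.listHasEntries(dirKey)) {
      throw new IllegalStateException("path is a directory: " + path);
    }
    return probes;
  }
}
```

Against an empty store, overwrite=true issues two probes (the directory-marker HEAD and the LIST), while overwrite=false issues three.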






[jira] [Commented] (HADOOP-15151) MapFile.fix creates a wrong index file in case of block-compressed data file.

2018-01-13 Thread Grigori Rybkine (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16325076#comment-16325076
 ] 

Grigori Rybkine commented on HADOOP-15151:
--

These failures do not appear to have anything to do with the patch. Also, in my 
local test, the failed unit test passes:
{noformat}
[INFO] Running org.apache.hadoop.security.TestRaceWhenRelogin
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 7.011 s - in org.apache.hadoop.security.TestRaceWhenRelogin
{noformat}

> MapFile.fix creates a wrong index file in case of block-compressed data file.
> -
>
> Key: HADOOP-15151
> URL: https://issues.apache.org/jira/browse/HADOOP-15151
> Project: Hadoop Common
> Issue Type: Bug
> Components: common
> Reporter: Grigori Rybkine
> Labels: patch
> Attachments: HADOOP-15151.001.patch, HADOOP-15151.002.patch, HADOOP-15151.003.patch, HADOOP-15151.004.patch
>
>
> The index file created by MapFile.fix for an ordered block-compressed data 
> file does not allow the MapFile.get method to find values for keys that 
> exist in the data file.


