[jira] [Commented] (HADOOP-14999) AliyunOSS: provide one asynchronous multi-part based uploading mechanism
[ https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16325456#comment-16325456 ] genericqa commented on HADOOP-14999:

(x) -1 overall

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 16s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 2 new or modified test files. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 16m 30s | trunk passed |
| +1 | compile | 0m 18s | trunk passed |
| +1 | checkstyle | 0m 13s | trunk passed |
| +1 | mvnsite | 0m 20s | trunk passed |
| +1 | shadedclient | 10m 26s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 0m 26s | trunk passed |
| +1 | javadoc | 0m 15s | trunk passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 0m 18s | the patch passed |
| +1 | compile | 0m 15s | the patch passed |
| +1 | javac | 0m 15s | the patch passed |
| +1 | checkstyle | 0m 9s | the patch passed |
| +1 | mvnsite | 0m 17s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 11m 21s | patch has no errors when building and testing our client artifacts. |
| -1 | findbugs | 0m 34s | hadoop-tools/hadoop-aliyun generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) |
| +1 | javadoc | 0m 15s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 0m 17s | hadoop-aliyun in the patch passed. |
| +1 | asflicense | 0m 20s | The patch does not generate ASF License warnings. |
| | | 42m 54s | |

|| Reason || Tests ||
| FindBugs | module:hadoop-tools/hadoop-aliyun |
| | Should org.apache.hadoop.fs.aliyun.oss.AliyunOSSFileSystemStore$AscendPartNumber be a _static_ inner class? At AliyunOSSFileSystemStore.java:[lines 665-671] |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | HADOOP-14999 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12906017/HADOOP-14999.003.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 2b3c4f0faf43 3.13.0-135-generic #184-Ubuntu SMP Wed Oct 18 11:55:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 7016dd4 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
| findbugs | https://builds.apache.org/job/PreCommit-HADOOP-Build/13970/artifact/out/new-findbugs-hadoop-tools_hadoop-aliyun.html |
| Test Results | https://builds.apache.org/job/PreCommit-HADOOP-Build/13970/testReport/ |
| Max. process+thread count
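The FindBugs -1 above flags a non-static inner class that never uses its enclosing instance. A minimal sketch of the fix, with hypothetical names (the real class is {{AliyunOSSFileSystemStore$AscendPartNumber}}, whose actual body is not shown in this thread): declaring the comparator as a static nested class removes the hidden reference it would otherwise hold to the outer object.

```java
import java.util.Arrays;
import java.util.Comparator;

public class PartNumberExample {
    // A comparator that sorts part numbers ascending needs nothing from the
    // enclosing instance, so FindBugs suggests making it static: a static
    // nested class does not carry a hidden pointer to the outer object.
    private static class AscendingPartNumber implements Comparator<Integer> {
        @Override
        public int compare(Integer a, Integer b) {
            return Integer.compare(a, b);
        }
    }

    public static Integer[] sortParts(Integer[] parts) {
        Arrays.sort(parts, new AscendingPartNumber());
        return parts;
    }

    public static void main(String[] args) {
        // prints [1, 2, 3]
        System.out.println(Arrays.toString(sortParts(new Integer[]{3, 1, 2})));
    }
}
```

Multipart uploads must be completed with parts in ascending part-number order, which is why such a comparator exists at all.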
[jira] [Updated] (HADOOP-14999) AliyunOSS: provide one asynchronous multi-part based uploading mechanism
[ https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Genmao Yu updated HADOOP-14999: --- Description:

This mechanism is designed to upload files in parallel and asynchronously:
- Improve the performance of uploading files to the OSS server. The mechanism splits the output into multiple small blocks and uploads them in parallel; producing output and uploading blocks happen asynchronously.
- Avoid buffering an overly large result on local disk. As an extreme example, a task that outputs 100GB or more would otherwise have to write all 100GB to local disk before uploading it, which is inefficient and limited by available disk space.

This patch reuses {{SemaphoredDelegatingExecutor}} as the executor service and depends on HADOOP-15039. The attached {{asynchronous_file_uploading.pdf}} illustrates the difference between the previous {{AliyunOSSOutputStream}} and {{AliyunOSSBlockOutputStream}}, i.e. this asynchronous multi-part based uploading mechanism.

1. {{AliyunOSSOutputStream}}: the whole result must be written to local disk before it can be uploaded to OSS. This poses two problems:
- if the output file is too large, it can exhaust the local disk;
- if the output file is too large, the task waits a long time to upload the result to OSS before finishing, wasting compute resources.

2. {{AliyunOSSBlockOutputStream}}: the task output is cut into small blocks, i.e. small local files, and each block is packaged into an upload task. These tasks are submitted to {{SemaphoredDelegatingExecutor}}, which uploads the blocks in parallel and greatly improves performance.

was: (a near-verbatim earlier revision of the description above)

> AliyunOSS: provide one asynchronous multi-part based uploading mechanism
>
> Key: HADOOP-14999
> URL: https://issues.apache.org/jira/browse/HADOOP-14999
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/oss
> Affects Versions: 3.0.0-beta1
> Reporter: Genmao Yu
> Assignee: Genmao Yu
> Attachments: HADOOP-14999.001.patch, HADOOP-14999.002.patch, HADOOP-14999.003.patch, asynchronous_file_uploading.pdf
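The block-based approach in the description can be sketched in plain Java. This is not the actual {{AliyunOSSBlockOutputStream}} code, only an illustrative sketch with hypothetical names and a stubbed upload call: the output is cut into small blocks, each block becomes an upload task submitted to an executor, and the tasks run in parallel while results are collected in part order so the multipart upload can be completed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class BlockUploadSketch {
    static final int BLOCK_SIZE = 4; // tiny for illustration; real code would use megabytes

    // Stand-in for a multipart "upload part" call; returns a fake part ETag.
    static String uploadPart(int partNumber, byte[] block) {
        return "etag-" + partNumber + "-" + block.length;
    }

    public static List<String> upload(byte[] data) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<String>> parts = new ArrayList<>();
            // Cut the output into small blocks and submit each as an upload
            // task; blocks upload in parallel while the producer continues.
            for (int off = 0, part = 1; off < data.length; off += BLOCK_SIZE, part++) {
                int len = Math.min(BLOCK_SIZE, data.length - off);
                byte[] block = new byte[len];
                System.arraycopy(data, off, block, 0, len);
                final int partNumber = part;
                parts.add(pool.submit(() -> uploadPart(partNumber, block)));
            }
            // Completing a multipart upload requires every part's ETag,
            // collected here in ascending part-number order.
            List<String> etags = new ArrayList<>();
            for (Future<String> f : parts) {
                etags.add(f.get());
            }
            return etags;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // 10 bytes split into blocks of 4, 4 and 2
        System.out.println(upload(new byte[10]));
    }
}
```

The real implementation additionally bounds how many blocks may be buffered at once, which is where {{SemaphoredDelegatingExecutor}} comes in.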
[jira] [Updated] (HADOOP-14999) AliyunOSS: provide one asynchronous multi-part based uploading mechanism
[ https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Genmao Yu updated HADOOP-14999: --- Description: edited (the new text is a near-verbatim duplicate of the description in the update above).
[jira] [Updated] (HADOOP-14999) AliyunOSS: provide one asynchronous multi-part based uploading mechanism
[ https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Genmao Yu updated HADOOP-14999: --- Description: edited (the new text is a near-verbatim duplicate of the description in the update above). -- This message was sent by Atlassian JIRA (v6.4.14#64029)
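The description leans on {{SemaphoredDelegatingExecutor}} to keep a fast writer from queueing an unbounded number of block-upload tasks. A simplified, illustrative version of that idea follows; this is a hypothetical class, not Hadoop's real {{SemaphoredDelegatingExecutor}}, which has a richer API. A semaphore caps how many tasks may be pending or running, and `submit` blocks the caller once the cap is reached.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.Semaphore;

public class BoundedExecutorSketch {
    private final ExecutorService delegate;
    private final Semaphore permits;

    public BoundedExecutorSketch(ExecutorService delegate, int maxActive) {
        this.delegate = delegate;
        this.permits = new Semaphore(maxActive);
    }

    // Blocks the calling (producer) thread while maxActive tasks are
    // already queued or running, bounding memory and disk pressure.
    public <T> Future<T> submit(Callable<T> task) throws InterruptedException {
        permits.acquire();
        try {
            return delegate.submit(() -> {
                try {
                    return task.call();
                } finally {
                    permits.release(); // free a slot once the task finishes
                }
            });
        } catch (RejectedExecutionException e) {
            permits.release(); // task never started; give the permit back
            throw e;
        }
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        BoundedExecutorSketch bounded = new BoundedExecutorSketch(pool, 2);
        Future<Integer> f = bounded.submit(() -> 21 * 2);
        System.out.println(f.get()); // prints 42
        pool.shutdown();
    }
}
```

Blocking the producer, rather than rejecting tasks, matches the output-stream use case: `write()` simply stalls until an upload slot frees up.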
[jira] [Updated] (HADOOP-14999) AliyunOSS: provide one asynchronous multi-part based uploading mechanism
[ https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Genmao Yu updated HADOOP-14999: --- Description: edited (an earlier, shorter revision of the description above, without the {{AliyunOSSOutputStream}} vs. {{AliyunOSSBlockOutputStream}} comparison). - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-14999) AliyunOSS: provide one asynchronous multi-part based uploading mechanism
[ https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Genmao Yu updated HADOOP-14999: --- Description: edited (an earlier, shorter revision of the description above).
[jira] [Updated] (HADOOP-14999) AliyunOSS: provide one asynchronous multi-part based uploading mechanism
[ https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Genmao Yu updated HADOOP-14999: --- Attachment: HADOOP-14999.003.patch
[jira] [Updated] (HADOOP-14999) AliyunOSS: provide one asynchronous multi-part based uploading mechanism
[ https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Genmao Yu updated HADOOP-14999: --- Description: edited (an earlier, shorter revision of the description above).
[jira] [Updated] (HADOOP-14999) AliyunOSS: provide one asynchronous multi-part based uploading mechanism
[ https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Genmao Yu updated HADOOP-14999: --- Description: This mechanism is designed to upload files in parallel and asynchronously: - improve the performance of uploading files to the OSS server. First, the mechanism splits the output into multiple small blocks and uploads them in parallel; producing output and uploading blocks then proceed asynchronously. - avoid buffering overly large output on local disk. To cite an extreme example, a task may output 100GB or even more; we would otherwise need to write all 100GB to local disk and then upload it, which is inefficient and limited by disk space. This patch reuses {{SemaphoredDelegatingExecutor}} as the executor service was: This mechanism is designed for uploading file in parallel and asynchronously: - improve the performance of uploading file to OSS server. Firstly, this mechanism splits result to multiple small blocks and upload them in parallel. Then, getting result and uploading blocks are asynchronous. - avoid buffering too large result into local disk. To cite an extreme example, there is a task which will output 100GB or even larger, we may need to output this 100GB to local disk and then upload it. Sometimes, it is inefficient and limited to disk space. > AliyunOSS: provide one asynchronous multi-part based uploading mechanism > > > Key: HADOOP-14999 > URL: https://issues.apache.org/jira/browse/HADOOP-14999 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/oss >Affects Versions: 3.0.0-beta1 >Reporter: Genmao Yu >Assignee: Genmao Yu > Attachments: HADOOP-14999.001.patch, HADOOP-14999.002.patch, > asynchronous_file_uploading.pdf -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
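The block-based, bounded-concurrency upload pattern this issue describes can be sketched as follows. This is a minimal illustration under assumed names, not the hadoop-aliyun patch itself: `uploadInBlocks` and its part-upload stand-in are hypothetical, and a real implementation would also bound the number of queued blocks (as {{SemaphoredDelegatingExecutor}} does) rather than only bounding threads.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class BlockUploadSketch {

    // Split the output into fixed-size blocks and "upload" each block on a
    // bounded thread pool, mimicking the multipart pattern described above.
    public static List<Integer> uploadInBlocks(byte[] data, int blockSize, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> parts = new ArrayList<>();
            for (int off = 0; off < data.length; off += blockSize) {
                final int len = Math.min(blockSize, data.length - off);
                // Each block is submitted as soon as it is available, so
                // producing output and uploading parts overlap in time.
                parts.add(pool.submit(() -> len)); // stand-in for a real part upload
            }
            List<Integer> sizes = new ArrayList<>();
            for (Future<Integer> f : parts) {
                sizes.add(f.get()); // all parts must finish before "completing"
            }
            return sizes;
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // 25 bytes in 10-byte blocks -> parts of 10, 10 and 5 bytes.
        System.out.println(uploadInBlocks(new byte[25], 10, 4));
    }
}
```

The key property is that the producer never has to spill the whole 100GB result to disk first: each block is handed to the pool as it is produced, and only the final completion step waits for all parts.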
[jira] [Updated] (HADOOP-14999) AliyunOSS: provide one asynchronous multi-part based uploading mechanism
[ https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Genmao Yu updated HADOOP-14999: --- Attachment: asynchronous_file_uploading.pdf > AliyunOSS: provide one asynchronous multi-part based uploading mechanism > > > Key: HADOOP-14999 > URL: https://issues.apache.org/jira/browse/HADOOP-14999 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/oss >Affects Versions: 3.0.0-beta1 >Reporter: Genmao Yu >Assignee: Genmao Yu > Attachments: HADOOP-14999.001.patch, HADOOP-14999.002.patch, > asynchronous_file_uploading.pdf -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15079) ITestS3AFileOperationCost#testFakeDirectoryDeletion failing after OutputCommitter patch
[ https://issues.apache.org/jira/browse/HADOOP-15079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16325156#comment-16325156 ] genericqa commented on HADOOP-15079: | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 10m 4s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 16m 33s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 16s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 30s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 34s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 33s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 29s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 23s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 23s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 13s{color} | {color:orange} hadoop-tools/hadoop-aws: The patch generated 1 new + 7 unchanged - 1 fixed = 8 total (was 8) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 26s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 9s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 39s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 17s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 4m 31s{color} | {color:green} hadoop-aws in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 20s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 57m 53s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 | | JIRA Issue | HADOOP-15079 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12905995/HADOOP-15079-003.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux f035d7c5a2c9 3.13.0-135-generic #184-Ubuntu SMP Wed Oct 18 11:55:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 9afb802 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_151 | | findbugs | v3.1.0-RC1 | | checkstyle | https://builds.apache.org/job/PreCommit-HADOOP-Build/13968/artifact/out/diff-checkstyle-hadoop-tools_hadoop-aws.txt | | Test Results | https://builds.apache.org/job/PreCommit-HADOOP-Build/13968/testReport/ | | Max. process+thread count | 324 (vs. ulimit of 5000) | | modules | C: hadoop-tools/hadoop-aws U: hadoop-tools/hadoop-aws | | Console output | https://builds.apache.org/job/PreCommit-HADOOP-Build/13968/console | | Powered by | Apache Yetus 0.7.0-SNAPSHOT http://yetus.apache.org | This message
[jira] [Updated] (HADOOP-15079) ITestS3AFileOperationCost#testFakeDirectoryDeletion failing after OutputCommitter patch
[ https://issues.apache.org/jira/browse/HADOOP-15079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated HADOOP-15079: Status: Patch Available (was: Open) > ITestS3AFileOperationCost#testFakeDirectoryDeletion failing after > OutputCommitter patch > --- > > Key: HADOOP-15079 > URL: https://issues.apache.org/jira/browse/HADOOP-15079 > Project: Hadoop Common > Issue Type: Sub-task >Affects Versions: 3.1.0 >Reporter: Sean Mackrory >Assignee: Steve Loughran >Priority: Critical > Attachments: HADOOP-15079-001.patch, HADOOP-15079-002.patch, > HADOOP-15079-003.patch > > > I see this test failing with "object_delete_requests expected:<1> but > was:<2>". I printed stack traces whenever this metric was incremented, and > found the root cause to be that innerMkdirs is now causing two calls to > delete fake directories when it previously caused only one. It is called once > inside createFakeDirectory, and once directly inside innerMkdirs later: > {code} > at > org.apache.hadoop.fs.s3a.S3AInstrumentation.incrementCounter(S3AInstrumentation.java:454) > at > org.apache.hadoop.fs.s3a.S3AFileSystem.incrementStatistic(S3AFileSystem.java:1108) > at > org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$deleteObjects$8(S3AFileSystem.java:1369) > at > org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:313) > at > org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:279) > at > org.apache.hadoop.fs.s3a.S3AFileSystem.deleteObjects(S3AFileSystem.java:1366) > at > org.apache.hadoop.fs.s3a.S3AFileSystem.removeKeys(S3AFileSystem.java:1625) > at > org.apache.hadoop.fs.s3a.S3AFileSystem.deleteUnnecessaryFakeDirectories(S3AFileSystem.java:2634) > at > org.apache.hadoop.fs.s3a.S3AFileSystem.finishedWrite(S3AFileSystem.java:2599) > at > org.apache.hadoop.fs.s3a.S3AFileSystem.putObjectDirect(S3AFileSystem.java:1498) > at > org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$createEmptyObject$11(S3AFileSystem.java:2684) > at 
org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:108) > at org.apache.hadoop.fs.s3a.Invoker.lambda$retry$3(Invoker.java:259) > at > org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:313) > at org.apache.hadoop.fs.s3a.Invoker.retry(Invoker.java:255) > at org.apache.hadoop.fs.s3a.Invoker.retry(Invoker.java:230) > at > org.apache.hadoop.fs.s3a.S3AFileSystem.createEmptyObject(S3AFileSystem.java:2682) > at > org.apache.hadoop.fs.s3a.S3AFileSystem.createFakeDirectory(S3AFileSystem.java:2657) > at > org.apache.hadoop.fs.s3a.S3AFileSystem.innerMkdirs(S3AFileSystem.java:2021) > at > org.apache.hadoop.fs.s3a.S3AFileSystem.mkdirs(S3AFileSystem.java:1956) > at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:2305) > at > org.apache.hadoop.fs.contract.AbstractFSContractTestBase.mkdirs(AbstractFSContractTestBase.java:338) > at > org.apache.hadoop.fs.s3a.ITestS3AFileOperationCost.testFakeDirectoryDeletion(ITestS3AFileOperationCost.java:209) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) > at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55) > at > org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74 > {code} > {code} > at > 
org.apache.hadoop.fs.s3a.S3AInstrumentation.incrementCounter(S3AInstrumentation.java:454) > at > org.apache.hadoop.fs.s3a.S3AFileSystem.incrementStatistic(S3AFileSystem.java:1108) > at > org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$deleteObjects$8(S3AFileSystem.java:1369) > at > org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:313) > at > org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:279) > at >
[jira] [Updated] (HADOOP-15079) ITestS3AFileOperationCost#testFakeDirectoryDeletion failing after OutputCommitter patch
[ https://issues.apache.org/jira/browse/HADOOP-15079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated HADOOP-15079: Attachment: HADOOP-15079-003.patch HADOOP-15079 patch 003; fix up checkstyle warnings: indentation and line length. If checkstyle is happy, this is what I'll commit. > ITestS3AFileOperationCost#testFakeDirectoryDeletion failing after > OutputCommitter patch > --- > > Key: HADOOP-15079 > URL: https://issues.apache.org/jira/browse/HADOOP-15079 > Project: Hadoop Common > Issue Type: Sub-task >Affects Versions: 3.1.0 >Reporter: Sean Mackrory >Assignee: Steve Loughran >Priority: Critical > Attachments: HADOOP-15079-001.patch, HADOOP-15079-002.patch, > HADOOP-15079-003.patch -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-15079) ITestS3AFileOperationCost#testFakeDirectoryDeletion failing after OutputCommitter patch
[ https://issues.apache.org/jira/browse/HADOOP-15079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated HADOOP-15079: Status: Open (was: Patch Available) > ITestS3AFileOperationCost#testFakeDirectoryDeletion failing after > OutputCommitter patch > --- > > Key: HADOOP-15079 > URL: https://issues.apache.org/jira/browse/HADOOP-15079 > Project: Hadoop Common > Issue Type: Sub-task >Affects Versions: 3.1.0 >Reporter: Sean Mackrory >Assignee: Steve Loughran >Priority: Critical > Attachments: HADOOP-15079-001.patch, HADOOP-15079-002.patch -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
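The root cause reported in HADOOP-15079 (innerMkdirs triggering the fake-directory cleanup once through createFakeDirectory's finishedWrite() hook and once directly) reduces to a small counter sketch. The class below is an illustration with stand-in method names, not S3AFileSystem code, and `innerMkdirsFixed` shows only one plausible shape of a fix:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class FakeDirDeletionSketch {
    // Stand-in for the S3AInstrumentation object_delete_requests counter.
    public final AtomicInteger objectDeleteRequests = new AtomicInteger();

    // Stand-in for the batched DELETE of parent "dir/" marker objects.
    public void deleteUnnecessaryFakeDirectories() {
        objectDeleteRequests.incrementAndGet();
    }

    // PUT of an empty "path/" object; in S3A its finishedWrite() hook also
    // cleans up the parent fake directories, hence the nested call here.
    public void createFakeDirectory() {
        deleteUnnecessaryFakeDirectories();
    }

    // The buggy shape from the stack traces: mkdirs cleans up once via
    // createFakeDirectory -> finishedWrite, then once more directly.
    public void innerMkdirsBuggy() {
        createFakeDirectory();
        deleteUnnecessaryFakeDirectories();
    }

    // One plausible fix: rely solely on the finishedWrite() cleanup.
    public void innerMkdirsFixed() {
        createFakeDirectory();
    }

    public static void main(String[] args) {
        FakeDirDeletionSketch buggy = new FakeDirDeletionSketch();
        buggy.innerMkdirsBuggy();
        System.out.println(buggy.objectDeleteRequests.get()); // 2: the <2> the test saw

        FakeDirDeletionSketch fixed = new FakeDirDeletionSketch();
        fixed.innerMkdirsFixed();
        System.out.println(fixed.objectDeleteRequests.get()); // 1: the expected <1>
    }
}
```

This is exactly what the failing assertion "object_delete_requests expected:<1> but was:<2>" observes: the metric is incremented once per cleanup call, and the buggy path makes two cleanup calls per mkdirs.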
[jira] [Commented] (HADOOP-14927) ITestS3GuardTool failures in testDestroyNoBucket()
[ https://issues.apache.org/jira/browse/HADOOP-14927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16325139#comment-16325139 ] Steve Loughran commented on HADOOP-14927: - I haven't seen this BTW. Maybe it's a race condition in the test... we have something similar related to changing bucket capacity > ITestS3GuardTool failures in testDestroyNoBucket() > -- > > Key: HADOOP-14927 > URL: https://issues.apache.org/jira/browse/HADOOP-14927 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.0.0-beta1, 3.0.0-alpha3 >Reporter: Aaron Fabbri >Assignee: Aaron Fabbri >Priority: Minor > > Hit this when testing for the Hadoop 3.0.0-beta1 RC0. > {noformat} > hadoop-3.0.0-beta1-src/hadoop-tools/hadoop-aws$ mvn clean verify > -Dit.test="ITestS3GuardTool*" -Dtest=none -Ds3guard -Ddynamo > ... > Failed tests: > > ITestS3GuardToolDynamoDB>AbstractS3GuardToolTestBase.testDestroyNoBucket:228 > Expected an exception, got 0 > ITestS3GuardToolLocal>AbstractS3GuardToolTestBase.testDestroyNoBucket:228 > Expected an exception, got 0 > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-13761) S3Guard: implement retries for DDB failures and throttling; translate exceptions
[ https://issues.apache.org/jira/browse/HADOOP-13761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16325113#comment-16325113 ] Steve Loughran commented on HADOOP-13761: - Aaron: you got time to look @ this? > S3Guard: implement retries for DDB failures and throttling; translate > exceptions > > > Key: HADOOP-13761 > URL: https://issues.apache.org/jira/browse/HADOOP-13761 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.0.0-beta1 >Reporter: Aaron Fabbri >Assignee: Steve Loughran >Priority: Blocker > > Following the S3AFileSystem integration patch in HADOOP-13651, we need to add > retry logic. > In HADOOP-13651, I added TODO comments in most of the places retry loops are > needed, including: > - open(path). If MetadataStore reflects recent create/move of file path, but > we fail to read it from S3, retry. > - delete(path). If deleteObject() on S3 fails, but MetadataStore shows the > file exists, retry. > - rename(src,dest). If source path is not visible in S3 yet, retry. > - listFiles(). Skip for now. Not currently implemented in S3Guard. I will > create a separate JIRA for this as it will likely require interface changes > (i.e. prefix or subtree scan). > We may miss some cases initially and we should do failure injection testing > to make sure we're covered. Failure injection tests can be a separate JIRA > to make this easier to review. > We also need basic configuration parameters around retry policy. There > should be a way to specify maximum retry duration, as some applications would > prefer to receive an error eventually, than waiting indefinitely. We should > also be keeping statistics when inconsistency is detected and we enter a > retry loop. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
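A retry loop of the shape this issue's description asks for, with a bounded total duration so callers eventually get an error rather than waiting indefinitely, might look like the sketch below. It is a generic illustration under assumed names (`retry`, its parameters); it is not the Invoker/retry-policy code that ultimately landed for HADOOP-13761:

```java
import java.util.function.Supplier;

public class BoundedRetrySketch {

    // Retry op until it succeeds or maxDurationMs elapses, sleeping
    // intervalMs between attempts. On timeout, the last failure is
    // rethrown, so the caller gets an error instead of hanging forever.
    public static <T> T retry(Supplier<T> op, long maxDurationMs, long intervalMs) {
        long deadline = System.currentTimeMillis() + maxDurationMs;
        RuntimeException last = new IllegalStateException("retry deadline exceeded");
        while (true) {
            try {
                return op.get();
            } catch (RuntimeException e) {
                last = e; // e.g. a not-yet-consistent read against S3/DDB
            }
            if (System.currentTimeMillis() >= deadline) {
                throw last;
            }
            try {
                Thread.sleep(intervalMs);
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
                throw last;
            }
        }
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulate eventual consistency: the first two reads fail, then succeed.
        String result = retry(() -> {
            if (++calls[0] < 3) {
                throw new IllegalStateException("not visible yet");
            }
            return "ok";
        }, 5_000, 10);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

A production version would additionally distinguish retriable from non-retriable exceptions, use backoff rather than a fixed interval, and record statistics whenever inconsistency forces a retry, as the description requests.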
[jira] [Updated] (HADOOP-13884) s3a create(overwrite=true) to only look for dir/ and list entries, not file
[ https://issues.apache.org/jira/browse/HADOOP-13884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated HADOOP-13884: Description: before doing a create(), s3a does a getFileStatus() to make sure there isn't a directory there, and, if overwrite=false, that there isn't a file. Because S3 caches negative HEAD/GET requests, if there isn't a file, then even after the PUT, a later GET/HEAD may return 404; we are generating create consistency where none need exist. When overwrite=true we don't care whether the file exists or not, only that the path isn't a directory. So we can just do the {{HEAD path + "/"}} and the LIST calls, skipping the {{HEAD path}}. This will save an HTTP round trip of a few hundred millis, and ensure that there's no 404 cached in the S3 front end for later callers was: before doing a create(), s3a does a getFileStatus() to make sure there isn't a directory there, and, if overwrite=false, that there isn't a file. Because S3 caches negative HEAD/GET requests, if there isn't a file, then even after the PUT, a later GET/HEAD may return 404; we are generating create consistency where none need exist. when overwrite=true we don't care whether the file exists or not, only that the path isn't a directory. So we can just to the HEAD path +"/' and the LIST calls, skipping the {{HEAD path}}. This will save an HTTP round trip of a few hundred millis, and ensure that there's no 404 cached in the S3 front end for later callers > s3a create(overwrite=true) to only look for dir/ and list entries, not file > --- > > Key: HADOOP-13884 > URL: https://issues.apache.org/jira/browse/HADOOP-13884 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 2.9.0 >Reporter: Steve Loughran >Priority: Minor -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
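The probe sequence HADOOP-13884 proposes can be sketched as a choice of which S3 requests a create() precheck issues. `probesForCreate` and the request strings below are hypothetical stand-ins for illustration, not S3AFileSystem methods:

```java
import java.util.ArrayList;
import java.util.List;

public class CreateProbeSketch {

    // Return the S3 requests a create() precheck would issue for the given
    // key, per the optimization described above: with overwrite=true the
    // HEAD on the file key itself is skipped, saving one round trip and
    // ensuring no 404 for that key gets cached in the S3 front end.
    public static List<String> probesForCreate(String key, boolean overwrite) {
        List<String> requests = new ArrayList<>();
        if (!overwrite) {
            requests.add("HEAD " + key);           // is there already a file?
        }
        requests.add("HEAD " + key + "/");         // is there a directory marker?
        requests.add("LIST prefix=" + key + "/");  // any children => directory
        return requests;
    }

    public static void main(String[] args) {
        System.out.println(probesForCreate("work/out.csv", true));
        System.out.println(probesForCreate("work/out.csv", false));
    }
}
```

With overwrite=true only the directory probes run, which is all that is needed to reject "create over a directory" while avoiding the negative-HEAD caching problem described above.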
[jira] [Commented] (HADOOP-15151) MapFile.fix creates a wrong index file in case of block-compressed data file.
[ https://issues.apache.org/jira/browse/HADOOP-15151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16325076#comment-16325076 ] Grigori Rybkine commented on HADOOP-15151: -- These failures do not appear to have anything to do with the patch. Also, in my local test, the failed unit test passes: {noformat} [INFO] Running org.apache.hadoop.security.TestRaceWhenRelogin [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 7.011 s - in org.apache.hadoop.security.TestRaceWhenRelogin {noformat} > MapFile.fix creates a wrong index file in case of block-compressed data file. > - > > Key: HADOOP-15151 > URL: https://issues.apache.org/jira/browse/HADOOP-15151 > Project: Hadoop Common > Issue Type: Bug > Components: common >Reporter: Grigori Rybkine > Labels: patch > Attachments: HADOOP-15151.001.patch, HADOOP-15151.002.patch, > HADOOP-15151.003.patch, HADOOP-15151.004.patch > > > Index file created with MapFile.fix for an ordered block-compressed data file > does not allow to find values for keys existing in the data file via the > MapFile.get method. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org