[jira] [Updated] (HADOOP-14999) AliyunOSS: provide one asynchronous multi-part based uploading mechanism
[ https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Akira Ajisaka updated HADOOP-14999:
---
Fix Version/s: (was: 2.9.2)

> AliyunOSS: provide one asynchronous multi-part based uploading mechanism
>
> Key: HADOOP-14999
> URL: https://issues.apache.org/jira/browse/HADOOP-14999
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/oss
> Affects Versions: 3.0.0-beta1
> Reporter: Genmao Yu
> Assignee: Genmao Yu
> Priority: Major
> Fix For: 2.10.0, 2.9.1, 3.2.0, 3.1.1, 3.0.3
>
> Attachments: HADOOP-14999-branch-2.001.patch, HADOOP-14999-branch-2.002.patch, HADOOP-14999.001.patch, HADOOP-14999.002.patch, HADOOP-14999.003.patch, HADOOP-14999.004.patch, HADOOP-14999.005.patch, HADOOP-14999.006.patch, HADOOP-14999.007.patch, HADOOP-14999.008.patch, HADOOP-14999.009.patch, HADOOP-14999.010.patch, HADOOP-14999.011.patch, asynchronous_file_uploading.pdf, diff-between-patch7-and-patch8.txt
>
> This mechanism is designed to upload files in parallel and asynchronously, in order to:
> - improve the performance of uploading files to the OSS server. First, the mechanism splits the result into multiple small blocks and uploads them in parallel. Second, producing the result and uploading the blocks proceed asynchronously.
> - avoid buffering an overly large result on local disk. In an extreme case, a task may output 100GB or more; we would then need to write all 100GB to local disk before uploading it, which can be inefficient and is limited by the available disk space.
> This patch reuses {{SemaphoredDelegatingExecutor}} as the executor service and depends on HADOOP-15039.
> The attached {{asynchronous_file_uploading.pdf}} illustrates the difference between the previous {{AliyunOSSOutputStream}} and {{AliyunOSSBlockOutputStream}}, i.e. this asynchronous multi-part based uploading mechanism.
> 1. {{AliyunOSSOutputStream}}: the whole result must be written to local disk before it can be uploaded to OSS. This poses two problems:
>    - if the output file is too large, the task may run out of local disk space.
>    - if the output file is too large, the task must wait a long time for the upload to OSS before it can finish, wasting compute resources.
> 2. {{AliyunOSSBlockOutputStream}}: the task output is cut into small blocks, i.e. small local files, and each block is packaged into an upload task. These tasks are submitted to {{SemaphoredDelegatingExecutor}}, which uploads the blocks in parallel and greatly improves performance.
> 3. Each task retries uploading its block to Aliyun OSS 3 times. If any of these tasks fails, the whole file upload fails and the current upload is aborted.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org
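A minimal sketch of the flow in items 2 and 3, in plain Java. It is hypothetical and is not the {{AliyunOSSBlockOutputStream}} code: {{FakeMultipartStore}} is an invented in-memory stand-in for the OSS multipart API, blocks are buffered in memory rather than in small local files, a fixed thread pool plus a {{java.util.concurrent.Semaphore}} plays the role of {{SemaphoredDelegatingExecutor}}, and "retry 3 times" is assumed to mean one attempt plus up to 3 retries per block. The writer keeps filling the next block while earlier blocks upload, at most 2 blocks are in flight, and {{close()}} waits for every block before either completing or aborting the whole upload.

```java
import java.io.*;
import java.util.*;
import java.util.concurrent.*;

/** Hypothetical in-memory stand-in for an OSS multipart upload (NOT the Aliyun OSS SDK). */
class FakeMultipartStore {
  final Map<Integer, byte[]> parts = new ConcurrentHashMap<>();
  final Map<Integer, Integer> failuresLeft = new ConcurrentHashMap<>(); // injected transient failures
  volatile boolean completed, aborted;
  void uploadPart(int partNumber, byte[] data) throws IOException {
    int left = failuresLeft.getOrDefault(partNumber, 0);
    if (left > 0) {
      failuresLeft.put(partNumber, left - 1);
      throw new IOException("transient failure on part " + partNumber);
    }
    parts.put(partNumber, data);
  }
  void complete() { completed = true; }
  void abort() { aborted = true; parts.clear(); }
}

/** Sketch: fill one block while earlier full blocks upload in parallel on a bounded pool. */
class BlockOutputStreamSketch extends OutputStream {
  static final int MAX_RETRIES = 3; // assumption: 1 attempt + up to 3 retries per block
  private final FakeMultipartStore store;
  private final int blockSize;
  private final ExecutorService pool = Executors.newFixedThreadPool(4);
  private final Semaphore permits = new Semaphore(2); // caps in-flight blocks, like a semaphored executor
  private final List<Future<?>> pending = new ArrayList<>();
  private ByteArrayOutputStream block = new ByteArrayOutputStream();
  private int nextPart = 1;

  BlockOutputStreamSketch(FakeMultipartStore store, int blockSize) {
    this.store = store;
    this.blockSize = blockSize;
  }
  @Override
  public void write(int b) throws IOException {
    block.write(b);
    if (block.size() >= blockSize) submitBlock();
  }

  private void submitBlock() throws IOException {
    final byte[] data = block.toByteArray();
    final int part = nextPart++;
    block = new ByteArrayOutputStream();
    try {
      permits.acquire(); // the writer blocks here while too many uploads are in flight
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IOException("interrupted while waiting for an upload slot", e);
    }
    pending.add(pool.submit(() -> {
      try {
        uploadWithRetry(part, data);
      } finally {
        permits.release();
      }
      return null;
    }));
  }

  private void uploadWithRetry(int part, byte[] data) throws IOException {
    for (int attempt = 0; ; attempt++) {
      try {
        store.uploadPart(part, data);
        return;
      } catch (IOException e) {
        if (attempt >= MAX_RETRIES) throw e; // out of retries: this block fails
      }
    }
  }

  @Override
  public void close() throws IOException {
    if (block.size() > 0) submitBlock(); // the final, possibly short, block
    IOException failure = null;
    for (Future<?> f : pending) { // wait for every block before deciding the outcome
      try {
        f.get();
      } catch (ExecutionException | InterruptedException e) {
        if (e instanceof InterruptedException) Thread.currentThread().interrupt();
        if (failure == null) failure = new IOException("block upload failed", e);
      }
    }
    pool.shutdown();
    if (failure != null) {
      store.abort(); // one failed block fails (and aborts) the whole file upload
      throw failure;
    }
    store.complete();
  }

  static boolean upload(FakeMultipartStore store, byte[] data, int blockSize) {
    try (BlockOutputStreamSketch out = new BlockOutputStreamSketch(store, blockSize)) {
      out.write(data);
      return true; // close() runs first; if it throws, the catch below returns false
    } catch (IOException e) {
      return false;
    }
  }
}
```

Waiting for all blocks before aborting, rather than aborting on the first failure, keeps the sketch simple and avoids racing the abort against parts that are still uploading; a production stream would presumably also back off between retries and delete its local block files.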
[jira] [Updated] (HADOOP-14999) AliyunOSS: provide one asynchronous multi-part based uploading mechanism
[ https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

SammiChen updated HADOOP-14999:
---
Resolution: Fixed
Fix Version/s: 3.0.3, 2.9.2, 3.1.1, 3.2.0, 2.9.1, 2.10.0
Status: Resolved (was: Patch Available)
[jira] [Updated] (HADOOP-14999) AliyunOSS: provide one asynchronous multi-part based uploading mechanism
[ https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Genmao Yu updated HADOOP-14999:
---
Attachment: HADOOP-14999-branch-2.002.patch
[jira] [Updated] (HADOOP-14999) AliyunOSS: provide one asynchronous multi-part based uploading mechanism
[ https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Genmao Yu updated HADOOP-14999:
---
Attachment: HADOOP-14999-branch-2.001.patch
[jira] [Updated] (HADOOP-14999) AliyunOSS: provide one asynchronous multi-part based uploading mechanism
[ https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Genmao Yu updated HADOOP-14999:
---
Attachment: (was: HADOOP-14999-branch-2.001.patch)
[jira] [Updated] (HADOOP-14999) AliyunOSS: provide one asynchronous multi-part based uploading mechanism
[ https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Genmao Yu updated HADOOP-14999:
---
Attachment: HADOOP-14999-branch-2.001.patch
[jira] [Updated] (HADOOP-14999) AliyunOSS: provide one asynchronous multi-part based uploading mechanism
[ https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Genmao Yu updated HADOOP-14999:
---
Attachment: HADOOP-14999.011.patch
[jira] [Updated] (HADOOP-14999) AliyunOSS: provide one asynchronous multi-part based uploading mechanism
[ https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Genmao Yu updated HADOOP-14999:
---
Attachment: HADOOP-14999.010.patch
[jira] [Updated] (HADOOP-14999) AliyunOSS: provide one asynchronous multi-part based uploading mechanism
[ https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Genmao Yu updated HADOOP-14999:
---
Attachment: HADOOP-14999.009.patch
[jira] [Updated] (HADOOP-14999) AliyunOSS: provide one asynchronous multi-part based uploading mechanism
[ https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Genmao Yu updated HADOOP-14999:
---
Attachment: diff-between-patch7-and-patch8.txt
[jira] [Updated] (HADOOP-14999) AliyunOSS: provide one asynchronous multi-part based uploading mechanism
[ https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Genmao Yu updated HADOOP-14999:
---
Attachment: HADOOP-14999.008.patch
[jira] [Updated] (HADOOP-14999) AliyunOSS: provide one asynchronous multi-part based uploading mechanism
[ https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Genmao Yu updated HADOOP-14999:
---
Description:
This mechanism is designed for uploading files in parallel and asynchronously:
- improve the performance of uploading files to the OSS server. First, this mechanism splits the result into multiple small blocks and uploads them in parallel. Second, producing the result and uploading the blocks happen asynchronously.
- avoid buffering an overly large result on local disk. To cite an extreme example, if a task outputs 100GB or more, we may need to write all 100GB to local disk and then upload it. This is sometimes inefficient and is limited by disk space.
This patch reuses {{SemaphoredDelegatingExecutor}} as the executor service and depends on HADOOP-15039.
The attached {{asynchronous_file_uploading.pdf}} illustrates the difference between the previous {{AliyunOSSOutputStream}} and {{AliyunOSSBlockOutputStream}}, i.e. this asynchronous multi-part based uploading mechanism.
1. {{AliyunOSSOutputStream}}: we need to write the whole result to local disk before we can upload it to OSS. This poses two problems:
- if the output file is too large, the local disk will run out of space.
- if the output file is too large, the task will wait a long time to upload the result to OSS before finishing, wasting a lot of compute resources.
2. {{AliyunOSSBlockOutputStream}}: we cut the task output into small blocks, i.e. some small local files, and each block is packaged into an uploading task. These tasks are submitted to {{SemaphoredDelegatingExecutor}}, which uploads the blocks in parallel; this greatly improves performance.
3. Each task retries uploading its block to Aliyun OSS 3 times. If any of those tasks fails, the whole file upload fails and we abort the current upload.

was:
This mechanism is designed for uploading files in parallel and asynchronously:
- improve the performance of uploading files to the OSS server. First, this mechanism splits the result into multiple small blocks and uploads them in parallel. Second, producing the result and uploading the blocks happen asynchronously.
- avoid buffering an overly large result on local disk. To cite an extreme example, if a task outputs 100GB or more, we may need to write all 100GB to local disk and then upload it. This is sometimes inefficient and is limited by disk space.
This patch reuses {{SemaphoredDelegatingExecutor}} as the executor service and depends on HADOOP-15039.
The attached {{asynchronous_file_uploading.pdf}} illustrates the difference between the previous {{AliyunOSSOutputStream}} and {{AliyunOSSBlockOutputStream}}, i.e. this asynchronous multi-part based uploading mechanism.
1. {{AliyunOSSOutputStream}}: we need to write the whole result to local disk before we can upload it to OSS. This poses two problems:
- if the output file is too large, the local disk will run out of space.
- if the output file is too large, the task will wait a long time to upload the result to OSS before finishing, wasting a lot of compute resources.
2. {{AliyunOSSBlockOutputStream}}: we cut the task output into small blocks, i.e. some small local files, and each block is packaged into an uploading task. These tasks are submitted to {{SemaphoredDelegatingExecutor}}, which uploads the blocks in parallel; this greatly improves performance.

> Attachments: HADOOP-14999.001.patch, HADOOP-14999.002.patch, HADOOP-14999.003.patch, HADOOP-14999.004.patch, HADOOP-14999.005.patch, HADOOP-14999.006.patch, HADOOP-14999.007.patch, asynchronous_file_uploading.pdf
[jira] [Updated] (HADOOP-14999) AliyunOSS: provide one asynchronous multi-part based uploading mechanism
[ https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Genmao Yu updated HADOOP-14999:
---
Attachment: HADOOP-14999.007.patch

> Attachments: HADOOP-14999.001.patch, HADOOP-14999.002.patch, HADOOP-14999.003.patch, HADOOP-14999.004.patch, HADOOP-14999.005.patch, HADOOP-14999.006.patch, HADOOP-14999.007.patch, asynchronous_file_uploading.pdf
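The issue description says the block-upload tasks are submitted to {{SemaphoredDelegatingExecutor}}; as its name suggests, that executor gates submissions with a semaphore so a fast writer cannot queue unbounded work. The sketch below imitates that idea using only {{java.util.concurrent}}: a permit is acquired before each block task is submitted and released when the task finishes. It is an illustration under assumptions, not Hadoop's class: {{BoundedExecutor}}, {{uploadPart}} and the fake ETag strings are hypothetical stand-ins, and no OSS call is made.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.Semaphore;

public class BoundedUploadSketch {

    /** Hypothetical stand-in for the idea behind SemaphoredDelegatingExecutor. */
    static final class BoundedExecutor {
        private final ExecutorService delegate;
        private final Semaphore permits;

        BoundedExecutor(ExecutorService delegate, int permitCount) {
            this.delegate = delegate;
            this.permits = new Semaphore(permitCount, true); // fair: waiting submitters served in order
        }

        /** Blocks the caller while permitCount tasks are already queued or running. */
        <T> Future<T> submit(Callable<T> task) throws InterruptedException {
            permits.acquire();
            try {
                return delegate.submit(() -> {
                    try {
                        return task.call();
                    } finally {
                        permits.release(); // free a slot once this block's task ends
                    }
                });
            } catch (RejectedExecutionException e) {
                permits.release(); // the task never ran, so give the permit back
                throw e;
            }
        }
    }

    /** Hypothetical part upload: returns a fake ETag instead of calling OSS. */
    static String uploadPart(int partNumber, byte[] block) {
        return "etag-" + partNumber + "-" + block.length;
    }

    public static List<String> uploadAll(List<byte[]> blocks, int threads, int permitCount)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            BoundedExecutor executor = new BoundedExecutor(pool, permitCount);
            List<Future<String>> futures = new ArrayList<>();
            for (int i = 0; i < blocks.size(); i++) {
                final int partNumber = i + 1; // multi-part part numbers start at 1
                final byte[] block = blocks.get(i);
                futures.add(executor.submit(() -> uploadPart(partNumber, block)));
            }
            List<String> etags = new ArrayList<>();
            for (Future<String> f : futures) {
                etags.add(f.get()); // collect results in part order
            }
            return etags;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<byte[]> blocks = List.of(new byte[4], new byte[4], new byte[2]);
        System.out.println(uploadAll(blocks, 2, 3)); // prints [etag-1-4, etag-2-4, etag-3-2]
    }
}
```

Because the permit is released inside the wrapped task, the submitting thread is throttled by upload progress: with 3 permits, at most 3 block tasks are submitted but unfinished at any moment, regardless of how fast the output is produced.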
[jira] [Updated] (HADOOP-14999) AliyunOSS: provide one asynchronous multi-part based uploading mechanism
[ https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Genmao Yu updated HADOOP-14999:
---
Attachment: HADOOP-14999.006.patch

> Attachments: HADOOP-14999.001.patch, HADOOP-14999.002.patch, HADOOP-14999.003.patch, HADOOP-14999.004.patch, HADOOP-14999.005.patch, HADOOP-14999.006.patch, asynchronous_file_uploading.pdf
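Item 3 of the description (retry each block upload, and abort the whole upload if a block still fails) can be sketched on top of a plain {{ExecutorService}}. Two assumptions are made here: "retry 3 times" is read as at most 3 attempts per block (the patch may count retries differently), and {{PartUploader}} and the {{abortUpload}} callback are hypothetical hooks standing in for the OSS part-upload and abort-multipart-upload calls.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

public class RetryAbortSketch {

    /** Hypothetical single-part upload; may throw on a (possibly transient) failure. */
    public interface PartUploader {
        String upload(int partNumber) throws Exception;
    }

    /** Try one part up to maxAttempts times and rethrow the last failure. */
    static String uploadWithRetry(PartUploader uploader, int partNumber, int maxAttempts)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return uploader.upload(partNumber);
            } catch (Exception e) {
                last = e;
                if (e instanceof InterruptedException || Thread.currentThread().isInterrupted()) {
                    break; // the task was cancelled: stop retrying
                }
            }
        }
        throw last;
    }

    /** Upload all parts in parallel; if any part still fails, cancel the rest and abort. */
    public static List<String> uploadAll(int parts, PartUploader uploader, ExecutorService pool,
                                         Runnable abortUpload) throws Exception {
        List<Future<String>> futures = new ArrayList<>();
        for (int p = 1; p <= parts; p++) {
            final int partNumber = p;
            futures.add(pool.submit(() -> uploadWithRetry(uploader, partNumber, 3)));
        }
        List<String> etags = new ArrayList<>();
        try {
            for (Future<String> f : futures) {
                etags.add(f.get()); // wait for each part in order
            }
        } catch (ExecutionException e) {
            for (Future<String> f : futures) {
                f.cancel(true); // stop parts that are still pending or running
            }
            abortUpload.run(); // hypothetical hook, e.g. abort the multi-part upload
            throw e;
        }
        return etags;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            AtomicInteger part2Calls = new AtomicInteger();
            // Part 2 fails twice, then succeeds on its third (last allowed) attempt.
            PartUploader flaky = part -> {
                if (part == 2 && part2Calls.incrementAndGet() < 3) {
                    throw new IOException("transient failure");
                }
                return "etag-" + part;
            };
            System.out.println(uploadAll(3, flaky, pool, () -> System.out.println("aborted")));
        } finally {
            pool.shutdown();
        }
    }
}
```

Cancelling the remaining futures before calling the abort hook avoids spending bandwidth on parts that would be discarded anyway; aborting the multi-part upload on the server side is what lets the already-uploaded parts be cleaned up.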
[jira] [Updated] (HADOOP-14999) AliyunOSS: provide one asynchronous multi-part based uploading mechanism
[ https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Genmao Yu updated HADOOP-14999:
---
Attachment: (was: HADOOP-14999.006.patch)

> Attachments: HADOOP-14999.001.patch, HADOOP-14999.002.patch, HADOOP-14999.003.patch, HADOOP-14999.004.patch, HADOOP-14999.005.patch, asynchronous_file_uploading.pdf
[jira] [Updated] (HADOOP-14999) AliyunOSS: provide one asynchronous multi-part based uploading mechanism
[ https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Genmao Yu updated HADOOP-14999:
---
Attachment: HADOOP-14999.006.patch

> Attachments: HADOOP-14999.001.patch, HADOOP-14999.002.patch, HADOOP-14999.003.patch, HADOOP-14999.004.patch, HADOOP-14999.005.patch, HADOOP-14999.006.patch, asynchronous_file_uploading.pdf
[jira] [Updated] (HADOOP-14999) AliyunOSS: provide one asynchronous multi-part based uploading mechanism
[ https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Genmao Yu updated HADOOP-14999:
---
Attachment: HADOOP-14999.005.patch

> Attachments: HADOOP-14999.001.patch, HADOOP-14999.002.patch, HADOOP-14999.003.patch, HADOOP-14999.004.patch, HADOOP-14999.005.patch, asynchronous_file_uploading.pdf
[jira] [Updated] (HADOOP-14999) AliyunOSS: provide one asynchronous multi-part based uploading mechanism
[ https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Genmao Yu updated HADOOP-14999:
---
Attachment: (was: HADOOP-14999.005.patch)

> Attachments: HADOOP-14999.001.patch, HADOOP-14999.002.patch, HADOOP-14999.003.patch, HADOOP-14999.004.patch, asynchronous_file_uploading.pdf
[jira] [Updated] (HADOOP-14999) AliyunOSS: provide one asynchronous multi-part based uploading mechanism
[ https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Genmao Yu updated HADOOP-14999:
---
Attachment: HADOOP-14999.005.patch

> Attachments: HADOOP-14999.001.patch, HADOOP-14999.002.patch, HADOOP-14999.003.patch, HADOOP-14999.004.patch, HADOOP-14999.005.patch, asynchronous_file_uploading.pdf
[jira] [Updated] (HADOOP-14999) AliyunOSS: provide one asynchronous multi-part based uploading mechanism
[ https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Genmao Yu updated HADOOP-14999:
---
Attachment: (was: HADOOP-14999.005.patch)

> Attachments: HADOOP-14999.001.patch, HADOOP-14999.002.patch, HADOOP-14999.003.patch, HADOOP-14999.004.patch, HADOOP-14999.005.patch, asynchronous_file_uploading.pdf
[jira] [Updated] (HADOOP-14999) AliyunOSS: provide one asynchronous multi-part based uploading mechanism
[ https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Genmao Yu updated HADOOP-14999:
---
Attachment: HADOOP-14999.005.patch

> Attachments: HADOOP-14999.001.patch, HADOOP-14999.002.patch, HADOOP-14999.003.patch, HADOOP-14999.004.patch, HADOOP-14999.005.patch, asynchronous_file_uploading.pdf
[jira] [Updated] (HADOOP-14999) AliyunOSS: provide one asynchronous multi-part based uploading mechanism
[ https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Genmao Yu updated HADOOP-14999:
---
Attachment: HADOOP-14999.004.patch

> Attachments: HADOOP-14999.001.patch, HADOOP-14999.002.patch, HADOOP-14999.003.patch, HADOOP-14999.004.patch, asynchronous_file_uploading.pdf
[jira] [Updated] (HADOOP-14999) AliyunOSS: provide one asynchronous multi-part based uploading mechanism
[ https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Genmao Yu updated HADOOP-14999:
---
Description:
This mechanism is designed for uploading files in parallel and asynchronously:
- improve the performance of uploading files to the OSS server. First, this mechanism splits the result into multiple small blocks and uploads them in parallel. Second, producing the result and uploading the blocks happen asynchronously.
- avoid buffering an overly large result on local disk. To cite an extreme example, if a task outputs 100GB or more, we may need to write all 100GB to local disk and then upload it. This is sometimes inefficient and is limited by disk space.
This patch reuses {{SemaphoredDelegatingExecutor}} as the executor service and depends on HADOOP-15039.
The attached {{asynchronous_file_uploading.pdf}} illustrates the difference between the previous {{AliyunOSSOutputStream}} and {{AliyunOSSBlockOutputStream}}, i.e. this asynchronous multi-part based uploading mechanism.
1. {{AliyunOSSOutputStream}}: we need to write the whole result to local disk before we can upload it to OSS. This poses two problems:
- if the output file is too large, the local disk will run out of space.
- if the output file is too large, the task will wait a long time to upload the result to OSS before finishing, wasting a lot of compute resources.
2. {{AliyunOSSBlockOutputStream}}: we cut the task output into small blocks, i.e. some small local files, and each block is packaged into an uploading task. These tasks are submitted to {{SemaphoredDelegatingExecutor}}, which uploads the blocks in parallel; this greatly improves performance.

was:
This mechanism is designed for uploading files in parallel and asynchronously:
- improve the performance of uploading files to the OSS server. First, this mechanism splits the result into multiple small blocks and uploads them in parallel. Second, producing the result and uploading the blocks happen asynchronously.
- avoid buffering an overly large result on local disk. To cite an extreme example, if a task outputs 100GB or more, we may need to write all 100GB to local disk and then upload it. This is sometimes inefficient and is limited by disk space.
This patch reuses {{SemaphoredDelegatingExecutor}} as the executor service and depends on HADOOP-15039.
The attached {{asynchronous_file_uploading.pdf}} illustrates the difference between the previous {{AliyunOSSOutputStream}} and {{AliyunOSSBlockOutputStream}}, i.e. this asynchronous multi-part based uploading mechanism.
1. {{AliyunOSSOutputStream}}: we need to write the whole result to local disk before we can upload it to OSS. This poses two problems:
- if the output file is too large, the local disk will run out of space.
- if the output file is too large, the task will wait a long time to upload the result to OSS before finishing, wasting a lot of compute resources.
2. {{AliyunOSSBlockOutputStream}}: we cut the task output into small blocks, i.e. some small local files, and each block is packaged into an uploading task. These tasks are submitted to {{SemaphoredDelegatingExecutor}}, which uploads the blocks in in parallel; this greatly improves performance.

> AliyunOSS: provide one asynchronous multi-part based uploading mechanism
>
> Key: HADOOP-14999
> URL: https://issues.apache.org/jira/browse/HADOOP-14999
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/oss
> Affects Versions: 3.0.0-beta1
> Reporter: Genmao Yu
> Assignee: Genmao Yu
> Attachments: HADOOP-14999.001.patch, HADOOP-14999.002.patch, HADOOP-14999.003.patch, asynchronous_file_uploading.pdf
>
>
> This mechanism is designed for uploading files in parallel and asynchronously:
> - improve the performance of uploading files to the OSS server. First, this mechanism splits the result into multiple small blocks and uploads them in parallel. Second, producing the result and uploading the blocks happen asynchronously.
> - avoid buffering an overly large result on local disk. To cite an extreme example, if a task outputs 100GB or more, we may need to write all 100GB to local disk and then upload it. This is sometimes inefficient and is limited by disk space.
> This patch reuses {{SemaphoredDelegatingExecutor}} as the executor service and depends on HADOOP-15039.
> The attached {{asynchronous_file_uploading.pdf}} illustrates the difference between the previous {{AliyunOSSOutputStream}} and {{AliyunOSSBlockOutputStream}}, i.e. this asynchronous multi-part based uploading mechanism.
> 1. {{AliyunOSSOutputStream}}: we need to write the whole result to local disk before we can upload it to OSS. This poses two
[jira] [Updated] (HADOOP-14999) AliyunOSS: provide one asynchronous multi-part based uploading mechanism
[ https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Genmao Yu updated HADOOP-14999: --- Description: This mechanism is designed for uploading file in parallel and asynchronously: - improve the performance of uploading file to OSS server. Firstly, this mechanism splits result to multiple small blocks and upload them in parallel. Then, getting result and uploading blocks are asynchronous. - avoid buffering too large result into local disk. To cite an extreme example, there is a task which will output 100GB or even larger, we may need to output this 100GB to local disk and then upload it. Sometimes, it is inefficient and limited to disk space. This patch reuse {{SemaphoredDelegatingExecutor}} as executor service and depends on HADOOP-15039. Attached {{asynchronous_file_uploading.pdf}} illustrated the difference between previous {{AliyunOSSOutputStream}} and {{AliyunOSSBlockOutputStream}}, i.e. this asynchronous multi-part based uploading mechanism. 1. {{AliyunOSSOutputStream}}: we need to output the whole result to local disk before we can upload it to OSS. This will poses two problems: - if the output file is too large, it will run out of the local disk. - if the output file is too large, task will wait long time to upload result to OSS before finish, wasting much compute resource. 2. {{AliyunOSSBlockOutputStream}}: we cut the task output into small blocks, i.e. some small local file, and each block will be packaged into a uploading task. These tasks will be submitted into {{SemaphoredDelegatingExecutor}}. {{SemaphoredDelegatingExecutor}} will upload this blocks in in parallel, this will improve performance greatly. was: This mechanism is designed for uploading file in parallel and asynchronously: - improve the performance of uploading file to OSS server. Firstly, this mechanism splits result to multiple small blocks and upload them in parallel. 
Then, getting result and uploading blocks are asynchronous. - avoid buffering too large result into local disk. To cite an extreme example, there is a task which will output 100GB or even larger, we may need to output this 100GB to local disk and then upload it. Sometimes, it is inefficient and limited to disk space. This patch reuse {{SemaphoredDelegatingExecutor}} as executor service and depends on HADOOP-15039. Attached {{asynchronous_file_uploading.pdf}} illustrated the difference between previous {{AliyunOSSOutputStream}} and {{AliyunOSSBlockOutputStream}}, i.e. this asynchronous multi-part based uploading mechanism. - {{AliyunOSSOutputStream}}: we need to output the whole result to local disk before we can upload it to OSS. This will poses two problems: - if the output file is too large, it will run out of the local disk. - if the output file is too large, task will wait long time to upload result to OSS before finish, wasting much compute resource. - {{AliyunOSSBlockOutputStream}}: we cut the task output into small blocks, i.e. some small local file, and each block will be packaged into a uploading task. These tasks will be submitted into {{SemaphoredDelegatingExecutor}}. {{SemaphoredDelegatingExecutor}} will upload this blocks in in parallel, this will improve performance greatly. > AliyunOSS: provide one asynchronous multi-part based uploading mechanism > > > Key: HADOOP-14999 > URL: https://issues.apache.org/jira/browse/HADOOP-14999 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/oss >Affects Versions: 3.0.0-beta1 >Reporter: Genmao Yu >Assignee: Genmao Yu > Attachments: HADOOP-14999.001.patch, HADOOP-14999.002.patch, > HADOOP-14999.003.patch, asynchronous_file_uploading.pdf > > > This mechanism is designed for uploading file in parallel and asynchronously: > - improve the performance of uploading file to OSS server. Firstly, this > mechanism splits result to multiple small blocks and upload them in parallel. 
> Then, getting result and uploading blocks are asynchronous. > - avoid buffering too large result into local disk. To cite an extreme > example, there is a task which will output 100GB or even larger, we may need > to output this 100GB to local disk and then upload it. Sometimes, it is > inefficient and limited to disk space. > This patch reuse {{SemaphoredDelegatingExecutor}} as executor service and > depends on HADOOP-15039. > Attached {{asynchronous_file_uploading.pdf}} illustrated the difference > between previous {{AliyunOSSOutputStream}} and > {{AliyunOSSBlockOutputStream}}, i.e. this asynchronous multi-part based > uploading mechanism. > 1. {{AliyunOSSOutputStream}}: we need to output the whole result to local > disk before we can upload it to OSS. This will poses two
[jira] [Updated] (HADOOP-14999) AliyunOSS: provide one asynchronous multi-part based uploading mechanism
[ https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Genmao Yu updated HADOOP-14999: --- Description: This mechanism is designed for uploading file in parallel and asynchronously: - improve the performance of uploading file to OSS server. Firstly, this mechanism splits result to multiple small blocks and upload them in parallel. Then, getting result and uploading blocks are asynchronous. - avoid buffering too large result into local disk. To cite an extreme example, there is a task which will output 100GB or even larger, we may need to output this 100GB to local disk and then upload it. Sometimes, it is inefficient and limited to disk space. This patch reuse {{SemaphoredDelegatingExecutor}} as executor service and depends on HADOOP-15039. Attached {{asynchronous_file_uploading.pdf}} illustrated the difference between previous {{AliyunOSSOutputStream}} and {{AliyunOSSBlockOutputStream}}, i.e. this asynchronous multi-part based uploading mechanism. - {{AliyunOSSOutputStream}}: we need to output the whole result to local disk before we can upload it to OSS. This will poses two problems: - if the output file is too large, it will run out of the local disk. - if the output file is too large, task will wait long time to upload result to OSS before finish, wasting much compute resource. - {{AliyunOSSBlockOutputStream}}: we cut the task output into small blocks, i.e. some small local file, and each block will be packaged into a uploading task. These tasks will be submitted into {{SemaphoredDelegatingExecutor}}. {{SemaphoredDelegatingExecutor}} will upload this blocks in in parallel, this will improve performance greatly. was: This mechanism is designed for uploading file in parallel and asynchronously: - improve the performance of uploading file to OSS server. Firstly, this mechanism splits result to multiple small blocks and upload them in parallel. 
Then, getting result and uploading blocks are asynchronous. - avoid buffering too large result into local disk. To cite an extreme example, there is a task which will output 100GB or even larger, we may need to output this 100GB to local disk and then upload it. Sometimes, it is inefficient and limited to disk space. This patch reuse {{SemaphoredDelegatingExecutor}} as executor service and depends on HADOOP-15039. Attached {{asynchronous_file_uploading.pdf}} illustrated the difference between previous {{AliyunOSSOutputStream}} and {{AliyunOSSBlockOutputStream}}, i.e. this asynchronous multi-part based uploading mechanism. > AliyunOSS: provide one asynchronous multi-part based uploading mechanism > > > Key: HADOOP-14999 > URL: https://issues.apache.org/jira/browse/HADOOP-14999 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/oss >Affects Versions: 3.0.0-beta1 >Reporter: Genmao Yu >Assignee: Genmao Yu > Attachments: HADOOP-14999.001.patch, HADOOP-14999.002.patch, > HADOOP-14999.003.patch, asynchronous_file_uploading.pdf > > > This mechanism is designed for uploading file in parallel and asynchronously: > - improve the performance of uploading file to OSS server. Firstly, this > mechanism splits result to multiple small blocks and upload them in parallel. > Then, getting result and uploading blocks are asynchronous. > - avoid buffering too large result into local disk. To cite an extreme > example, there is a task which will output 100GB or even larger, we may need > to output this 100GB to local disk and then upload it. Sometimes, it is > inefficient and limited to disk space. > This patch reuse {{SemaphoredDelegatingExecutor}} as executor service and > depends on HADOOP-15039. > Attached {{asynchronous_file_uploading.pdf}} illustrated the difference > between previous {{AliyunOSSOutputStream}} and > {{AliyunOSSBlockOutputStream}}, i.e. this asynchronous multi-part based > uploading mechanism. 
> - {{AliyunOSSOutputStream}}: we need to output the whole result to local disk > before we can upload it to OSS. This will poses two problems: > - if the output file is too large, it will run out of the local disk. > - if the output file is too large, task will wait long time to upload > result to OSS before finish, wasting much compute resource. > - {{AliyunOSSBlockOutputStream}}: we cut the task output into small blocks, > i.e. some small local file, and each block will be packaged into a uploading > task. These tasks will be submitted into {{SemaphoredDelegatingExecutor}}. > {{SemaphoredDelegatingExecutor}} will upload this blocks in in parallel, this > will improve performance greatly. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
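The block-based approach described for {{AliyunOSSBlockOutputStream}} can be sketched in Java. This is an illustration only, not the code from the attached patches. {{BlockUploadingOutputStream}} and {{PartUploader}} are hypothetical names. For brevity, each block is buffered in memory rather than in a small local file. {{PartUploader}} stands in for whatever call uploads one part to OSS. The initiate/complete/abort steps of a real multipart upload are omitted.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

/** Hypothetical stand-in for an OSS "upload one part" call (assumption, not a real SDK API). */
interface PartUploader {
    void uploadPart(int partNumber, byte[] data) throws IOException;
}

/**
 * Illustrative block-based output stream: bytes are collected into fixed-size
 * blocks, and every full block is submitted to an executor as its own upload task,
 * so the writer keeps producing output while earlier blocks are being uploaded.
 */
class BlockUploadingOutputStream extends OutputStream {
    private final int blockSize;
    private final ExecutorService executor;
    private final PartUploader uploader;
    private final List<Future<?>> pending = new ArrayList<>();
    private ByteArrayOutputStream current = new ByteArrayOutputStream();
    private int nextPart = 1;          // multipart part numbers start at 1
    private boolean closed = false;

    BlockUploadingOutputStream(int blockSize, ExecutorService executor, PartUploader uploader) {
        this.blockSize = blockSize;
        this.executor = executor;
        this.uploader = uploader;
    }

    @Override
    public void write(int b) throws IOException {
        if (closed) {
            throw new IOException("stream closed");
        }
        current.write(b);
        if (current.size() >= blockSize) {
            submitCurrentBlock();      // hand the full block off; do not wait for it
        }
    }

    /** Packages the buffered block into an upload task and starts a fresh block. */
    private void submitCurrentBlock() {
        final byte[] data = current.toByteArray();
        final int partNumber = nextPart++;
        current = new ByteArrayOutputStream();
        pending.add(executor.submit(() -> {
            uploader.uploadPart(partNumber, data);
            return null;
        }));
    }

    /** Uploads the last (possibly short) block, then waits for every part to finish. */
    @Override
    public void close() throws IOException {
        if (closed) {
            return;
        }
        closed = true;
        if (current.size() > 0) {
            submitCurrentBlock();
        }
        for (Future<?> f : pending) {
            try {
                f.get();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IOException(e);
            } catch (ExecutionException e) {
                throw new IOException(e.getCause());
            }
        }
    }
}
```

Because {{write}} only hands full blocks to the executor, the task does not wait for the whole result before uploading begins. Only the partially filled block, plus any blocks still queued in the executor, is held locally. Bounding that queue is the job of a semaphore-guarded executor.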
[jira] [Updated] (HADOOP-14999) AliyunOSS: provide one asynchronous multi-part based uploading mechanism
[ https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Genmao Yu updated HADOOP-14999: --- Description: This mechanism is designed for uploading file in parallel and asynchronously: - improve the performance of uploading file to OSS server. Firstly, this mechanism splits result to multiple small blocks and upload them in parallel. Then, getting result and uploading blocks are asynchronous. - avoid buffering too large result into local disk. To cite an extreme example, there is a task which will output 100GB or even larger, we may need to output this 100GB to local disk and then upload it. Sometimes, it is inefficient and limited to disk space. This patch reuse {{SemaphoredDelegatingExecutor}} as executor service and depends on HADOOP-15039. Attached {{asynchronous_file_uploading.pdf}} illustrated the difference between previous {{AliyunOSSOutputStream}} and {{AliyunOSSBlockOutputStream}}, i.e. this asynchronous multi-part based uploading mechanism. was: This mechanism is designed for uploading file in parallel and asynchronously: - improve the performance of uploading file to OSS server. Firstly, this mechanism splits result to multiple small blocks and upload them in parallel. Then, getting result and uploading blocks are asynchronous. - avoid buffering too large result into local disk. To cite an extreme example, there is a task which will output 100GB or even larger, we may need to output this 100GB to local disk and then upload it. Sometimes, it is inefficient and limited to disk space. This patch reuse {{SemaphoredDelegatingExecutor}} as executor service and depends on HADOOP-15039. 
Attached {{asynchronous_file_uploading.pdf}} illustrated the difference between previous {{AliyunOSSOutputStream}} and this > AliyunOSS: provide one asynchronous multi-part based uploading mechanism > > > Key: HADOOP-14999 > URL: https://issues.apache.org/jira/browse/HADOOP-14999 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/oss >Affects Versions: 3.0.0-beta1 >Reporter: Genmao Yu >Assignee: Genmao Yu > Attachments: HADOOP-14999.001.patch, HADOOP-14999.002.patch, > HADOOP-14999.003.patch, asynchronous_file_uploading.pdf > > > This mechanism is designed for uploading file in parallel and asynchronously: > - improve the performance of uploading file to OSS server. Firstly, this > mechanism splits result to multiple small blocks and upload them in parallel. > Then, getting result and uploading blocks are asynchronous. > - avoid buffering too large result into local disk. To cite an extreme > example, there is a task which will output 100GB or even larger, we may need > to output this 100GB to local disk and then upload it. Sometimes, it is > inefficient and limited to disk space. > This patch reuse {{SemaphoredDelegatingExecutor}} as executor service and > depends on HADOOP-15039. > Attached {{asynchronous_file_uploading.pdf}} illustrated the difference > between previous {{AliyunOSSOutputStream}} and > {{AliyunOSSBlockOutputStream}}, i.e. this asynchronous multi-part based > uploading mechanism. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-14999) AliyunOSS: provide one asynchronous multi-part based uploading mechanism
[ https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Genmao Yu updated HADOOP-14999: --- Description: This mechanism is designed for uploading file in parallel and asynchronously: - improve the performance of uploading file to OSS server. Firstly, this mechanism splits result to multiple small blocks and upload them in parallel. Then, getting result and uploading blocks are asynchronous. - avoid buffering too large result into local disk. To cite an extreme example, there is a task which will output 100GB or even larger, we may need to output this 100GB to local disk and then upload it. Sometimes, it is inefficient and limited to disk space. This patch reuse {{SemaphoredDelegatingExecutor}} as executor service and depends on HADOOP-15039. Attached {{asynchronous_file_uploading.pdf}} illustrated the difference between previous {{AliyunOSSOutputStream}} and this was: This mechanism is designed for uploading file in parallel and asynchronously: - improve the performance of uploading file to OSS server. Firstly, this mechanism splits result to multiple small blocks and upload them in parallel. Then, getting result and uploading blocks are asynchronous. - avoid buffering too large result into local disk. To cite an extreme example, there is a task which will output 100GB or even larger, we may need to output this 100GB to local disk and then upload it. Sometimes, it is inefficient and limited to disk space. This patch reuse {{SemaphoredDelegatingExecutor}} as executor service and depends on HADOOP-15039. 
> AliyunOSS: provide one asynchronous multi-part based uploading mechanism > > > Key: HADOOP-14999 > URL: https://issues.apache.org/jira/browse/HADOOP-14999 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/oss >Affects Versions: 3.0.0-beta1 >Reporter: Genmao Yu >Assignee: Genmao Yu > Attachments: HADOOP-14999.001.patch, HADOOP-14999.002.patch, > HADOOP-14999.003.patch, asynchronous_file_uploading.pdf > > > This mechanism is designed for uploading file in parallel and asynchronously: > - improve the performance of uploading file to OSS server. Firstly, this > mechanism splits result to multiple small blocks and upload them in parallel. > Then, getting result and uploading blocks are asynchronous. > - avoid buffering too large result into local disk. To cite an extreme > example, there is a task which will output 100GB or even larger, we may need > to output this 100GB to local disk and then upload it. Sometimes, it is > inefficient and limited to disk space. > This patch reuse {{SemaphoredDelegatingExecutor}} as executor service and > depends on HADOOP-15039. > Attached {{asynchronous_file_uploading.pdf}} illustrated the difference > between previous {{AliyunOSSOutputStream}} and this
[jira] [Updated] (HADOOP-14999) AliyunOSS: provide one asynchronous multi-part based uploading mechanism
[ https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Genmao Yu updated HADOOP-14999: --- Attachment: HADOOP-14999.003.patch > AliyunOSS: provide one asynchronous multi-part based uploading mechanism > > > Key: HADOOP-14999 > URL: https://issues.apache.org/jira/browse/HADOOP-14999 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/oss >Affects Versions: 3.0.0-beta1 >Reporter: Genmao Yu >Assignee: Genmao Yu > Attachments: HADOOP-14999.001.patch, HADOOP-14999.002.patch, > HADOOP-14999.003.patch, asynchronous_file_uploading.pdf > > > This mechanism is designed for uploading file in parallel and asynchronously: > - improve the performance of uploading file to OSS server. Firstly, this > mechanism splits result to multiple small blocks and upload them in parallel. > Then, getting result and uploading blocks are asynchronous. > - avoid buffering too large result into local disk. To cite an extreme > example, there is a task which will output 100GB or even larger, we may need > to output this 100GB to local disk and then upload it. Sometimes, it is > inefficient and limited to disk space. > This patch reuse {{SemaphoredDelegatingExecutor}} as executor service and > depends on HADOOP-15039.
[jira] [Updated] (HADOOP-14999) AliyunOSS: provide one asynchronous multi-part based uploading mechanism
[ https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Genmao Yu updated HADOOP-14999: --- Description: This mechanism is designed for uploading file in parallel and asynchronously: - improve the performance of uploading file to OSS server. Firstly, this mechanism splits result to multiple small blocks and upload them in parallel. Then, getting result and uploading blocks are asynchronous. - avoid buffering too large result into local disk. To cite an extreme example, there is a task which will output 100GB or even larger, we may need to output this 100GB to local disk and then upload it. Sometimes, it is inefficient and limited to disk space. This patch reuse {{SemaphoredDelegatingExecutor}} as executor service and depends on HADOOP-15039. was: This mechanism is designed for uploading file in parallel and asynchronously: - improve the performance of uploading file to OSS server. Firstly, this mechanism splits result to multiple small blocks and upload them in parallel. Then, getting result and uploading blocks are asynchronous. - avoid buffering too large result into local disk. To cite an extreme example, there is a task which will output 100GB or even larger, we may need to output this 100GB to local disk and then upload it. Sometimes, it is inefficient and limited to disk space. This patch reuse {{SemaphoredDelegatingExecutor}} as executor service > AliyunOSS: provide one asynchronous multi-part based uploading mechanism > > > Key: HADOOP-14999 > URL: https://issues.apache.org/jira/browse/HADOOP-14999 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/oss >Affects Versions: 3.0.0-beta1 >Reporter: Genmao Yu >Assignee: Genmao Yu > Attachments: HADOOP-14999.001.patch, HADOOP-14999.002.patch, > asynchronous_file_uploading.pdf > > > This mechanism is designed for uploading file in parallel and asynchronously: > - improve the performance of uploading file to OSS server. 
Firstly, this > mechanism splits result to multiple small blocks and upload them in parallel. > Then, getting result and uploading blocks are asynchronous. > - avoid buffering too large result into local disk. To cite an extreme > example, there is a task which will output 100GB or even larger, we may need > to output this 100GB to local disk and then upload it. Sometimes, it is > inefficient and limited to disk space. > This patch reuse {{SemaphoredDelegatingExecutor}} as executor service and > depends on HADOOP-15039.
[jira] [Updated] (HADOOP-14999) AliyunOSS: provide one asynchronous multi-part based uploading mechanism
[ https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Genmao Yu updated HADOOP-14999: --- Description: This mechanism is designed for uploading file in parallel and asynchronously: - improve the performance of uploading file to OSS server. Firstly, this mechanism splits result to multiple small blocks and upload them in parallel. Then, getting result and uploading blocks are asynchronous. - avoid buffering too large result into local disk. To cite an extreme example, there is a task which will output 100GB or even larger, we may need to output this 100GB to local disk and then upload it. Sometimes, it is inefficient and limited to disk space. This patch reuse {{SemaphoredDelegatingExecutor}} as executor service was: This mechanism is designed for uploading file in parallel and asynchronously: - improve the performance of uploading file to OSS server. Firstly, this mechanism splits result to multiple small blocks and upload them in parallel. Then, getting result and uploading blocks are asynchronous. - avoid buffering too large result into local disk. To cite an extreme example, there is a task which will output 100GB or even larger, we may need to output this 100GB to local disk and then upload it. Sometimes, it is inefficient and limited to disk space. > AliyunOSS: provide one asynchronous multi-part based uploading mechanism > > > Key: HADOOP-14999 > URL: https://issues.apache.org/jira/browse/HADOOP-14999 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/oss >Affects Versions: 3.0.0-beta1 >Reporter: Genmao Yu >Assignee: Genmao Yu > Attachments: HADOOP-14999.001.patch, HADOOP-14999.002.patch, > asynchronous_file_uploading.pdf > > > This mechanism is designed for uploading file in parallel and asynchronously: > - improve the performance of uploading file to OSS server. Firstly, this > mechanism splits result to multiple small blocks and upload them in parallel. 
> Then, getting result and uploading blocks are asynchronous. > - avoid buffering too large result into local disk. To cite an extreme > example, there is a task which will output 100GB or even larger, we may need > to output this 100GB to local disk and then upload it. Sometimes, it is > inefficient and limited to disk space. > This patch reuse {{SemaphoredDelegatingExecutor}} as executor service
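The role a semaphore-guarded executor such as {{SemaphoredDelegatingExecutor}} plays here can be shown with a simplified Java sketch. This illustrates the general idea of a semaphore that caps outstanding tasks in front of a delegate {{ExecutorService}}. It is not Hadoop's actual class. The name {{BoundedSubmitter}} and its {{submit}} signature are hypothetical.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.Semaphore;

/**
 * Hypothetical, simplified semaphore-guarded submitter: callers block in submit()
 * once maxOutstanding tasks are queued or running, which bounds how many upload
 * tasks (and their buffered blocks) can exist at the same time.
 */
class BoundedSubmitter {
    private final ExecutorService delegate;
    private final Semaphore permits;

    BoundedSubmitter(ExecutorService delegate, int maxOutstanding) {
        this.delegate = delegate;
        this.permits = new Semaphore(maxOutstanding);
    }

    <T> Future<T> submit(Callable<T> task) throws InterruptedException {
        permits.acquire();                 // wait until a slot is free
        try {
            return delegate.submit(() -> {
                try {
                    return task.call();
                } finally {
                    permits.release();     // free the slot when the task ends
                }
            });
        } catch (RuntimeException e) {
            permits.release();             // submission rejected: return the permit
            throw e;
        }
        // Note: a task cancelled before it starts never releases its permit;
        // this sketch does not handle that case.
    }
}
```

Blocking the caller when all permits are taken is the important design choice. A fast writer cannot queue an unbounded number of blocks, so local buffering stays roughly bounded by {{maxOutstanding}} times the block size.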
[jira] [Updated] (HADOOP-14999) AliyunOSS: provide one asynchronous multi-part based uploading mechanism
[ https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Genmao Yu updated HADOOP-14999: --- Attachment: asynchronous_file_uploading.pdf > AliyunOSS: provide one asynchronous multi-part based uploading mechanism > > > Key: HADOOP-14999 > URL: https://issues.apache.org/jira/browse/HADOOP-14999 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/oss >Affects Versions: 3.0.0-beta1 >Reporter: Genmao Yu >Assignee: Genmao Yu > Attachments: HADOOP-14999.001.patch, HADOOP-14999.002.patch, > asynchronous_file_uploading.pdf > > > This mechanism is designed for uploading file in parallel and asynchronously: > - improve the performance of uploading file to OSS server. Firstly, this > mechanism splits result to multiple small blocks and upload them in parallel. > Then, getting result and uploading blocks are asynchronous. > - avoid buffering too large result into local disk. To cite an extreme > example, there is a task which will output 100GB or even larger, we may need > to output this 100GB to local disk and then upload it. Sometimes, it is > inefficient and limited to disk space.
[jira] [Updated] (HADOOP-14999) AliyunOSS: provide one asynchronous multi-part based uploading mechanism
[ https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Genmao Yu updated HADOOP-14999: --- Description: This mechanism is designed for uploading file in parallel and asynchronously: - improve the performance of uploading file to OSS server. Firstly, this mechanism splits result to multiple small blocks and upload them in parallel. Then, getting result and uploading blocks are asynchronous. - avoid buffering too large result into local disk. To cite an extreme example, there is a task which will output 100GB or even larger, we may need to output this 100GB to local disk and then upload it. Sometimes, it is inefficient and limited to disk space. was:This mechanism is designed for uploading file > AliyunOSS: provide one asynchronous multi-part based uploading mechanism > > > Key: HADOOP-14999 > URL: https://issues.apache.org/jira/browse/HADOOP-14999 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/oss >Affects Versions: 3.0.0-beta1 >Reporter: Genmao Yu >Assignee: Genmao Yu > Attachments: HADOOP-14999.001.patch, HADOOP-14999.002.patch > > > This mechanism is designed for uploading file in parallel and asynchronously: > - improve the performance of uploading file to OSS server. Firstly, this > mechanism splits result to multiple small blocks and upload them in parallel. > Then, getting result and uploading blocks are asynchronous. > - avoid buffering too large result into local disk. To cite an extreme > example, there is a task which will output 100GB or even larger, we may need > to output this 100GB to local disk and then upload it. Sometimes, it is > inefficient and limited to disk space.
[jira] [Updated] (HADOOP-14999) AliyunOSS: provide one asynchronous multi-part based uploading mechanism
[ https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Genmao Yu updated HADOOP-14999: --- Description: This mechanism is designed for uploading file > AliyunOSS: provide one asynchronous multi-part based uploading mechanism > > > Key: HADOOP-14999 > URL: https://issues.apache.org/jira/browse/HADOOP-14999 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/oss >Affects Versions: 3.0.0-beta1 >Reporter: Genmao Yu >Assignee: Genmao Yu > Attachments: HADOOP-14999.001.patch, HADOOP-14999.002.patch > > > This mechanism is designed for uploading file
[jira] [Updated] (HADOOP-14999) AliyunOSS: provide one asynchronous multi-part based uploading mechanism
[ https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Genmao Yu updated HADOOP-14999: --- Attachment: HADOOP-14999.002.patch > AliyunOSS: provide one asynchronous multi-part based uploading mechanism > > > Key: HADOOP-14999 > URL: https://issues.apache.org/jira/browse/HADOOP-14999 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/oss >Affects Versions: 3.0.0-beta1 >Reporter: Genmao Yu >Assignee: Genmao Yu > Attachments: HADOOP-14999.001.patch, HADOOP-14999.002.patch > >
[jira] [Updated] (HADOOP-14999) AliyunOSS: provide one asynchronous multi-part based uploading mechanism
[ https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Genmao Yu updated HADOOP-14999: --- Affects Version/s: 3.0.0-beta1 Status: Patch Available (was: In Progress) > AliyunOSS: provide one asynchronous multi-part based uploading mechanism > > > Key: HADOOP-14999 > URL: https://issues.apache.org/jira/browse/HADOOP-14999 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/oss >Affects Versions: 3.0.0-beta1 >Reporter: Genmao Yu >Assignee: Genmao Yu >Priority: Major > Attachments: HADOOP-14999.001.patch > >
[jira] [Updated] (HADOOP-14999) AliyunOSS: provide one asynchronous multi-part based uploading mechanism
[ https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Genmao Yu updated HADOOP-14999: --- Attachment: HADOOP-14999.001.patch pending jenkins > AliyunOSS: provide one asynchronous multi-part based uploading mechanism > > > Key: HADOOP-14999 > URL: https://issues.apache.org/jira/browse/HADOOP-14999 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/oss >Reporter: Genmao Yu >Assignee: Genmao Yu >Priority: Major > Attachments: HADOOP-14999.001.patch > >
[jira] [Updated] (HADOOP-14999) AliyunOSS: provide one asynchronous multi-part based uploading mechanism
[ https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Genmao Yu updated HADOOP-14999: --- Summary: AliyunOSS: provide one asynchronous multi-part based uploading mechanism (was: AliyunOSS: provide block-based output stream to support large file (> 5GB)) > AliyunOSS: provide one asynchronous multi-part based uploading mechanism > > > Key: HADOOP-14999 > URL: https://issues.apache.org/jira/browse/HADOOP-14999 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/oss >Reporter: Genmao Yu >Assignee: Genmao Yu >Priority: Major >