[jira] [Commented] (HADOOP-15038) Abstract MetadataStore in S3Guard into a common module.

2019-02-27 Thread Genmao Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16780040#comment-16780040
 ] 

Genmao Yu commented on HADOOP-15038:


some minor comments:
1. I like "METASTORE_LOCAL_ENTRY_TTL_MS" better than "METASTORE_LOCAL_ENTRY_TTL"

Besides, to be fully rigorous here: which service endpoint have you tested the 
latest patch against?



> Abstract MetadataStore in S3Guard into a common module.
> ---
>
> Key: HADOOP-15038
> URL: https://issues.apache.org/jira/browse/HADOOP-15038
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: fs
>Affects Versions: 3.2.0, 2.9.2, 3.0.3, 3.1.2
>Reporter: Genmao Yu
>Assignee: wujinhu
>Priority: Major
> Attachments: HADOOP-15038.001.patch
>
>
> Open this JIRA to discuss if we should move {{MetadataStore}} in {{S3Guard}} 
> into a common module. 
> Based on this work, other filesystem or object store can implement their own 
> metastore for optimization (known issues like consistency problem and 
> metadata operation performance). [~ste...@apache.org] and other guys have 
> done many base and great works in {{S3Guard}}. It is very helpful to start 
> work. I did some perf test in HADOOP-14098, and started related work for 
> Aliyun OSS.  Indeed there are still works to do for {{S3Guard}}, like 
> metadata cache inconsistent with S3 and so on. It also will be a problem for 
> other object store. However, we can do these works in parallel.
> [~ste...@apache.org] [~fabbri] [~drankye] Any suggestion is appreciated.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HADOOP-15038) Abstract MetadataStore in S3Guard into a common module.

2019-02-27 Thread Genmao Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16780040#comment-16780040
 ] 

Genmao Yu edited comment on HADOOP-15038 at 2/28/19 3:04 AM:
-

[~wujinhu] some minor comments:
1. I like "METASTORE_LOCAL_ENTRY_TTL_MS" better than "METASTORE_LOCAL_ENTRY_TTL"

Besides, to be fully rigorous here: which service endpoint have you tested the 
latest patch against?




was (Author: unclegen):
some minor comments:
1. I like "METASTORE_LOCAL_ENTRY_TTL_MS" better than "METASTORE_LOCAL_ENTRY_TTL"

Besides, to be fully rigorous here: which service endpoint have you tested the 
latest patch against?



> Abstract MetadataStore in S3Guard into a common module.
> ---
>
> Key: HADOOP-15038
> URL: https://issues.apache.org/jira/browse/HADOOP-15038
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: fs
>Affects Versions: 3.2.0, 2.9.2, 3.0.3, 3.1.2
>Reporter: Genmao Yu
>Assignee: wujinhu
>Priority: Major
> Attachments: HADOOP-15038.001.patch
>
>
> Open this JIRA to discuss if we should move {{MetadataStore}} in {{S3Guard}} 
> into a common module. 
> Based on this work, other filesystem or object store can implement their own 
> metastore for optimization (known issues like consistency problem and 
> metadata operation performance). [~ste...@apache.org] and other guys have 
> done many base and great works in {{S3Guard}}. It is very helpful to start 
> work. I did some perf test in HADOOP-14098, and started related work for 
> Aliyun OSS.  Indeed there are still works to do for {{S3Guard}}, like 
> metadata cache inconsistent with S3 and so on. It also will be a problem for 
> other object store. However, we can do these works in parallel.
> [~ste...@apache.org] [~fabbri] [~drankye] Any suggestion is appreciated.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15038) Abstract MetadataStore in S3Guard into a common module.

2019-02-27 Thread Genmao Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1678#comment-1678
 ] 

Genmao Yu commented on HADOOP-15038:


[~wujinhu] Could you please give a summary of your work, it will be very 
helpful for review. 

> Abstract MetadataStore in S3Guard into a common module.
> ---
>
> Key: HADOOP-15038
> URL: https://issues.apache.org/jira/browse/HADOOP-15038
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: fs
>Affects Versions: 3.2.0, 2.9.2, 3.0.3, 3.1.2
>Reporter: Genmao Yu
>Assignee: wujinhu
>Priority: Major
> Attachments: HADOOP-15038.001.patch
>
>
> Open this JIRA to discuss if we should move {{MetadataStore}} in {{S3Guard}} 
> into a common module. 
> Based on this work, other filesystem or object store can implement their own 
> metastore for optimization (known issues like consistency problem and 
> metadata operation performance). [~ste...@apache.org] and other guys have 
> done many base and great works in {{S3Guard}}. It is very helpful to start 
> work. I did some perf test in HADOOP-14098, and started related work for 
> Aliyun OSS.  Indeed there are still works to do for {{S3Guard}}, like 
> metadata cache inconsistent with S3 and so on. It also will be a problem for 
> other object store. However, we can do these works in parallel.
> [~ste...@apache.org] [~fabbri] [~drankye] Any suggestion is appreciated.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Assigned] (HADOOP-15038) Abstract MetadataStore in S3Guard into a common module.

2019-02-26 Thread Genmao Yu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Genmao Yu reassigned HADOOP-15038:
--

Assignee: wujinhu

> Abstract MetadataStore in S3Guard into a common module.
> ---
>
> Key: HADOOP-15038
> URL: https://issues.apache.org/jira/browse/HADOOP-15038
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: fs
>Affects Versions: 3.0.0-beta1
>Reporter: Genmao Yu
>Assignee: wujinhu
>Priority: Major
>
> Open this JIRA to discuss if we should move {{MetadataStore}} in {{S3Guard}} 
> into a common module. 
> Based on this work, other filesystem or object store can implement their own 
> metastore for optimization (known issues like consistency problem and 
> metadata operation performance). [~ste...@apache.org] and other guys have 
> done many base and great works in {{S3Guard}}. It is very helpful to start 
> work. I did some perf test in HADOOP-14098, and started related work for 
> Aliyun OSS.  Indeed there are still works to do for {{S3Guard}}, like 
> metadata cache inconsistent with S3 and so on. It also will be a problem for 
> other object store. However, we can do these works in parallel.
> [~ste...@apache.org] [~fabbri] [~drankye] Any suggestion is appreciated.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Resolved] (HADOOP-14994) AliyunOSS:Improve oss error logging

2018-11-05 Thread Genmao Yu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-14994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Genmao Yu resolved HADOOP-14994.

Resolution: Fixed

update oss sdk to 3.0.0 can close oss error logging

> AliyunOSS:Improve oss error logging
> ---
>
> Key: HADOOP-14994
> URL: https://issues.apache.org/jira/browse/HADOOP-14994
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0-beta1
>Reporter: Genmao Yu
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Assigned] (HADOOP-15001) AliyunOSS: provide one memory-based buffering mechanism in outpustream to oss

2018-11-05 Thread Genmao Yu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Genmao Yu reassigned HADOOP-15001:
--

Assignee: Genmao Yu

> AliyunOSS: provide one memory-based buffering mechanism in outpustream to oss
> -
>
> Key: HADOOP-15001
> URL: https://issues.apache.org/jira/browse/HADOOP-15001
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0-beta1
>Reporter: Genmao Yu
>Assignee: Genmao Yu
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Assigned] (HADOOP-15002) AliyunOSS: refactor storage statistics to oss

2018-11-05 Thread Genmao Yu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Genmao Yu reassigned HADOOP-15002:
--

Assignee: Genmao Yu

> AliyunOSS: refactor storage statistics to oss
> -
>
> Key: HADOOP-15002
> URL: https://issues.apache.org/jira/browse/HADOOP-15002
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0-beta1
>Reporter: Genmao Yu
>Assignee: Genmao Yu
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15158) AliyunOSS: Supports role based credential in URL

2018-11-05 Thread Genmao Yu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Genmao Yu updated HADOOP-15158:
---
Issue Type: Sub-task  (was: Improvement)
Parent: HADOOP-13377

> AliyunOSS: Supports role based credential in URL
> 
>
> Key: HADOOP-15158
> URL: https://issues.apache.org/jira/browse/HADOOP-15158
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0
>Reporter: wujinhu
>Assignee: wujinhu
>Priority: Major
> Attachments: HADOOP-15158.001.patch, HADOOP-15158.002.patch, 
> HADOOP-15158.003.patch, HADOOP-15158.004.patch, HADOOP-15158.005.patch
>
>
> Currently, AliyunCredentialsProvider supports credential by 
> configuration(core-site.xml). Sometimes, admin wants to create different 
> temporary credential(key/secret/token) for different roles so that one role 
> cannot read data that belongs to another role.
> So, our code should support pass in the URI when creates an 
> XXXCredentialsProvider so that we can get user info(role) from the URI



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15323) AliyunOSS: Improve copy file performance for AliyunOSSFileSystemStore

2018-11-05 Thread Genmao Yu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Genmao Yu updated HADOOP-15323:
---
Issue Type: Sub-task  (was: Task)
Parent: HADOOP-13377

> AliyunOSS: Improve copy file performance for AliyunOSSFileSystemStore
> -
>
> Key: HADOOP-15323
> URL: https://issues.apache.org/jira/browse/HADOOP-15323
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Reporter: wujinhu
>Assignee: wujinhu
>Priority: Major
>
> Aliyun OSS will support shallow copy which means server will only copy 
> metadata when copy object operation occurs. So, we will improve copy file  
> for AliyunOSSFileSystemStore. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15323) AliyunOSS: Improve copy file performance for AliyunOSSFileSystemStore

2018-11-05 Thread Genmao Yu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Genmao Yu updated HADOOP-15323:
---
Issue Type: Task  (was: Improvement)

> AliyunOSS: Improve copy file performance for AliyunOSSFileSystemStore
> -
>
> Key: HADOOP-15323
> URL: https://issues.apache.org/jira/browse/HADOOP-15323
> Project: Hadoop Common
>  Issue Type: Task
>  Components: fs/oss
>Reporter: wujinhu
>Assignee: wujinhu
>Priority: Major
>
> Aliyun OSS will support shallow copy which means server will only copy 
> metadata when copy object operation occurs. So, we will improve copy file  
> for AliyunOSSFileSystemStore. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HADOOP-15671) AliyunOSS: Support Assume Roles in AliyunOSS

2018-08-15 Thread Genmao Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16580793#comment-16580793
 ] 

Genmao Yu edited comment on HADOOP-15671 at 8/15/18 7:10 AM:
-

Thanks for the bug fix and improvement [~wujinhu]. LGTM. Left just some minor 
comments:
 # some indent issue
 # maybe you can give a more readable name. 
{code:java}
conf.set(ASSUMED_ROLE_SESSION_NAME, "qrqw ewqr");
{code}
                                                                  


was (Author: unclegen):
Thanks for the update [~wujinhu] LGTM. Left just some minor comments:
 # some indent issue
 # maybe you can give a more readable name. 
{code:java}
conf.set(ASSUMED_ROLE_SESSION_NAME, "qrqw ewqr");
{code}
                                                                  

> AliyunOSS: Support Assume Roles in AliyunOSS
> 
>
> Key: HADOOP-15671
> URL: https://issues.apache.org/jira/browse/HADOOP-15671
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/oss
>Affects Versions: 3.1.0, 2.10.0, 2.9.1, 3.2.0, 3.0.3
>Reporter: wujinhu
>Assignee: wujinhu
>Priority: Major
> Attachments: HADOOP-15671.001.patch
>
>
> We will add assume role function in Aliyun OSS.
> For details about assume role and sts token, click the link below:
> [https://www.alibabacloud.com/help/doc-detail/31935.html?spm=a2c5t.11065259.1996646101.searchclickresult.1fad155aKOUvJZ]
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15671) AliyunOSS: Support Assume Roles in AliyunOSS

2018-08-15 Thread Genmao Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16580793#comment-16580793
 ] 

Genmao Yu commented on HADOOP-15671:


Thanks for the update [~wujinhu] LGTM. Left just some minor comments:
 # some indent issue
 # maybe you can give a more readable name. 
{code:java}
conf.set(ASSUMED_ROLE_SESSION_NAME, "qrqw ewqr");
{code}
                                                                  

> AliyunOSS: Support Assume Roles in AliyunOSS
> 
>
> Key: HADOOP-15671
> URL: https://issues.apache.org/jira/browse/HADOOP-15671
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/oss
>Affects Versions: 3.1.0, 2.10.0, 2.9.1, 3.2.0, 3.0.3
>Reporter: wujinhu
>Assignee: wujinhu
>Priority: Major
> Attachments: HADOOP-15671.001.patch
>
>
> We will add assume role function in Aliyun OSS.
> For details about assume role and sts token, click the link below:
> [https://www.alibabacloud.com/help/doc-detail/31935.html?spm=a2c5t.11065259.1996646101.searchclickresult.1fad155aKOUvJZ]
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HADOOP-15607) AliyunOSS: fix duplicated partNumber issue in AliyunOSSBlockOutputStream

2018-07-16 Thread Genmao Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16545952#comment-16545952
 ] 

Genmao Yu edited comment on HADOOP-15607 at 7/17/18 2:38 AM:
-

[~wujinhu] Overall LGTM,but maybe we should use one intermediate variable for 
blockId to pass to {{upload task}}, like:
{code:java}
  int bid = ++blockId;

  

  PartETag partETag = store.uploadPart(currentFile, key, uploadId, bid);

  ...

 
{code}


was (Author: unclegen):
[~wujinhu] Overall LGTM,but maybe we should use one intermediate variable for 
blockId to pass to {{upload task}}, like:
{code:java}
  int bid = ++block;

  

  PartETag partETag = store.uploadPart(currentFile, key, uploadId, bid);

  ...

 
{code}

> AliyunOSS: fix duplicated partNumber issue in AliyunOSSBlockOutputStream 
> -
>
> Key: HADOOP-15607
> URL: https://issues.apache.org/jira/browse/HADOOP-15607
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 2.10.0, 2.9.1, 3.2.0, 3.1.1, 3.0.3
>Reporter: wujinhu
>Assignee: wujinhu
>Priority: Major
> Attachments: HADOOP-15607.001.patch
>
>
> When I generated data with hive-tpcds tool, I got exception below:
> 2018-07-16 14:50:43,680 INFO mapreduce.Job: Task Id : 
> attempt_1531723399698_0001_m_52_0, Status : FAILED
> Error: com.aliyun.oss.OSSException: The list of parts was not in ascending 
> order. Parts list must specified in order by part number.
> [ErrorCode]: InvalidPartOrder
> [RequestId]: 5B4C40425FCC208D79D1EAF5
> [HostId]: 100.103.0.137
> [ResponseError]:
> 
> 
>  InvalidPartOrder
>  The list of parts was not in ascending order. Parts list must 
> specified in order by part number.
>  5B4C40425FCC208D79D1EAF5
>  100.103.0.137
>  current PartNumber 3, you given part number 3is not in 
> ascending order
> 
> at 
> com.aliyun.oss.common.utils.ExceptionFactory.createOSSException(ExceptionFactory.java:99)
>  at 
> com.aliyun.oss.internal.OSSErrorResponseHandler.handle(OSSErrorResponseHandler.java:69)
>  at 
> com.aliyun.oss.common.comm.ServiceClient.handleResponse(ServiceClient.java:248)
>  at 
> com.aliyun.oss.common.comm.ServiceClient.sendRequestImpl(ServiceClient.java:130)
>  at 
> com.aliyun.oss.common.comm.ServiceClient.sendRequest(ServiceClient.java:68)
>  at com.aliyun.oss.internal.OSSOperation.send(OSSOperation.java:94)
>  at com.aliyun.oss.internal.OSSOperation.doOperation(OSSOperation.java:149)
>  at com.aliyun.oss.internal.OSSOperation.doOperation(OSSOperation.java:113)
>  at 
> com.aliyun.oss.internal.OSSMultipartOperation.completeMultipartUpload(OSSMultipartOperation.java:185)
>  at com.aliyun.oss.OSSClient.completeMultipartUpload(OSSClient.java:790)
>  at 
> org.apache.hadoop.fs.aliyun.oss.AliyunOSSFileSystemStore.completeMultipartUpload(AliyunOSSFileSystemStore.java:643)
>  at 
> org.apache.hadoop.fs.aliyun.oss.AliyunOSSBlockOutputStream.close(AliyunOSSBlockOutputStream.java:120)
>  at 
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
>  at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:101)
>  at 
> org.apache.hadoop.mapreduce.lib.output.TextOutputFormat$LineRecordWriter.close(TextOutputFormat.java:106)
>  at 
> org.apache.hadoop.mapreduce.lib.output.MultipleOutputs.close(MultipleOutputs.java:574)
>  at org.notmysock.tpcds.GenTable$DSDGen.cleanup(GenTable.java:169)
>  at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:149)
>  at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:799)
>  at org.apache.hadoop.mapred.MapTask.run(MapTask.java:347)
>  at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:422)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1686)
>  
> I reviewed code below, 
> {code:java}
> blockId {code}
> has thread synchronization problem
> {code:java}
> // code placeholder
> private void uploadCurrentPart() throws IOException {
>   blockFiles.add(blockFile);
>   blockStream.flush();
>   blockStream.close();
>   if (blockId == 0) {
> uploadId = store.getUploadId(key);
>   }
>   ListenableFuture partETagFuture =
>   executorService.submit(() -> {
> PartETag partETag = store.uploadPart(blockFile, key, uploadId,
> blockId + 1);
> return partETag;
>   });
>   partETagsFutures.add(partETagFuture);
>   blockFile = newBlockFile();
>   blockId++;
>   blockStream = new BufferedOutputStream(new FileOutputStream(blockFile));
> }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: 

[jira] [Comment Edited] (HADOOP-15607) AliyunOSS: fix duplicated partNumber issue in AliyunOSSBlockOutputStream

2018-07-16 Thread Genmao Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16545952#comment-16545952
 ] 

Genmao Yu edited comment on HADOOP-15607 at 7/17/18 2:38 AM:
-

[~wujinhu] Overall LGTM,but maybe we should use one intermediate variable for 
blockId to pass to {{upload task}}, like:
{code:java}
  int bid = ++block;

  

  PartETag partETag = store.uploadPart(currentFile, key, uploadId, bid);

  ...

 
{code}


was (Author: unclegen):
[~wujinhu] Overall LGTM,but maybe we should use one intermediate variable for 
blockId to pass to {{upload task}}, like:
{code:java}
  int bid = block++

  

  PartETag partETag = store.uploadPart(currentFile, key, uploadId, bid);

  ...

 
{code}

> AliyunOSS: fix duplicated partNumber issue in AliyunOSSBlockOutputStream 
> -
>
> Key: HADOOP-15607
> URL: https://issues.apache.org/jira/browse/HADOOP-15607
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 2.10.0, 2.9.1, 3.2.0, 3.1.1, 3.0.3
>Reporter: wujinhu
>Assignee: wujinhu
>Priority: Major
> Attachments: HADOOP-15607.001.patch
>
>
> When I generated data with hive-tpcds tool, I got exception below:
> 2018-07-16 14:50:43,680 INFO mapreduce.Job: Task Id : 
> attempt_1531723399698_0001_m_52_0, Status : FAILED
> Error: com.aliyun.oss.OSSException: The list of parts was not in ascending 
> order. Parts list must specified in order by part number.
> [ErrorCode]: InvalidPartOrder
> [RequestId]: 5B4C40425FCC208D79D1EAF5
> [HostId]: 100.103.0.137
> [ResponseError]:
> 
> 
>  InvalidPartOrder
>  The list of parts was not in ascending order. Parts list must 
> specified in order by part number.
>  5B4C40425FCC208D79D1EAF5
>  100.103.0.137
>  current PartNumber 3, you given part number 3is not in 
> ascending order
> 
> at 
> com.aliyun.oss.common.utils.ExceptionFactory.createOSSException(ExceptionFactory.java:99)
>  at 
> com.aliyun.oss.internal.OSSErrorResponseHandler.handle(OSSErrorResponseHandler.java:69)
>  at 
> com.aliyun.oss.common.comm.ServiceClient.handleResponse(ServiceClient.java:248)
>  at 
> com.aliyun.oss.common.comm.ServiceClient.sendRequestImpl(ServiceClient.java:130)
>  at 
> com.aliyun.oss.common.comm.ServiceClient.sendRequest(ServiceClient.java:68)
>  at com.aliyun.oss.internal.OSSOperation.send(OSSOperation.java:94)
>  at com.aliyun.oss.internal.OSSOperation.doOperation(OSSOperation.java:149)
>  at com.aliyun.oss.internal.OSSOperation.doOperation(OSSOperation.java:113)
>  at 
> com.aliyun.oss.internal.OSSMultipartOperation.completeMultipartUpload(OSSMultipartOperation.java:185)
>  at com.aliyun.oss.OSSClient.completeMultipartUpload(OSSClient.java:790)
>  at 
> org.apache.hadoop.fs.aliyun.oss.AliyunOSSFileSystemStore.completeMultipartUpload(AliyunOSSFileSystemStore.java:643)
>  at 
> org.apache.hadoop.fs.aliyun.oss.AliyunOSSBlockOutputStream.close(AliyunOSSBlockOutputStream.java:120)
>  at 
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
>  at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:101)
>  at 
> org.apache.hadoop.mapreduce.lib.output.TextOutputFormat$LineRecordWriter.close(TextOutputFormat.java:106)
>  at 
> org.apache.hadoop.mapreduce.lib.output.MultipleOutputs.close(MultipleOutputs.java:574)
>  at org.notmysock.tpcds.GenTable$DSDGen.cleanup(GenTable.java:169)
>  at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:149)
>  at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:799)
>  at org.apache.hadoop.mapred.MapTask.run(MapTask.java:347)
>  at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:422)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1686)
>  
> I reviewed code below, 
> {code:java}
> blockId {code}
> has thread synchronization problem
> {code:java}
> // code placeholder
> private void uploadCurrentPart() throws IOException {
>   blockFiles.add(blockFile);
>   blockStream.flush();
>   blockStream.close();
>   if (blockId == 0) {
> uploadId = store.getUploadId(key);
>   }
>   ListenableFuture partETagFuture =
>   executorService.submit(() -> {
> PartETag partETag = store.uploadPart(blockFile, key, uploadId,
> blockId + 1);
> return partETag;
>   });
>   partETagsFutures.add(partETagFuture);
>   blockFile = newBlockFile();
>   blockId++;
>   blockStream = new BufferedOutputStream(new FileOutputStream(blockFile));
> }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: 

[jira] [Comment Edited] (HADOOP-15607) AliyunOSS: fix duplicated partNumber issue in AliyunOSSBlockOutputStream

2018-07-16 Thread Genmao Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16545952#comment-16545952
 ] 

Genmao Yu edited comment on HADOOP-15607 at 7/17/18 2:37 AM:
-

[~wujinhu] Overall LGTM,but maybe we should use one intermediate variable for 
blockId to pass to {{upload task}}, like:
{code:java}
  int bid = block++

  

  PartETag partETag = store.uploadPart(currentFile, key, uploadId, bid);

  ...

 
{code}


was (Author: unclegen):
[~wujinhu] Overall LGTM,but maybe we should use one intermediate variable for 
blockId to pass to upload task, like:
{code:java}
  int bid = block++

  

  PartETag partETag = store.uploadPart(currentFile, key, uploadId, bid);

  ...

 
{code}

> AliyunOSS: fix duplicated partNumber issue in AliyunOSSBlockOutputStream 
> -
>
> Key: HADOOP-15607
> URL: https://issues.apache.org/jira/browse/HADOOP-15607
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 2.10.0, 2.9.1, 3.2.0, 3.1.1, 3.0.3
>Reporter: wujinhu
>Assignee: wujinhu
>Priority: Major
> Attachments: HADOOP-15607.001.patch
>
>
> When I generated data with hive-tpcds tool, I got exception below:
> 2018-07-16 14:50:43,680 INFO mapreduce.Job: Task Id : 
> attempt_1531723399698_0001_m_52_0, Status : FAILED
> Error: com.aliyun.oss.OSSException: The list of parts was not in ascending 
> order. Parts list must specified in order by part number.
> [ErrorCode]: InvalidPartOrder
> [RequestId]: 5B4C40425FCC208D79D1EAF5
> [HostId]: 100.103.0.137
> [ResponseError]:
> 
> 
>  InvalidPartOrder
>  The list of parts was not in ascending order. Parts list must 
> specified in order by part number.
>  5B4C40425FCC208D79D1EAF5
>  100.103.0.137
>  current PartNumber 3, you given part number 3is not in 
> ascending order
> 
> at 
> com.aliyun.oss.common.utils.ExceptionFactory.createOSSException(ExceptionFactory.java:99)
>  at 
> com.aliyun.oss.internal.OSSErrorResponseHandler.handle(OSSErrorResponseHandler.java:69)
>  at 
> com.aliyun.oss.common.comm.ServiceClient.handleResponse(ServiceClient.java:248)
>  at 
> com.aliyun.oss.common.comm.ServiceClient.sendRequestImpl(ServiceClient.java:130)
>  at 
> com.aliyun.oss.common.comm.ServiceClient.sendRequest(ServiceClient.java:68)
>  at com.aliyun.oss.internal.OSSOperation.send(OSSOperation.java:94)
>  at com.aliyun.oss.internal.OSSOperation.doOperation(OSSOperation.java:149)
>  at com.aliyun.oss.internal.OSSOperation.doOperation(OSSOperation.java:113)
>  at 
> com.aliyun.oss.internal.OSSMultipartOperation.completeMultipartUpload(OSSMultipartOperation.java:185)
>  at com.aliyun.oss.OSSClient.completeMultipartUpload(OSSClient.java:790)
>  at 
> org.apache.hadoop.fs.aliyun.oss.AliyunOSSFileSystemStore.completeMultipartUpload(AliyunOSSFileSystemStore.java:643)
>  at 
> org.apache.hadoop.fs.aliyun.oss.AliyunOSSBlockOutputStream.close(AliyunOSSBlockOutputStream.java:120)
>  at 
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
>  at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:101)
>  at 
> org.apache.hadoop.mapreduce.lib.output.TextOutputFormat$LineRecordWriter.close(TextOutputFormat.java:106)
>  at 
> org.apache.hadoop.mapreduce.lib.output.MultipleOutputs.close(MultipleOutputs.java:574)
>  at org.notmysock.tpcds.GenTable$DSDGen.cleanup(GenTable.java:169)
>  at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:149)
>  at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:799)
>  at org.apache.hadoop.mapred.MapTask.run(MapTask.java:347)
>  at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:422)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1686)
>  
> I reviewed code below, 
> {code:java}
> blockId {code}
> has thread synchronization problem
> {code:java}
> // code placeholder
> private void uploadCurrentPart() throws IOException {
>   blockFiles.add(blockFile);
>   blockStream.flush();
>   blockStream.close();
>   if (blockId == 0) {
> uploadId = store.getUploadId(key);
>   }
>   ListenableFuture partETagFuture =
>   executorService.submit(() -> {
> PartETag partETag = store.uploadPart(blockFile, key, uploadId,
> blockId + 1);
> return partETag;
>   });
>   partETagsFutures.add(partETagFuture);
>   blockFile = newBlockFile();
>   blockId++;
>   blockStream = new BufferedOutputStream(new FileOutputStream(blockFile));
> }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: 

[jira] [Commented] (HADOOP-15607) AliyunOSS: fix duplicated partNumber issue in AliyunOSSBlockOutputStream

2018-07-16 Thread Genmao Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16545952#comment-16545952
 ] 

Genmao Yu commented on HADOOP-15607:


[~wujinhu] Overall LGTM,but maybe we should use one intermediate variable for 
blockId to pass to upload task, like:
{code:java}
  int bid = block++

  

  PartETag partETag = store.uploadPart(currentFile, key, uploadId, bid);

  ...

 
{code}

> AliyunOSS: fix duplicated partNumber issue in AliyunOSSBlockOutputStream 
> -
>
> Key: HADOOP-15607
> URL: https://issues.apache.org/jira/browse/HADOOP-15607
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 2.10.0, 2.9.1, 3.2.0, 3.1.1, 3.0.3
>Reporter: wujinhu
>Assignee: wujinhu
>Priority: Major
> Attachments: HADOOP-15607.001.patch
>
>
> When I generated data with hive-tpcds tool, I got exception below:
> 2018-07-16 14:50:43,680 INFO mapreduce.Job: Task Id : 
> attempt_1531723399698_0001_m_52_0, Status : FAILED
> Error: com.aliyun.oss.OSSException: The list of parts was not in ascending 
> order. Parts list must specified in order by part number.
> [ErrorCode]: InvalidPartOrder
> [RequestId]: 5B4C40425FCC208D79D1EAF5
> [HostId]: 100.103.0.137
> [ResponseError]:
> 
> 
>  InvalidPartOrder
>  The list of parts was not in ascending order. Parts list must 
> specified in order by part number.
>  5B4C40425FCC208D79D1EAF5
>  100.103.0.137
>  current PartNumber 3, you given part number 3is not in 
> ascending order
> 
> at 
> com.aliyun.oss.common.utils.ExceptionFactory.createOSSException(ExceptionFactory.java:99)
>  at 
> com.aliyun.oss.internal.OSSErrorResponseHandler.handle(OSSErrorResponseHandler.java:69)
>  at 
> com.aliyun.oss.common.comm.ServiceClient.handleResponse(ServiceClient.java:248)
>  at 
> com.aliyun.oss.common.comm.ServiceClient.sendRequestImpl(ServiceClient.java:130)
>  at 
> com.aliyun.oss.common.comm.ServiceClient.sendRequest(ServiceClient.java:68)
>  at com.aliyun.oss.internal.OSSOperation.send(OSSOperation.java:94)
>  at com.aliyun.oss.internal.OSSOperation.doOperation(OSSOperation.java:149)
>  at com.aliyun.oss.internal.OSSOperation.doOperation(OSSOperation.java:113)
>  at 
> com.aliyun.oss.internal.OSSMultipartOperation.completeMultipartUpload(OSSMultipartOperation.java:185)
>  at com.aliyun.oss.OSSClient.completeMultipartUpload(OSSClient.java:790)
>  at 
> org.apache.hadoop.fs.aliyun.oss.AliyunOSSFileSystemStore.completeMultipartUpload(AliyunOSSFileSystemStore.java:643)
>  at 
> org.apache.hadoop.fs.aliyun.oss.AliyunOSSBlockOutputStream.close(AliyunOSSBlockOutputStream.java:120)
>  at 
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
>  at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:101)
>  at 
> org.apache.hadoop.mapreduce.lib.output.TextOutputFormat$LineRecordWriter.close(TextOutputFormat.java:106)
>  at 
> org.apache.hadoop.mapreduce.lib.output.MultipleOutputs.close(MultipleOutputs.java:574)
>  at org.notmysock.tpcds.GenTable$DSDGen.cleanup(GenTable.java:169)
>  at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:149)
>  at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:799)
>  at org.apache.hadoop.mapred.MapTask.run(MapTask.java:347)
>  at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:422)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1686)
>  
> I reviewed code below, 
> {code:java}
> blockId {code}
> has thread synchronization problem
> {code:java}
> // code placeholder
> private void uploadCurrentPart() throws IOException {
>   blockFiles.add(blockFile);
>   blockStream.flush();
>   blockStream.close();
>   if (blockId == 0) {
> uploadId = store.getUploadId(key);
>   }
>   ListenableFuture partETagFuture =
>   executorService.submit(() -> {
> PartETag partETag = store.uploadPart(blockFile, key, uploadId,
> blockId + 1);
> return partETag;
>   });
>   partETagsFutures.add(partETagFuture);
>   blockFile = newBlockFile();
>   blockId++;
>   blockStream = new BufferedOutputStream(new FileOutputStream(blockFile));
> }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-14999) AliyunOSS: provide one asynchronous multi-part based uploading mechanism

2018-04-07 Thread Genmao Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Genmao Yu updated HADOOP-14999:
---
Attachment: HADOOP-14999-branch-2.002.patch

> AliyunOSS: provide one asynchronous multi-part based uploading mechanism
> 
>
> Key: HADOOP-14999
> URL: https://issues.apache.org/jira/browse/HADOOP-14999
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0-beta1
>Reporter: Genmao Yu
>Assignee: Genmao Yu
>Priority: Major
> Attachments: HADOOP-14999-branch-2.001.patch, 
> HADOOP-14999-branch-2.002.patch, HADOOP-14999.001.patch, 
> HADOOP-14999.002.patch, HADOOP-14999.003.patch, HADOOP-14999.004.patch, 
> HADOOP-14999.005.patch, HADOOP-14999.006.patch, HADOOP-14999.007.patch, 
> HADOOP-14999.008.patch, HADOOP-14999.009.patch, HADOOP-14999.010.patch, 
> HADOOP-14999.011.patch, asynchronous_file_uploading.pdf, 
> diff-between-patch7-and-patch8.txt
>
>
> This mechanism is designed for uploading file in parallel and asynchronously:
>  - improve the performance of uploading file to OSS server. Firstly, this 
> mechanism splits result to multiple small blocks and upload them in parallel. 
> Then, getting result and uploading blocks are asynchronous.
>  - avoid buffering too large result into local disk. To cite an extreme 
> example, there is a task which will output 100GB or even larger, we may need 
> to output this 100GB to local disk and then upload it. Sometimes, it is 
> inefficient and limited to disk space.
> This patch reuse {{SemaphoredDelegatingExecutor}} as executor service and 
> depends on HADOOP-15039.
> Attached {{asynchronous_file_uploading.pdf}} illustrated the difference 
> between previous {{AliyunOSSOutputStream}} and 
> {{AliyunOSSBlockOutputStream}}, i.e. this asynchronous multi-part based 
> uploading mechanism.
> 1. {{AliyunOSSOutputStream}}: we need to output the whole result to local 
> disk before we can upload it to OSS. This will poses two problems:
>  - if the output file is too large, it will run out of the local disk.
>  - if the output file is too large, task will wait long time to upload result 
> to OSS before finish, wasting much compute resource.
> 2. {{AliyunOSSBlockOutputStream}}: we cut the task output into small blocks, 
> i.e. some small local file, and each block will be packaged into a uploading 
> task. These tasks will be submitted into {{SemaphoredDelegatingExecutor}}. 
> {{SemaphoredDelegatingExecutor}} will upload this blocks in parallel, this 
> will improve performance greatly.
> 3. Each task will retry 3 times to upload block to Aliyun OSS. If one of 
> those tasks failed, the whole file uploading will failed, and we will abort 
> current uploading.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-14999) AliyunOSS: provide one asynchronous multi-part based uploading mechanism

2018-04-03 Thread Genmao Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Genmao Yu updated HADOOP-14999:
---
Attachment: HADOOP-14999-branch-2.001.patch

> AliyunOSS: provide one asynchronous multi-part based uploading mechanism
> 
>
> Key: HADOOP-14999
> URL: https://issues.apache.org/jira/browse/HADOOP-14999
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0-beta1
>Reporter: Genmao Yu
>Assignee: Genmao Yu
>Priority: Major
> Attachments: HADOOP-14999-branch-2.001.patch, HADOOP-14999.001.patch, 
> HADOOP-14999.002.patch, HADOOP-14999.003.patch, HADOOP-14999.004.patch, 
> HADOOP-14999.005.patch, HADOOP-14999.006.patch, HADOOP-14999.007.patch, 
> HADOOP-14999.008.patch, HADOOP-14999.009.patch, HADOOP-14999.010.patch, 
> HADOOP-14999.011.patch, asynchronous_file_uploading.pdf, 
> diff-between-patch7-and-patch8.txt
>
>
> This mechanism is designed for uploading file in parallel and asynchronously:
>  - improve the performance of uploading file to OSS server. Firstly, this 
> mechanism splits result to multiple small blocks and upload them in parallel. 
> Then, getting result and uploading blocks are asynchronous.
>  - avoid buffering too large result into local disk. To cite an extreme 
> example, there is a task which will output 100GB or even larger, we may need 
> to output this 100GB to local disk and then upload it. Sometimes, it is 
> inefficient and limited to disk space.
> This patch reuse {{SemaphoredDelegatingExecutor}} as executor service and 
> depends on HADOOP-15039.
> Attached {{asynchronous_file_uploading.pdf}} illustrated the difference 
> between previous {{AliyunOSSOutputStream}} and 
> {{AliyunOSSBlockOutputStream}}, i.e. this asynchronous multi-part based 
> uploading mechanism.
> 1. {{AliyunOSSOutputStream}}: we need to output the whole result to local 
> disk before we can upload it to OSS. This will poses two problems:
>  - if the output file is too large, it will run out of the local disk.
>  - if the output file is too large, task will wait long time to upload result 
> to OSS before finish, wasting much compute resource.
> 2. {{AliyunOSSBlockOutputStream}}: we cut the task output into small blocks, 
> i.e. some small local file, and each block will be packaged into a uploading 
> task. These tasks will be submitted into {{SemaphoredDelegatingExecutor}}. 
> {{SemaphoredDelegatingExecutor}} will upload this blocks in parallel, this 
> will improve performance greatly.
> 3. Each task will retry 3 times to upload block to Aliyun OSS. If one of 
> those tasks failed, the whole file uploading will failed, and we will abort 
> current uploading.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-14999) AliyunOSS: provide one asynchronous multi-part based uploading mechanism

2018-04-03 Thread Genmao Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Genmao Yu updated HADOOP-14999:
---
Attachment: (was: HADOOP-14999-branch-2.001.patch)

> AliyunOSS: provide one asynchronous multi-part based uploading mechanism
> 
>
> Key: HADOOP-14999
> URL: https://issues.apache.org/jira/browse/HADOOP-14999
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0-beta1
>Reporter: Genmao Yu
>Assignee: Genmao Yu
>Priority: Major
> Attachments: HADOOP-14999-branch-2.001.patch, HADOOP-14999.001.patch, 
> HADOOP-14999.002.patch, HADOOP-14999.003.patch, HADOOP-14999.004.patch, 
> HADOOP-14999.005.patch, HADOOP-14999.006.patch, HADOOP-14999.007.patch, 
> HADOOP-14999.008.patch, HADOOP-14999.009.patch, HADOOP-14999.010.patch, 
> HADOOP-14999.011.patch, asynchronous_file_uploading.pdf, 
> diff-between-patch7-and-patch8.txt
>
>
> This mechanism is designed for uploading file in parallel and asynchronously:
>  - improve the performance of uploading file to OSS server. Firstly, this 
> mechanism splits result to multiple small blocks and upload them in parallel. 
> Then, getting result and uploading blocks are asynchronous.
>  - avoid buffering too large result into local disk. To cite an extreme 
> example, there is a task which will output 100GB or even larger, we may need 
> to output this 100GB to local disk and then upload it. Sometimes, it is 
> inefficient and limited to disk space.
> This patch reuse {{SemaphoredDelegatingExecutor}} as executor service and 
> depends on HADOOP-15039.
> Attached {{asynchronous_file_uploading.pdf}} illustrated the difference 
> between previous {{AliyunOSSOutputStream}} and 
> {{AliyunOSSBlockOutputStream}}, i.e. this asynchronous multi-part based 
> uploading mechanism.
> 1. {{AliyunOSSOutputStream}}: we need to output the whole result to local 
> disk before we can upload it to OSS. This will poses two problems:
>  - if the output file is too large, it will run out of the local disk.
>  - if the output file is too large, task will wait long time to upload result 
> to OSS before finish, wasting much compute resource.
> 2. {{AliyunOSSBlockOutputStream}}: we cut the task output into small blocks, 
> i.e. some small local file, and each block will be packaged into a uploading 
> task. These tasks will be submitted into {{SemaphoredDelegatingExecutor}}. 
> {{SemaphoredDelegatingExecutor}} will upload this blocks in parallel, this 
> will improve performance greatly.
> 3. Each task will retry 3 times to upload block to Aliyun OSS. If one of 
> those tasks failed, the whole file uploading will failed, and we will abort 
> current uploading.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-14999) AliyunOSS: provide one asynchronous multi-part based uploading mechanism

2018-04-03 Thread Genmao Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16423921#comment-16423921
 ] 

Genmao Yu commented on HADOOP-14999:


[~Sammi]

attach HADOOP-14999-branch-2.001.patch for branch-2

> AliyunOSS: provide one asynchronous multi-part based uploading mechanism
> 
>
> Key: HADOOP-14999
> URL: https://issues.apache.org/jira/browse/HADOOP-14999
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0-beta1
>Reporter: Genmao Yu
>Assignee: Genmao Yu
>Priority: Major
> Attachments: HADOOP-14999-branch-2.001.patch, HADOOP-14999.001.patch, 
> HADOOP-14999.002.patch, HADOOP-14999.003.patch, HADOOP-14999.004.patch, 
> HADOOP-14999.005.patch, HADOOP-14999.006.patch, HADOOP-14999.007.patch, 
> HADOOP-14999.008.patch, HADOOP-14999.009.patch, HADOOP-14999.010.patch, 
> HADOOP-14999.011.patch, asynchronous_file_uploading.pdf, 
> diff-between-patch7-and-patch8.txt
>
>
> This mechanism is designed for uploading file in parallel and asynchronously:
>  - improve the performance of uploading file to OSS server. Firstly, this 
> mechanism splits result to multiple small blocks and upload them in parallel. 
> Then, getting result and uploading blocks are asynchronous.
>  - avoid buffering too large result into local disk. To cite an extreme 
> example, there is a task which will output 100GB or even larger, we may need 
> to output this 100GB to local disk and then upload it. Sometimes, it is 
> inefficient and limited to disk space.
> This patch reuse {{SemaphoredDelegatingExecutor}} as executor service and 
> depends on HADOOP-15039.
> Attached {{asynchronous_file_uploading.pdf}} illustrated the difference 
> between previous {{AliyunOSSOutputStream}} and 
> {{AliyunOSSBlockOutputStream}}, i.e. this asynchronous multi-part based 
> uploading mechanism.
> 1. {{AliyunOSSOutputStream}}: we need to output the whole result to local 
> disk before we can upload it to OSS. This will poses two problems:
>  - if the output file is too large, it will run out of the local disk.
>  - if the output file is too large, task will wait long time to upload result 
> to OSS before finish, wasting much compute resource.
> 2. {{AliyunOSSBlockOutputStream}}: we cut the task output into small blocks, 
> i.e. some small local file, and each block will be packaged into a uploading 
> task. These tasks will be submitted into {{SemaphoredDelegatingExecutor}}. 
> {{SemaphoredDelegatingExecutor}} will upload this blocks in parallel, this 
> will improve performance greatly.
> 3. Each task will retry 3 times to upload block to Aliyun OSS. If one of 
> those tasks failed, the whole file uploading will failed, and we will abort 
> current uploading.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-14999) AliyunOSS: provide one asynchronous multi-part based uploading mechanism

2018-04-03 Thread Genmao Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Genmao Yu updated HADOOP-14999:
---
Attachment: HADOOP-14999-branch-2.001.patch

> AliyunOSS: provide one asynchronous multi-part based uploading mechanism
> 
>
> Key: HADOOP-14999
> URL: https://issues.apache.org/jira/browse/HADOOP-14999
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0-beta1
>Reporter: Genmao Yu
>Assignee: Genmao Yu
>Priority: Major
> Attachments: HADOOP-14999-branch-2.001.patch, HADOOP-14999.001.patch, 
> HADOOP-14999.002.patch, HADOOP-14999.003.patch, HADOOP-14999.004.patch, 
> HADOOP-14999.005.patch, HADOOP-14999.006.patch, HADOOP-14999.007.patch, 
> HADOOP-14999.008.patch, HADOOP-14999.009.patch, HADOOP-14999.010.patch, 
> HADOOP-14999.011.patch, asynchronous_file_uploading.pdf, 
> diff-between-patch7-and-patch8.txt
>
>
> This mechanism is designed for uploading file in parallel and asynchronously:
>  - improve the performance of uploading file to OSS server. Firstly, this 
> mechanism splits result to multiple small blocks and upload them in parallel. 
> Then, getting result and uploading blocks are asynchronous.
>  - avoid buffering too large result into local disk. To cite an extreme 
> example, there is a task which will output 100GB or even larger, we may need 
> to output this 100GB to local disk and then upload it. Sometimes, it is 
> inefficient and limited to disk space.
> This patch reuse {{SemaphoredDelegatingExecutor}} as executor service and 
> depends on HADOOP-15039.
> Attached {{asynchronous_file_uploading.pdf}} illustrated the difference 
> between previous {{AliyunOSSOutputStream}} and 
> {{AliyunOSSBlockOutputStream}}, i.e. this asynchronous multi-part based 
> uploading mechanism.
> 1. {{AliyunOSSOutputStream}}: we need to output the whole result to local 
> disk before we can upload it to OSS. This will poses two problems:
>  - if the output file is too large, it will run out of the local disk.
>  - if the output file is too large, task will wait long time to upload result 
> to OSS before finish, wasting much compute resource.
> 2. {{AliyunOSSBlockOutputStream}}: we cut the task output into small blocks, 
> i.e. some small local file, and each block will be packaged into a uploading 
> task. These tasks will be submitted into {{SemaphoredDelegatingExecutor}}. 
> {{SemaphoredDelegatingExecutor}} will upload this blocks in parallel, this 
> will improve performance greatly.
> 3. Each task will retry 3 times to upload block to Aliyun OSS. If one of 
> those tasks failed, the whole file uploading will failed, and we will abort 
> current uploading.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-14999) AliyunOSS: provide one asynchronous multi-part based uploading mechanism

2018-03-30 Thread Genmao Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16421136#comment-16421136
 ] 

Genmao Yu commented on HADOOP-14999:


[~Sammi] OK, as soon as possible.

> AliyunOSS: provide one asynchronous multi-part based uploading mechanism
> 
>
> Key: HADOOP-14999
> URL: https://issues.apache.org/jira/browse/HADOOP-14999
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0-beta1
>Reporter: Genmao Yu
>Assignee: Genmao Yu
>Priority: Major
> Attachments: HADOOP-14999.001.patch, HADOOP-14999.002.patch, 
> HADOOP-14999.003.patch, HADOOP-14999.004.patch, HADOOP-14999.005.patch, 
> HADOOP-14999.006.patch, HADOOP-14999.007.patch, HADOOP-14999.008.patch, 
> HADOOP-14999.009.patch, HADOOP-14999.010.patch, HADOOP-14999.011.patch, 
> asynchronous_file_uploading.pdf, diff-between-patch7-and-patch8.txt
>
>
> This mechanism is designed for uploading file in parallel and asynchronously:
>  - improve the performance of uploading file to OSS server. Firstly, this 
> mechanism splits result to multiple small blocks and upload them in parallel. 
> Then, getting result and uploading blocks are asynchronous.
>  - avoid buffering too large result into local disk. To cite an extreme 
> example, there is a task which will output 100GB or even larger, we may need 
> to output this 100GB to local disk and then upload it. Sometimes, it is 
> inefficient and limited to disk space.
> This patch reuse {{SemaphoredDelegatingExecutor}} as executor service and 
> depends on HADOOP-15039.
> Attached {{asynchronous_file_uploading.pdf}} illustrated the difference 
> between previous {{AliyunOSSOutputStream}} and 
> {{AliyunOSSBlockOutputStream}}, i.e. this asynchronous multi-part based 
> uploading mechanism.
> 1. {{AliyunOSSOutputStream}}: we need to output the whole result to local 
> disk before we can upload it to OSS. This will poses two problems:
>  - if the output file is too large, it will run out of the local disk.
>  - if the output file is too large, task will wait long time to upload result 
> to OSS before finish, wasting much compute resource.
> 2. {{AliyunOSSBlockOutputStream}}: we cut the task output into small blocks, 
> i.e. some small local file, and each block will be packaged into a uploading 
> task. These tasks will be submitted into {{SemaphoredDelegatingExecutor}}. 
> {{SemaphoredDelegatingExecutor}} will upload this blocks in parallel, this 
> will improve performance greatly.
> 3. Each task will retry 3 times to upload block to Aliyun OSS. If one of 
> those tasks failed, the whole file uploading will failed, and we will abort 
> current uploading.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-14999) AliyunOSS: provide one asynchronous multi-part based uploading mechanism

2018-03-30 Thread Genmao Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16420254#comment-16420254
 ] 

Genmao Yu commented on HADOOP-14999:


[~Sammi] Thanks for your review, I have fixed your comments and some other same 
problems. Take a review again please.

> AliyunOSS: provide one asynchronous multi-part based uploading mechanism
> 
>
> Key: HADOOP-14999
> URL: https://issues.apache.org/jira/browse/HADOOP-14999
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0-beta1
>Reporter: Genmao Yu
>Assignee: Genmao Yu
>Priority: Major
> Attachments: HADOOP-14999.001.patch, HADOOP-14999.002.patch, 
> HADOOP-14999.003.patch, HADOOP-14999.004.patch, HADOOP-14999.005.patch, 
> HADOOP-14999.006.patch, HADOOP-14999.007.patch, HADOOP-14999.008.patch, 
> HADOOP-14999.009.patch, HADOOP-14999.010.patch, HADOOP-14999.011.patch, 
> asynchronous_file_uploading.pdf, diff-between-patch7-and-patch8.txt
>
>
> This mechanism is designed for uploading file in parallel and asynchronously:
>  - improve the performance of uploading file to OSS server. Firstly, this 
> mechanism splits result to multiple small blocks and upload them in parallel. 
> Then, getting result and uploading blocks are asynchronous.
>  - avoid buffering too large result into local disk. To cite an extreme 
> example, there is a task which will output 100GB or even larger, we may need 
> to output this 100GB to local disk and then upload it. Sometimes, it is 
> inefficient and limited to disk space.
> This patch reuse {{SemaphoredDelegatingExecutor}} as executor service and 
> depends on HADOOP-15039.
> Attached {{asynchronous_file_uploading.pdf}} illustrated the difference 
> between previous {{AliyunOSSOutputStream}} and 
> {{AliyunOSSBlockOutputStream}}, i.e. this asynchronous multi-part based 
> uploading mechanism.
> 1. {{AliyunOSSOutputStream}}: we need to output the whole result to local 
> disk before we can upload it to OSS. This will poses two problems:
>  - if the output file is too large, it will run out of the local disk.
>  - if the output file is too large, task will wait long time to upload result 
> to OSS before finish, wasting much compute resource.
> 2. {{AliyunOSSBlockOutputStream}}: we cut the task output into small blocks, 
> i.e. some small local file, and each block will be packaged into a uploading 
> task. These tasks will be submitted into {{SemaphoredDelegatingExecutor}}. 
> {{SemaphoredDelegatingExecutor}} will upload this blocks in parallel, this 
> will improve performance greatly.
> 3. Each task will retry 3 times to upload block to Aliyun OSS. If one of 
> those tasks failed, the whole file uploading will failed, and we will abort 
> current uploading.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-14999) AliyunOSS: provide one asynchronous multi-part based uploading mechanism

2018-03-30 Thread Genmao Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Genmao Yu updated HADOOP-14999:
---
Attachment: HADOOP-14999.011.patch

> AliyunOSS: provide one asynchronous multi-part based uploading mechanism
> 
>
> Key: HADOOP-14999
> URL: https://issues.apache.org/jira/browse/HADOOP-14999
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0-beta1
>Reporter: Genmao Yu
>Assignee: Genmao Yu
>Priority: Major
> Attachments: HADOOP-14999.001.patch, HADOOP-14999.002.patch, 
> HADOOP-14999.003.patch, HADOOP-14999.004.patch, HADOOP-14999.005.patch, 
> HADOOP-14999.006.patch, HADOOP-14999.007.patch, HADOOP-14999.008.patch, 
> HADOOP-14999.009.patch, HADOOP-14999.010.patch, HADOOP-14999.011.patch, 
> asynchronous_file_uploading.pdf, diff-between-patch7-and-patch8.txt
>
>
> This mechanism is designed for uploading file in parallel and asynchronously:
>  - improve the performance of uploading file to OSS server. Firstly, this 
> mechanism splits result to multiple small blocks and upload them in parallel. 
> Then, getting result and uploading blocks are asynchronous.
>  - avoid buffering too large result into local disk. To cite an extreme 
> example, there is a task which will output 100GB or even larger, we may need 
> to output this 100GB to local disk and then upload it. Sometimes, it is 
> inefficient and limited to disk space.
> This patch reuse {{SemaphoredDelegatingExecutor}} as executor service and 
> depends on HADOOP-15039.
> Attached {{asynchronous_file_uploading.pdf}} illustrated the difference 
> between previous {{AliyunOSSOutputStream}} and 
> {{AliyunOSSBlockOutputStream}}, i.e. this asynchronous multi-part based 
> uploading mechanism.
> 1. {{AliyunOSSOutputStream}}: we need to output the whole result to local 
> disk before we can upload it to OSS. This will poses two problems:
>  - if the output file is too large, it will run out of the local disk.
>  - if the output file is too large, task will wait long time to upload result 
> to OSS before finish, wasting much compute resource.
> 2. {{AliyunOSSBlockOutputStream}}: we cut the task output into small blocks, 
> i.e. some small local file, and each block will be packaged into a uploading 
> task. These tasks will be submitted into {{SemaphoredDelegatingExecutor}}. 
> {{SemaphoredDelegatingExecutor}} will upload this blocks in parallel, this 
> will improve performance greatly.
> 3. Each task will retry 3 times to upload block to Aliyun OSS. If one of 
> those tasks failed, the whole file uploading will failed, and we will abort 
> current uploading.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-14999) AliyunOSS: provide one asynchronous multi-part based uploading mechanism

2018-03-29 Thread Genmao Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16418542#comment-16418542
 ] 

Genmao Yu commented on HADOOP-14999:


All the tests passed against "oss-cn-shanghai.aliyuncs.com

> AliyunOSS: provide one asynchronous multi-part based uploading mechanism
> 
>
> Key: HADOOP-14999
> URL: https://issues.apache.org/jira/browse/HADOOP-14999
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0-beta1
>Reporter: Genmao Yu
>Assignee: Genmao Yu
>Priority: Major
> Attachments: HADOOP-14999.001.patch, HADOOP-14999.002.patch, 
> HADOOP-14999.003.patch, HADOOP-14999.004.patch, HADOOP-14999.005.patch, 
> HADOOP-14999.006.patch, HADOOP-14999.007.patch, HADOOP-14999.008.patch, 
> HADOOP-14999.009.patch, HADOOP-14999.010.patch, 
> asynchronous_file_uploading.pdf, diff-between-patch7-and-patch8.txt
>
>
> This mechanism is designed for uploading file in parallel and asynchronously:
>  - improve the performance of uploading file to OSS server. Firstly, this 
> mechanism splits result to multiple small blocks and upload them in parallel. 
> Then, getting result and uploading blocks are asynchronous.
>  - avoid buffering too large result into local disk. To cite an extreme 
> example, there is a task which will output 100GB or even larger, we may need 
> to output this 100GB to local disk and then upload it. Sometimes, it is 
> inefficient and limited to disk space.
> This patch reuse {{SemaphoredDelegatingExecutor}} as executor service and 
> depends on HADOOP-15039.
> Attached {{asynchronous_file_uploading.pdf}} illustrated the difference 
> between previous {{AliyunOSSOutputStream}} and 
> {{AliyunOSSBlockOutputStream}}, i.e. this asynchronous multi-part based 
> uploading mechanism.
> 1. {{AliyunOSSOutputStream}}: we need to output the whole result to local 
> disk before we can upload it to OSS. This will poses two problems:
>  - if the output file is too large, it will run out of the local disk.
>  - if the output file is too large, task will wait long time to upload result 
> to OSS before finish, wasting much compute resource.
> 2. {{AliyunOSSBlockOutputStream}}: we cut the task output into small blocks, 
> i.e. some small local file, and each block will be packaged into a uploading 
> task. These tasks will be submitted into {{SemaphoredDelegatingExecutor}}. 
> {{SemaphoredDelegatingExecutor}} will upload this blocks in parallel, this 
> will improve performance greatly.
> 3. Each task will retry 3 times to upload block to Aliyun OSS. If one of 
> those tasks failed, the whole file uploading will failed, and we will abort 
> current uploading.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-14999) AliyunOSS: provide one asynchronous multi-part based uploading mechanism

2018-03-29 Thread Genmao Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Genmao Yu updated HADOOP-14999:
---
Attachment: HADOOP-14999.010.patch

> AliyunOSS: provide one asynchronous multi-part based uploading mechanism
> 
>
> Key: HADOOP-14999
> URL: https://issues.apache.org/jira/browse/HADOOP-14999
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0-beta1
>Reporter: Genmao Yu
>Assignee: Genmao Yu
>Priority: Major
> Attachments: HADOOP-14999.001.patch, HADOOP-14999.002.patch, 
> HADOOP-14999.003.patch, HADOOP-14999.004.patch, HADOOP-14999.005.patch, 
> HADOOP-14999.006.patch, HADOOP-14999.007.patch, HADOOP-14999.008.patch, 
> HADOOP-14999.009.patch, HADOOP-14999.010.patch, 
> asynchronous_file_uploading.pdf, diff-between-patch7-and-patch8.txt
>
>
> This mechanism is designed for uploading file in parallel and asynchronously:
>  - improve the performance of uploading file to OSS server. Firstly, this 
> mechanism splits result to multiple small blocks and upload them in parallel. 
> Then, getting result and uploading blocks are asynchronous.
>  - avoid buffering too large result into local disk. To cite an extreme 
> example, there is a task which will output 100GB or even larger, we may need 
> to output this 100GB to local disk and then upload it. Sometimes, it is 
> inefficient and limited to disk space.
> This patch reuse {{SemaphoredDelegatingExecutor}} as executor service and 
> depends on HADOOP-15039.
> Attached {{asynchronous_file_uploading.pdf}} illustrated the difference 
> between previous {{AliyunOSSOutputStream}} and 
> {{AliyunOSSBlockOutputStream}}, i.e. this asynchronous multi-part based 
> uploading mechanism.
> 1. {{AliyunOSSOutputStream}}: we need to output the whole result to local 
> disk before we can upload it to OSS. This will poses two problems:
>  - if the output file is too large, it will run out of the local disk.
>  - if the output file is too large, task will wait long time to upload result 
> to OSS before finish, wasting much compute resource.
> 2. {{AliyunOSSBlockOutputStream}}: we cut the task output into small blocks, 
> i.e. some small local file, and each block will be packaged into a uploading 
> task. These tasks will be submitted into {{SemaphoredDelegatingExecutor}}. 
> {{SemaphoredDelegatingExecutor}} will upload this blocks in parallel, this 
> will improve performance greatly.
> 3. Each task will retry 3 times to upload block to Aliyun OSS. If one of 
> those tasks failed, the whole file uploading will failed, and we will abort 
> current uploading.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HADOOP-14999) AliyunOSS: provide one asynchronous multi-part based uploading mechanism

2018-03-29 Thread Genmao Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16418505#comment-16418505
 ] 

Genmao Yu edited comment on HADOOP-14999 at 3/29/18 6:40 AM:
-

[~Sammi]
 - #2:  *"根据不同的上传方式,对象的大小限制是不一样的。分片上传 最大支持 48.8TB 的对象大小,其他的上传方式最大支持 5GB。"* 
[https://help.aliyun.com/document_detail/31827.html]
 - #3: fixed
 - #4: fixed
 - #5: It is not an async operation, after submit tasks (store.uploadPart), we 
will wait until all part uploads finish. so it is safe to do do resource clean.


was (Author: unclegen):
[~Sammi]

- #2: {{根据不同的上传方式,对象的大小限制是不一样的。分片上传 最大支持 48.8TB 的对象大小,其他的上传方式最大支持 5GB。}}  
https://help.aliyun.com/document_detail/31827.html
- #3: fixed
- #4: fixed
- #5: It is not an async operation, after submit tasks (store.uploadPart), we 
will wait until all part uploads finish. so it is safe to do do resource clean.


> AliyunOSS: provide one asynchronous multi-part based uploading mechanism
> 
>
> Key: HADOOP-14999
> URL: https://issues.apache.org/jira/browse/HADOOP-14999
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0-beta1
>Reporter: Genmao Yu
>Assignee: Genmao Yu
>Priority: Major
> Attachments: HADOOP-14999.001.patch, HADOOP-14999.002.patch, 
> HADOOP-14999.003.patch, HADOOP-14999.004.patch, HADOOP-14999.005.patch, 
> HADOOP-14999.006.patch, HADOOP-14999.007.patch, HADOOP-14999.008.patch, 
> HADOOP-14999.009.patch, asynchronous_file_uploading.pdf, 
> diff-between-patch7-and-patch8.txt
>
>
> This mechanism is designed for uploading file in parallel and asynchronously:
>  - improve the performance of uploading file to OSS server. Firstly, this 
> mechanism splits result to multiple small blocks and upload them in parallel. 
> Then, getting result and uploading blocks are asynchronous.
>  - avoid buffering too large result into local disk. To cite an extreme 
> example, there is a task which will output 100GB or even larger, we may need 
> to output this 100GB to local disk and then upload it. Sometimes, it is 
> inefficient and limited to disk space.
> This patch reuse {{SemaphoredDelegatingExecutor}} as executor service and 
> depends on HADOOP-15039.
> Attached {{asynchronous_file_uploading.pdf}} illustrated the difference 
> between previous {{AliyunOSSOutputStream}} and 
> {{AliyunOSSBlockOutputStream}}, i.e. this asynchronous multi-part based 
> uploading mechanism.
> 1. {{AliyunOSSOutputStream}}: we need to output the whole result to local 
> disk before we can upload it to OSS. This will poses two problems:
>  - if the output file is too large, it will run out of the local disk.
>  - if the output file is too large, task will wait long time to upload result 
> to OSS before finish, wasting much compute resource.
> 2. {{AliyunOSSBlockOutputStream}}: we cut the task output into small blocks, 
> i.e. some small local file, and each block will be packaged into a uploading 
> task. These tasks will be submitted into {{SemaphoredDelegatingExecutor}}. 
> {{SemaphoredDelegatingExecutor}} will upload this blocks in parallel, this 
> will improve performance greatly.
> 3. Each task will retry 3 times to upload block to Aliyun OSS. If one of 
> those tasks failed, the whole file uploading will failed, and we will abort 
> current uploading.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-14999) AliyunOSS: provide one asynchronous multi-part based uploading mechanism

2018-03-29 Thread Genmao Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16418505#comment-16418505
 ] 

Genmao Yu commented on HADOOP-14999:


[~Sammi]

- #2: {{根据不同的上传方式,对象的大小限制是不一样的。分片上传 最大支持 48.8TB 的对象大小,其他的上传方式最大支持 5GB。}}  
https://help.aliyun.com/document_detail/31827.html
- #3: fixed
- #4: fixed
- #5: It is not an async operation, after submit tasks (store.uploadPart), we 
will wait until all part uploads finish. so it is safe to do do resource clean.


> AliyunOSS: provide one asynchronous multi-part based uploading mechanism
> 
>
> Key: HADOOP-14999
> URL: https://issues.apache.org/jira/browse/HADOOP-14999
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0-beta1
>Reporter: Genmao Yu
>Assignee: Genmao Yu
>Priority: Major
> Attachments: HADOOP-14999.001.patch, HADOOP-14999.002.patch, 
> HADOOP-14999.003.patch, HADOOP-14999.004.patch, HADOOP-14999.005.patch, 
> HADOOP-14999.006.patch, HADOOP-14999.007.patch, HADOOP-14999.008.patch, 
> HADOOP-14999.009.patch, asynchronous_file_uploading.pdf, 
> diff-between-patch7-and-patch8.txt
>
>
> This mechanism is designed for uploading file in parallel and asynchronously:
>  - improve the performance of uploading file to OSS server. Firstly, this 
> mechanism splits result to multiple small blocks and upload them in parallel. 
> Then, getting result and uploading blocks are asynchronous.
>  - avoid buffering too large result into local disk. To cite an extreme 
> example, there is a task which will output 100GB or even larger, we may need 
> to output this 100GB to local disk and then upload it. Sometimes, it is 
> inefficient and limited to disk space.
> This patch reuse {{SemaphoredDelegatingExecutor}} as executor service and 
> depends on HADOOP-15039.
> Attached {{asynchronous_file_uploading.pdf}} illustrated the difference 
> between previous {{AliyunOSSOutputStream}} and 
> {{AliyunOSSBlockOutputStream}}, i.e. this asynchronous multi-part based 
> uploading mechanism.
> 1. {{AliyunOSSOutputStream}}: we need to output the whole result to local 
> disk before we can upload it to OSS. This will poses two problems:
>  - if the output file is too large, it will run out of the local disk.
>  - if the output file is too large, task will wait long time to upload result 
> to OSS before finish, wasting much compute resource.
> 2. {{AliyunOSSBlockOutputStream}}: we cut the task output into small blocks, 
> i.e. some small local file, and each block will be packaged into a uploading 
> task. These tasks will be submitted into {{SemaphoredDelegatingExecutor}}. 
> {{SemaphoredDelegatingExecutor}} will upload this blocks in parallel, this 
> will improve performance greatly.
> 3. Each task will retry 3 times to upload block to Aliyun OSS. If one of 
> those tasks failed, the whole file uploading will failed, and we will abort 
> current uploading.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-14999) AliyunOSS: provide one asynchronous multi-part based uploading mechanism

2018-03-08 Thread Genmao Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Genmao Yu updated HADOOP-14999:
---
Attachment: HADOOP-14999.009.patch

> AliyunOSS: provide one asynchronous multi-part based uploading mechanism
> 
>
> Key: HADOOP-14999
> URL: https://issues.apache.org/jira/browse/HADOOP-14999
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0-beta1
>Reporter: Genmao Yu
>Assignee: Genmao Yu
>Priority: Major
> Attachments: HADOOP-14999.001.patch, HADOOP-14999.002.patch, 
> HADOOP-14999.003.patch, HADOOP-14999.004.patch, HADOOP-14999.005.patch, 
> HADOOP-14999.006.patch, HADOOP-14999.007.patch, HADOOP-14999.008.patch, 
> HADOOP-14999.009.patch, asynchronous_file_uploading.pdf, 
> diff-between-patch7-and-patch8.txt
>
>
> This mechanism is designed for uploading file in parallel and asynchronously:
>  - improve the performance of uploading file to OSS server. Firstly, this 
> mechanism splits result to multiple small blocks and upload them in parallel. 
> Then, getting result and uploading blocks are asynchronous.
>  - avoid buffering too large result into local disk. To cite an extreme 
> example, there is a task which will output 100GB or even larger, we may need 
> to output this 100GB to local disk and then upload it. Sometimes, it is 
> inefficient and limited to disk space.
> This patch reuse {{SemaphoredDelegatingExecutor}} as executor service and 
> depends on HADOOP-15039.
> Attached {{asynchronous_file_uploading.pdf}} illustrated the difference 
> between previous {{AliyunOSSOutputStream}} and 
> {{AliyunOSSBlockOutputStream}}, i.e. this asynchronous multi-part based 
> uploading mechanism.
> 1. {{AliyunOSSOutputStream}}: we need to output the whole result to local 
> disk before we can upload it to OSS. This will poses two problems:
>  - if the output file is too large, it will run out of the local disk.
>  - if the output file is too large, task will wait long time to upload result 
> to OSS before finish, wasting much compute resource.
> 2. {{AliyunOSSBlockOutputStream}}: we cut the task output into small blocks, 
> i.e. some small local file, and each block will be packaged into a uploading 
> task. These tasks will be submitted into {{SemaphoredDelegatingExecutor}}. 
> {{SemaphoredDelegatingExecutor}} will upload this blocks in parallel, this 
> will improve performance greatly.
> 3. Each task will retry 3 times to upload block to Aliyun OSS. If one of 
> those tasks failed, the whole file uploading will failed, and we will abort 
> current uploading.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HADOOP-14999) AliyunOSS: provide one asynchronous multi-part based uploading mechanism

2018-03-08 Thread Genmao Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390910#comment-16390910
 ] 

Genmao Yu edited comment on HADOOP-14999 at 3/8/18 8:42 AM:


Thanks for [~Sammi] 's review. 
 1. comment-1: remove unused config and refine some config
 2. comment-2: fixed
 3. comment-3: Sorry, any problems? All style check passed.
{code:java}
Preconditions.checkArgument(v >= min,
String.format("Value of %s: %d is below the minimum value %d",
key, v, min));
{code}
4. comment-4: update unit test
 5. comment-5: IMHO, It is too large to test 5GB in integration test. And 
{{MULTIPART_UPLOAD_SIZE}} may cover this case as you mentioned.
 6. "But they are not cleaned when exception happens during the write() 
process.": all temp files are {{deleteOnExit}}, but I also add the resource 
clean logic in {{try-finally}}

performance test: test file upload
|file size|before patch|after patch (with 4 parallelism)|
|10MB|1.03s|1.1s|
|100MB|6.5s|2.3s|
|1GB|56.5s|13.5s|
|10GB|574s|173s|


was (Author: unclegen):
Thanks for [~Sammi] 's review. 
1. comment-1: remove unused config and refine some config
2. comment-2: fixed
3. comment-3: Sorry, any problems?

{code}
Preconditions.checkArgument(v >= min,
String.format("Value of %s: %d is below the minimum value %d",
key, v, min));
{code} 
4. comment-4: update unit test
5. comment-5: IMHO, It is too large to test 5GB in integration test. And 
{{MULTIPART_UPLOAD_SIZE}} may cover this case as you mentioned.
6. "But they are not cleaned when exception happens during the write() 
process.": all temp files are {{deleteOnExit}}, but I also add the resource 
clean logic in {{try-finally}}


performance test: test file upload

|file size|before patch|after patch (with 4 parallelism)|
|10MB|1.03s|1.1s|
|100MB|6.5s|2.3s|
|1GB|56.5s|13.5s|
|10GB|574s|173s|


> AliyunOSS: provide one asynchronous multi-part based uploading mechanism
> 
>
> Key: HADOOP-14999
> URL: https://issues.apache.org/jira/browse/HADOOP-14999
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0-beta1
>Reporter: Genmao Yu
>Assignee: Genmao Yu
>Priority: Major
> Attachments: HADOOP-14999.001.patch, HADOOP-14999.002.patch, 
> HADOOP-14999.003.patch, HADOOP-14999.004.patch, HADOOP-14999.005.patch, 
> HADOOP-14999.006.patch, HADOOP-14999.007.patch, HADOOP-14999.008.patch, 
> asynchronous_file_uploading.pdf, diff-between-patch7-and-patch8.txt
>
>
> This mechanism is designed for uploading file in parallel and asynchronously:
>  - improve the performance of uploading file to OSS server. Firstly, this 
> mechanism splits result to multiple small blocks and upload them in parallel. 
> Then, getting result and uploading blocks are asynchronous.
>  - avoid buffering too large result into local disk. To cite an extreme 
> example, there is a task which will output 100GB or even larger, we may need 
> to output this 100GB to local disk and then upload it. Sometimes, it is 
> inefficient and limited to disk space.
> This patch reuse {{SemaphoredDelegatingExecutor}} as executor service and 
> depends on HADOOP-15039.
> Attached {{asynchronous_file_uploading.pdf}} illustrated the difference 
> between previous {{AliyunOSSOutputStream}} and 
> {{AliyunOSSBlockOutputStream}}, i.e. this asynchronous multi-part based 
> uploading mechanism.
> 1. {{AliyunOSSOutputStream}}: we need to output the whole result to local 
> disk before we can upload it to OSS. This will poses two problems:
>  - if the output file is too large, it will run out of the local disk.
>  - if the output file is too large, task will wait long time to upload result 
> to OSS before finish, wasting much compute resource.
> 2. {{AliyunOSSBlockOutputStream}}: we cut the task output into small blocks, 
> i.e. some small local file, and each block will be packaged into a uploading 
> task. These tasks will be submitted into {{SemaphoredDelegatingExecutor}}. 
> {{SemaphoredDelegatingExecutor}} will upload this blocks in parallel, this 
> will improve performance greatly.
> 3. Each task will retry 3 times to upload block to Aliyun OSS. If one of 
> those tasks failed, the whole file uploading will failed, and we will abort 
> current uploading.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-14999) AliyunOSS: provide one asynchronous multi-part based uploading mechanism

2018-03-08 Thread Genmao Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390912#comment-16390912
 ] 

Genmao Yu commented on HADOOP-14999:


All the tests passed against "oss-cn-shanghai.aliyuncs.com

> AliyunOSS: provide one asynchronous multi-part based uploading mechanism
> 
>
> Key: HADOOP-14999
> URL: https://issues.apache.org/jira/browse/HADOOP-14999
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0-beta1
>Reporter: Genmao Yu
>Assignee: Genmao Yu
>Priority: Major
> Attachments: HADOOP-14999.001.patch, HADOOP-14999.002.patch, 
> HADOOP-14999.003.patch, HADOOP-14999.004.patch, HADOOP-14999.005.patch, 
> HADOOP-14999.006.patch, HADOOP-14999.007.patch, HADOOP-14999.008.patch, 
> asynchronous_file_uploading.pdf, diff-between-patch7-and-patch8.txt
>
>
> This mechanism is designed for uploading file in parallel and asynchronously:
>  - improve the performance of uploading file to OSS server. Firstly, this 
> mechanism splits result to multiple small blocks and upload them in parallel. 
> Then, getting result and uploading blocks are asynchronous.
>  - avoid buffering too large result into local disk. To cite an extreme 
> example, there is a task which will output 100GB or even larger, we may need 
> to output this 100GB to local disk and then upload it. Sometimes, it is 
> inefficient and limited to disk space.
> This patch reuse {{SemaphoredDelegatingExecutor}} as executor service and 
> depends on HADOOP-15039.
> Attached {{asynchronous_file_uploading.pdf}} illustrated the difference 
> between previous {{AliyunOSSOutputStream}} and 
> {{AliyunOSSBlockOutputStream}}, i.e. this asynchronous multi-part based 
> uploading mechanism.
> 1. {{AliyunOSSOutputStream}}: we need to output the whole result to local 
> disk before we can upload it to OSS. This will poses two problems:
>  - if the output file is too large, it will run out of the local disk.
>  - if the output file is too large, task will wait long time to upload result 
> to OSS before finish, wasting much compute resource.
> 2. {{AliyunOSSBlockOutputStream}}: we cut the task output into small blocks, 
> i.e. some small local file, and each block will be packaged into a uploading 
> task. These tasks will be submitted into {{SemaphoredDelegatingExecutor}}. 
> {{SemaphoredDelegatingExecutor}} will upload this blocks in parallel, this 
> will improve performance greatly.
> 3. Each task will retry 3 times to upload block to Aliyun OSS. If one of 
> those tasks failed, the whole file uploading will failed, and we will abort 
> current uploading.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-14999) AliyunOSS: provide one asynchronous multi-part based uploading mechanism

2018-03-08 Thread Genmao Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390911#comment-16390911
 ] 

Genmao Yu commented on HADOOP-14999:


diff-between-patch7-and-patch8.txt shows the detailed changes

> AliyunOSS: provide one asynchronous multi-part based uploading mechanism
> 
>
> Key: HADOOP-14999
> URL: https://issues.apache.org/jira/browse/HADOOP-14999
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0-beta1
>Reporter: Genmao Yu
>Assignee: Genmao Yu
>Priority: Major
> Attachments: HADOOP-14999.001.patch, HADOOP-14999.002.patch, 
> HADOOP-14999.003.patch, HADOOP-14999.004.patch, HADOOP-14999.005.patch, 
> HADOOP-14999.006.patch, HADOOP-14999.007.patch, HADOOP-14999.008.patch, 
> asynchronous_file_uploading.pdf, diff-between-patch7-and-patch8.txt
>
>
> This mechanism is designed for uploading file in parallel and asynchronously:
>  - improve the performance of uploading file to OSS server. Firstly, this 
> mechanism splits result to multiple small blocks and upload them in parallel. 
> Then, getting result and uploading blocks are asynchronous.
>  - avoid buffering too large result into local disk. To cite an extreme 
> example, there is a task which will output 100GB or even larger, we may need 
> to output this 100GB to local disk and then upload it. Sometimes, it is 
> inefficient and limited to disk space.
> This patch reuse {{SemaphoredDelegatingExecutor}} as executor service and 
> depends on HADOOP-15039.
> Attached {{asynchronous_file_uploading.pdf}} illustrated the difference 
> between previous {{AliyunOSSOutputStream}} and 
> {{AliyunOSSBlockOutputStream}}, i.e. this asynchronous multi-part based 
> uploading mechanism.
> 1. {{AliyunOSSOutputStream}}: we need to output the whole result to local 
> disk before we can upload it to OSS. This will poses two problems:
>  - if the output file is too large, it will run out of the local disk.
>  - if the output file is too large, task will wait long time to upload result 
> to OSS before finish, wasting much compute resource.
> 2. {{AliyunOSSBlockOutputStream}}: we cut the task output into small blocks, 
> i.e. some small local file, and each block will be packaged into a uploading 
> task. These tasks will be submitted into {{SemaphoredDelegatingExecutor}}. 
> {{SemaphoredDelegatingExecutor}} will upload this blocks in parallel, this 
> will improve performance greatly.
> 3. Each task will retry 3 times to upload block to Aliyun OSS. If one of 
> those tasks failed, the whole file uploading will failed, and we will abort 
> current uploading.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-14999) AliyunOSS: provide one asynchronous multi-part based uploading mechanism

2018-03-08 Thread Genmao Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Genmao Yu updated HADOOP-14999:
---
Attachment: diff-between-patch7-and-patch8.txt

> AliyunOSS: provide one asynchronous multi-part based uploading mechanism
> 
>
> Key: HADOOP-14999
> URL: https://issues.apache.org/jira/browse/HADOOP-14999
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0-beta1
>Reporter: Genmao Yu
>Assignee: Genmao Yu
>Priority: Major
> Attachments: HADOOP-14999.001.patch, HADOOP-14999.002.patch, 
> HADOOP-14999.003.patch, HADOOP-14999.004.patch, HADOOP-14999.005.patch, 
> HADOOP-14999.006.patch, HADOOP-14999.007.patch, HADOOP-14999.008.patch, 
> asynchronous_file_uploading.pdf, diff-between-patch7-and-patch8.txt
>
>
> This mechanism is designed for uploading file in parallel and asynchronously:
>  - improve the performance of uploading file to OSS server. Firstly, this 
> mechanism splits result to multiple small blocks and upload them in parallel. 
> Then, getting result and uploading blocks are asynchronous.
>  - avoid buffering too large result into local disk. To cite an extreme 
> example, there is a task which will output 100GB or even larger, we may need 
> to output this 100GB to local disk and then upload it. Sometimes, it is 
> inefficient and limited to disk space.
> This patch reuse {{SemaphoredDelegatingExecutor}} as executor service and 
> depends on HADOOP-15039.
> Attached {{asynchronous_file_uploading.pdf}} illustrated the difference 
> between previous {{AliyunOSSOutputStream}} and 
> {{AliyunOSSBlockOutputStream}}, i.e. this asynchronous multi-part based 
> uploading mechanism.
> 1. {{AliyunOSSOutputStream}}: we need to output the whole result to local 
> disk before we can upload it to OSS. This will poses two problems:
>  - if the output file is too large, it will run out of the local disk.
>  - if the output file is too large, task will wait long time to upload result 
> to OSS before finish, wasting much compute resource.
> 2. {{AliyunOSSBlockOutputStream}}: we cut the task output into small blocks, 
> i.e. some small local file, and each block will be packaged into a uploading 
> task. These tasks will be submitted into {{SemaphoredDelegatingExecutor}}. 
> {{SemaphoredDelegatingExecutor}} will upload this blocks in parallel, this 
> will improve performance greatly.
> 3. Each task will retry 3 times to upload block to Aliyun OSS. If one of 
> those tasks failed, the whole file uploading will failed, and we will abort 
> current uploading.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-14999) AliyunOSS: provide one asynchronous multi-part based uploading mechanism

2018-03-08 Thread Genmao Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390910#comment-16390910
 ] 

Genmao Yu commented on HADOOP-14999:


Thanks for [~Sammi] 's review. 
1. comment-1: remove unused config and refine some config
2. comment-2: fixed
3. comment-3: Sorry, any problems?

{code}
Preconditions.checkArgument(v >= min,
String.format("Value of %s: %d is below the minimum value %d",
key, v, min));
{code} 
4. comment-4: update unit test
5. comment-5: IMHO, It is too large to test 5GB in integration test. And 
{{MULTIPART_UPLOAD_SIZE}} may cover this case as you mentioned.
6. "But they are not cleaned when exception happens during the write() 
process.": all temp files are {{deleteOnExit}}, but I also add the resource 
clean logic in {{try-finally}}


performance test: test file upload

|file size|before patch|after patch (with 4 parallelism)|
|10MB|1.03s|1.1s|
|100MB|6.5s|2.3s|
|1GB|56.5s|13.5s|
|10GB|574s|173s|


> AliyunOSS: provide one asynchronous multi-part based uploading mechanism
> 
>
> Key: HADOOP-14999
> URL: https://issues.apache.org/jira/browse/HADOOP-14999
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0-beta1
>Reporter: Genmao Yu
>Assignee: Genmao Yu
>Priority: Major
> Attachments: HADOOP-14999.001.patch, HADOOP-14999.002.patch, 
> HADOOP-14999.003.patch, HADOOP-14999.004.patch, HADOOP-14999.005.patch, 
> HADOOP-14999.006.patch, HADOOP-14999.007.patch, HADOOP-14999.008.patch, 
> asynchronous_file_uploading.pdf
>
>
> This mechanism is designed for uploading file in parallel and asynchronously:
>  - improve the performance of uploading file to OSS server. Firstly, this 
> mechanism splits result to multiple small blocks and upload them in parallel. 
> Then, getting result and uploading blocks are asynchronous.
>  - avoid buffering too large result into local disk. To cite an extreme 
> example, there is a task which will output 100GB or even larger, we may need 
> to output this 100GB to local disk and then upload it. Sometimes, it is 
> inefficient and limited to disk space.
> This patch reuse {{SemaphoredDelegatingExecutor}} as executor service and 
> depends on HADOOP-15039.
> Attached {{asynchronous_file_uploading.pdf}} illustrated the difference 
> between previous {{AliyunOSSOutputStream}} and 
> {{AliyunOSSBlockOutputStream}}, i.e. this asynchronous multi-part based 
> uploading mechanism.
> 1. {{AliyunOSSOutputStream}}: we need to output the whole result to local 
> disk before we can upload it to OSS. This will poses two problems:
>  - if the output file is too large, it will run out of the local disk.
>  - if the output file is too large, task will wait long time to upload result 
> to OSS before finish, wasting much compute resource.
> 2. {{AliyunOSSBlockOutputStream}}: we cut the task output into small blocks, 
> i.e. some small local file, and each block will be packaged into a uploading 
> task. These tasks will be submitted into {{SemaphoredDelegatingExecutor}}. 
> {{SemaphoredDelegatingExecutor}} will upload this blocks in parallel, this 
> will improve performance greatly.
> 3. Each task will retry 3 times to upload block to Aliyun OSS. If one of 
> those tasks failed, the whole file uploading will failed, and we will abort 
> current uploading.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-14999) AliyunOSS: provide one asynchronous multi-part based uploading mechanism

2018-03-08 Thread Genmao Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Genmao Yu updated HADOOP-14999:
---
Attachment: HADOOP-14999.008.patch

> AliyunOSS: provide one asynchronous multi-part based uploading mechanism
> 
>
> Key: HADOOP-14999
> URL: https://issues.apache.org/jira/browse/HADOOP-14999
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0-beta1
>Reporter: Genmao Yu
>Assignee: Genmao Yu
>Priority: Major
> Attachments: HADOOP-14999.001.patch, HADOOP-14999.002.patch, 
> HADOOP-14999.003.patch, HADOOP-14999.004.patch, HADOOP-14999.005.patch, 
> HADOOP-14999.006.patch, HADOOP-14999.007.patch, HADOOP-14999.008.patch, 
> asynchronous_file_uploading.pdf
>
>
> This mechanism is designed for uploading file in parallel and asynchronously:
>  - improve the performance of uploading file to OSS server. Firstly, this 
> mechanism splits result to multiple small blocks and upload them in parallel. 
> Then, getting result and uploading blocks are asynchronous.
>  - avoid buffering too large result into local disk. To cite an extreme 
> example, there is a task which will output 100GB or even larger, we may need 
> to output this 100GB to local disk and then upload it. Sometimes, it is 
> inefficient and limited to disk space.
> This patch reuse {{SemaphoredDelegatingExecutor}} as executor service and 
> depends on HADOOP-15039.
> Attached {{asynchronous_file_uploading.pdf}} illustrated the difference 
> between previous {{AliyunOSSOutputStream}} and 
> {{AliyunOSSBlockOutputStream}}, i.e. this asynchronous multi-part based 
> uploading mechanism.
> 1. {{AliyunOSSOutputStream}}: we need to output the whole result to local 
> disk before we can upload it to OSS. This will poses two problems:
>  - if the output file is too large, it will run out of the local disk.
>  - if the output file is too large, task will wait long time to upload result 
> to OSS before finish, wasting much compute resource.
> 2. {{AliyunOSSBlockOutputStream}}: we cut the task output into small blocks, 
> i.e. some small local file, and each block will be packaged into a uploading 
> task. These tasks will be submitted into {{SemaphoredDelegatingExecutor}}. 
> {{SemaphoredDelegatingExecutor}} will upload this blocks in parallel, this 
> will improve performance greatly.
> 3. Each task will retry 3 times to upload block to Aliyun OSS. If one of 
> those tasks failed, the whole file uploading will failed, and we will abort 
> current uploading.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-14999) AliyunOSS: provide one asynchronous multi-part based uploading mechanism

2018-02-04 Thread Genmao Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16352026#comment-16352026
 ] 

Genmao Yu commented on HADOOP-14999:


cc [~ste...@apache.org]

> AliyunOSS: provide one asynchronous multi-part based uploading mechanism
> 
>
> Key: HADOOP-14999
> URL: https://issues.apache.org/jira/browse/HADOOP-14999
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0-beta1
>Reporter: Genmao Yu
>Assignee: Genmao Yu
>Priority: Major
> Attachments: HADOOP-14999.001.patch, HADOOP-14999.002.patch, 
> HADOOP-14999.003.patch, HADOOP-14999.004.patch, HADOOP-14999.005.patch, 
> HADOOP-14999.006.patch, HADOOP-14999.007.patch, 
> asynchronous_file_uploading.pdf
>
>
> This mechanism is designed for uploading file in parallel and asynchronously:
>  - improve the performance of uploading file to OSS server. Firstly, this 
> mechanism splits result to multiple small blocks and upload them in parallel. 
> Then, getting result and uploading blocks are asynchronous.
>  - avoid buffering too large result into local disk. To cite an extreme 
> example, there is a task which will output 100GB or even larger, we may need 
> to output this 100GB to local disk and then upload it. Sometimes, it is 
> inefficient and limited to disk space.
> This patch reuse {{SemaphoredDelegatingExecutor}} as executor service and 
> depends on HADOOP-15039.
> Attached {{asynchronous_file_uploading.pdf}} illustrated the difference 
> between previous {{AliyunOSSOutputStream}} and 
> {{AliyunOSSBlockOutputStream}}, i.e. this asynchronous multi-part based 
> uploading mechanism.
> 1. {{AliyunOSSOutputStream}}: we need to output the whole result to local 
> disk before we can upload it to OSS. This will poses two problems:
>  - if the output file is too large, it will run out of the local disk.
>  - if the output file is too large, task will wait long time to upload result 
> to OSS before finish, wasting much compute resource.
> 2. {{AliyunOSSBlockOutputStream}}: we cut the task output into small blocks, 
> i.e. some small local file, and each block will be packaged into a uploading 
> task. These tasks will be submitted into {{SemaphoredDelegatingExecutor}}. 
> {{SemaphoredDelegatingExecutor}} will upload this blocks in parallel, this 
> will improve performance greatly.
> 3. Each task will retry 3 times to upload block to Aliyun OSS. If one of 
> those tasks failed, the whole file uploading will failed, and we will abort 
> current uploading.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-14999) AliyunOSS: provide one asynchronous multi-part based uploading mechanism

2018-01-28 Thread Genmao Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16342835#comment-16342835
 ] 

Genmao Yu commented on HADOOP-14999:


I ran all the tests against "oss-cn-shanghai.aliyuncs.com

> AliyunOSS: provide one asynchronous multi-part based uploading mechanism
> 
>
> Key: HADOOP-14999
> URL: https://issues.apache.org/jira/browse/HADOOP-14999
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0-beta1
>Reporter: Genmao Yu
>Assignee: Genmao Yu
>Priority: Major
> Attachments: HADOOP-14999.001.patch, HADOOP-14999.002.patch, 
> HADOOP-14999.003.patch, HADOOP-14999.004.patch, HADOOP-14999.005.patch, 
> HADOOP-14999.006.patch, HADOOP-14999.007.patch, 
> asynchronous_file_uploading.pdf
>
>
> This mechanism is designed for uploading file in parallel and asynchronously:
>  - improve the performance of uploading file to OSS server. Firstly, this 
> mechanism splits result to multiple small blocks and upload them in parallel. 
> Then, getting result and uploading blocks are asynchronous.
>  - avoid buffering too large result into local disk. To cite an extreme 
> example, there is a task which will output 100GB or even larger, we may need 
> to output this 100GB to local disk and then upload it. Sometimes, it is 
> inefficient and limited to disk space.
> This patch reuse {{SemaphoredDelegatingExecutor}} as executor service and 
> depends on HADOOP-15039.
> Attached {{asynchronous_file_uploading.pdf}} illustrated the difference 
> between previous {{AliyunOSSOutputStream}} and 
> {{AliyunOSSBlockOutputStream}}, i.e. this asynchronous multi-part based 
> uploading mechanism.
> 1. {{AliyunOSSOutputStream}}: we need to output the whole result to local 
> disk before we can upload it to OSS. This will poses two problems:
>  - if the output file is too large, it will run out of the local disk.
>  - if the output file is too large, task will wait long time to upload result 
> to OSS before finish, wasting much compute resource.
> 2. {{AliyunOSSBlockOutputStream}}: we cut the task output into small blocks, 
> i.e. some small local file, and each block will be packaged into a uploading 
> task. These tasks will be submitted into {{SemaphoredDelegatingExecutor}}. 
> {{SemaphoredDelegatingExecutor}} will upload this blocks in parallel, this 
> will improve performance greatly.
> 3. Each task will retry 3 times to upload block to Aliyun OSS. If one of 
> those tasks failed, the whole file uploading will failed, and we will abort 
> current uploading.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-14999) AliyunOSS: provide one asynchronous multi-part based uploading mechanism

2018-01-28 Thread Genmao Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Genmao Yu updated HADOOP-14999:
---
Description: 
This mechanism is designed for uploading file in parallel and asynchronously:
 - improve the performance of uploading file to OSS server. Firstly, this 
mechanism splits result to multiple small blocks and upload them in parallel. 
Then, getting result and uploading blocks are asynchronous.
 - avoid buffering too large result into local disk. To cite an extreme 
example, there is a task which will output 100GB or even larger, we may need to 
output this 100GB to local disk and then upload it. Sometimes, it is 
inefficient and limited to disk space.

This patch reuse {{SemaphoredDelegatingExecutor}} as executor service and 
depends on HADOOP-15039.

Attached {{asynchronous_file_uploading.pdf}} illustrated the difference between 
previous {{AliyunOSSOutputStream}} and {{AliyunOSSBlockOutputStream}}, i.e. 
this asynchronous multi-part based uploading mechanism.

1. {{AliyunOSSOutputStream}}: we need to output the whole result to local disk 
before we can upload it to OSS. This will poses two problems:
 - if the output file is too large, it will run out of the local disk.
 - if the output file is too large, task will wait long time to upload result 
to OSS before finish, wasting much compute resource.

2. {{AliyunOSSBlockOutputStream}}: we cut the task output into small blocks, 
i.e. some small local file, and each block will be packaged into a uploading 
task. These tasks will be submitted into {{SemaphoredDelegatingExecutor}}. 
{{SemaphoredDelegatingExecutor}} will upload this blocks in parallel, this will 
improve performance greatly.

3. Each task will retry 3 times to upload block to Aliyun OSS. If one of those 
tasks failed, the whole file uploading will failed, and we will abort current 
uploading.

  was:
This mechanism is designed for uploading file in parallel and asynchronously: 

- improve the performance of uploading file to OSS server. Firstly, this 
mechanism splits result to multiple small blocks and upload them in parallel. 
Then, getting result and uploading blocks are asynchronous.
- avoid buffering too large result into local disk. To cite an extreme example, 
there is a task which will output 100GB or even larger, we may need to output 
this 100GB to local disk and then upload it. Sometimes, it is inefficient and 
limited to disk space.

This patch reuse {{SemaphoredDelegatingExecutor}} as executor service and 
depends on HADOOP-15039. 

Attached {{asynchronous_file_uploading.pdf}} illustrated the difference between 
previous {{AliyunOSSOutputStream}} and {{AliyunOSSBlockOutputStream}}, i.e. 
this asynchronous multi-part based uploading mechanism.

1. {{AliyunOSSOutputStream}}: we need to output the whole result to local disk 
before we can upload it to OSS. This will poses two problems:
- if the output file is too large, it will run out of the local disk.
- if the output file is too large, task will wait long time to upload 
result to OSS before finish, wasting much compute resource.

2. {{AliyunOSSBlockOutputStream}}: we cut the task output into small blocks, 
i.e. some small local file, and each block will be packaged into a uploading 
task. These tasks will be submitted into {{SemaphoredDelegatingExecutor}}.  
{{SemaphoredDelegatingExecutor}} will upload this blocks in parallel, this will 
improve performance greatly.


> AliyunOSS: provide one asynchronous multi-part based uploading mechanism
> 
>
> Key: HADOOP-14999
> URL: https://issues.apache.org/jira/browse/HADOOP-14999
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0-beta1
>Reporter: Genmao Yu
>Assignee: Genmao Yu
>Priority: Major
> Attachments: HADOOP-14999.001.patch, HADOOP-14999.002.patch, 
> HADOOP-14999.003.patch, HADOOP-14999.004.patch, HADOOP-14999.005.patch, 
> HADOOP-14999.006.patch, HADOOP-14999.007.patch, 
> asynchronous_file_uploading.pdf
>
>
> This mechanism is designed for uploading file in parallel and asynchronously:
>  - improve the performance of uploading file to OSS server. Firstly, this 
> mechanism splits result to multiple small blocks and upload them in parallel. 
> Then, getting result and uploading blocks are asynchronous.
>  - avoid buffering too large result into local disk. To cite an extreme 
> example, there is a task which will output 100GB or even larger, we may need 
> to output this 100GB to local disk and then upload it. Sometimes, it is 
> inefficient and limited to disk space.
> This patch reuse {{SemaphoredDelegatingExecutor}} as executor service and 
> depends on HADOOP-15039.
> Attached {{asynchronous_file_uploading.pdf}} illustrated the 

[jira] [Updated] (HADOOP-15039) Move SemaphoredDelegatingExecutor to hadoop-common

2018-01-26 Thread Genmao Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Genmao Yu updated HADOOP-15039:
---
Attachment: HADOOP-15039-branch-3.0.001.patch
HADOOP-15039-branch-2.9.001.patch
HADOOP-15039-branch-2.001.patch

> Move SemaphoredDelegatingExecutor to hadoop-common
> --
>
> Key: HADOOP-15039
> URL: https://issues.apache.org/jira/browse/HADOOP-15039
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs, fs/oss, fs/s3
>Affects Versions: 3.0.0-beta1
>Reporter: Genmao Yu
>Assignee: Genmao Yu
>Priority: Minor
> Fix For: 3.1.0
>
> Attachments: HADOOP-15039-branch-2.001.patch, 
> HADOOP-15039-branch-2.9.001.patch, HADOOP-15039-branch-3.0.001.patch, 
> HADOOP-15039.001.patch, HADOOP-15039.002.patch, HADOOP-15039.003.patch, 
> HADOOP-15039.004.patch, HADOOP-15039.005.patch
>
>
> Detailed discussions in HADOOP-14999 and HADOOP-15027.
> share {{SemaphoredDelegatingExecutor}} and move it to {{hadoop-common}}.
> cc [~ste...@apache.org] 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HADOOP-15189) backport HADOOP-15039 to branch-2 and branch-3

2018-01-25 Thread Genmao Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16338989#comment-16338989
 ] 

Genmao Yu edited comment on HADOOP-15189 at 1/25/18 9:35 AM:
-

local unit tests of {{hadoop-aws}} passed for 3 patch


was (Author: unclegen):
local unit tests passed for 3 patch

> backport HADOOP-15039 to branch-2 and branch-3
> --
>
> Key: HADOOP-15189
> URL: https://issues.apache.org/jira/browse/HADOOP-15189
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Genmao Yu
>Assignee: Genmao Yu
>Priority: Blocker
> Attachments: HADOOP-15189-branch-2.001.patch, 
> HADOOP-15189-branch-2.9.001.patch, HADOOP-15189-branch-3.0.001.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15189) backport HADOOP-15039 to branch-2 and branch-3

2018-01-25 Thread Genmao Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16338989#comment-16338989
 ] 

Genmao Yu commented on HADOOP-15189:


local unit tests passed for 3 patch

> backport HADOOP-15039 to branch-2 and branch-3
> --
>
> Key: HADOOP-15189
> URL: https://issues.apache.org/jira/browse/HADOOP-15189
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Genmao Yu
>Assignee: Genmao Yu
>Priority: Blocker
> Attachments: HADOOP-15189-branch-2.001.patch, 
> HADOOP-15189-branch-2.9.001.patch, HADOOP-15189-branch-3.0.001.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-14999) AliyunOSS: provide one asynchronous multi-part based uploading mechanism

2018-01-24 Thread Genmao Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16338785#comment-16338785
 ] 

Genmao Yu commented on HADOOP-14999:


cc [~Sammi], take a review please

> AliyunOSS: provide one asynchronous multi-part based uploading mechanism
> 
>
> Key: HADOOP-14999
> URL: https://issues.apache.org/jira/browse/HADOOP-14999
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0-beta1
>Reporter: Genmao Yu
>Assignee: Genmao Yu
>Priority: Major
> Attachments: HADOOP-14999.001.patch, HADOOP-14999.002.patch, 
> HADOOP-14999.003.patch, HADOOP-14999.004.patch, HADOOP-14999.005.patch, 
> HADOOP-14999.006.patch, HADOOP-14999.007.patch, 
> asynchronous_file_uploading.pdf
>
>
> This mechanism is designed for uploading file in parallel and asynchronously: 
> - improve the performance of uploading file to OSS server. Firstly, this 
> mechanism splits result to multiple small blocks and upload them in parallel. 
> Then, getting result and uploading blocks are asynchronous.
> - avoid buffering too large result into local disk. To cite an extreme 
> example, there is a task which will output 100GB or even larger, we may need 
> to output this 100GB to local disk and then upload it. Sometimes, it is 
> inefficient and limited to disk space.
> This patch reuse {{SemaphoredDelegatingExecutor}} as executor service and 
> depends on HADOOP-15039. 
> Attached {{asynchronous_file_uploading.pdf}} illustrated the difference 
> between previous {{AliyunOSSOutputStream}} and 
> {{AliyunOSSBlockOutputStream}}, i.e. this asynchronous multi-part based 
> uploading mechanism.
> 1. {{AliyunOSSOutputStream}}: we need to output the whole result to local 
> disk before we can upload it to OSS. This will poses two problems:
> - if the output file is too large, it will run out of the local disk.
> - if the output file is too large, task will wait long time to upload 
> result to OSS before finish, wasting much compute resource.
> 2. {{AliyunOSSBlockOutputStream}}: we cut the task output into small blocks, 
> i.e. some small local file, and each block will be packaged into a uploading 
> task. These tasks will be submitted into {{SemaphoredDelegatingExecutor}}.  
> {{SemaphoredDelegatingExecutor}} will upload this blocks in parallel, this 
> will improve performance greatly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15189) backport HADOOP-15039 to branch-2 and branch-3

2018-01-24 Thread Genmao Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16338783#comment-16338783
 ] 

Genmao Yu commented on HADOOP-15189:


cc [~Sammi]

I have upload 3 new patches to bachport HADOOP-15039 (Move 
SemaphoredDelegatingExecutor to hadoop-common) to branch-2, branch-2.9 and 
branch 3.0.

[^HADOOP-15189-branch-2.001.patch] and [^HADOOP-15189-branch-2.001.patch] 
pre-commit build passed

[^HADOOP-15189-branch-2.001.patch] failed with test failure, but look likes no 
problems.

Take a review please.

> backport HADOOP-15039 to branch-2 and branch-3
> --
>
> Key: HADOOP-15189
> URL: https://issues.apache.org/jira/browse/HADOOP-15189
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Genmao Yu
>Assignee: Genmao Yu
>Priority: Blocker
> Attachments: HADOOP-15189-branch-2.001.patch, 
> HADOOP-15189-branch-2.9.001.patch, HADOOP-15189-branch-3.0.001.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15189) backport HADOOP-15039 to branch-2 and branch-3

2018-01-24 Thread Genmao Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16338776#comment-16338776
 ] 

Genmao Yu commented on HADOOP-15189:


{code}

Results : Tests run: 3640, Failures: 0, Errors: 0, Skipped: 111 [INFO] 
 [INFO] 
BUILD FAILURE [INFO] 
 [INFO] 
Total time: 15:42 min [INFO] Finished at: 2018-01-25T03:11:38+00:00 [INFO] 
Final Memory: 34M/255M [INFO] 
 
[WARNING] The requested profile "yarn-ui" could not be activated because it 
does not exist. [ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-surefire-plugin:2.17:test (default-test) on 
project hadoop-common: There was a timeout or other error in the fork -> [Help 
1] [ERROR] [ERROR] To see the full stack trace of the errors, re-run Maven with 
the -e switch. [ERROR] Re-run Maven using the -X switch to enable full debug 
logging. [ERROR] [ERROR] For more information about the errors and possible 
solutions, please read the following articles: [ERROR] [Help 1] 
[http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException]

{code}

> backport HADOOP-15039 to branch-2 and branch-3
> --
>
> Key: HADOOP-15189
> URL: https://issues.apache.org/jira/browse/HADOOP-15189
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Genmao Yu
>Assignee: Genmao Yu
>Priority: Blocker
> Attachments: HADOOP-15189-branch-2.001.patch, 
> HADOOP-15189-branch-2.9.001.patch, HADOOP-15189-branch-3.0.001.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-14999) AliyunOSS: provide one asynchronous multi-part based uploading mechanism

2018-01-24 Thread Genmao Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16338613#comment-16338613
 ] 

Genmao Yu commented on HADOOP-14999:


patch.007 fix "findbugs" issue

> AliyunOSS: provide one asynchronous multi-part based uploading mechanism
> 
>
> Key: HADOOP-14999
> URL: https://issues.apache.org/jira/browse/HADOOP-14999
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0-beta1
>Reporter: Genmao Yu
>Assignee: Genmao Yu
>Priority: Major
> Attachments: HADOOP-14999.001.patch, HADOOP-14999.002.patch, 
> HADOOP-14999.003.patch, HADOOP-14999.004.patch, HADOOP-14999.005.patch, 
> HADOOP-14999.006.patch, HADOOP-14999.007.patch, 
> asynchronous_file_uploading.pdf
>
>
> This mechanism is designed for uploading file in parallel and asynchronously: 
> - improve the performance of uploading file to OSS server. Firstly, this 
> mechanism splits result to multiple small blocks and upload them in parallel. 
> Then, getting result and uploading blocks are asynchronous.
> - avoid buffering too large result into local disk. To cite an extreme 
> example, there is a task which will output 100GB or even larger, we may need 
> to output this 100GB to local disk and then upload it. Sometimes, it is 
> inefficient and limited to disk space.
> This patch reuse {{SemaphoredDelegatingExecutor}} as executor service and 
> depends on HADOOP-15039. 
> Attached {{asynchronous_file_uploading.pdf}} illustrated the difference 
> between previous {{AliyunOSSOutputStream}} and 
> {{AliyunOSSBlockOutputStream}}, i.e. this asynchronous multi-part based 
> uploading mechanism.
> 1. {{AliyunOSSOutputStream}}: we need to output the whole result to local 
> disk before we can upload it to OSS. This will poses two problems:
> - if the output file is too large, it will run out of the local disk.
> - if the output file is too large, task will wait long time to upload 
> result to OSS before finish, wasting much compute resource.
> 2. {{AliyunOSSBlockOutputStream}}: we cut the task output into small blocks, 
> i.e. some small local file, and each block will be packaged into a uploading 
> task. These tasks will be submitted into {{SemaphoredDelegatingExecutor}}.  
> {{SemaphoredDelegatingExecutor}} will upload this blocks in parallel, this 
> will improve performance greatly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-14999) AliyunOSS: provide one asynchronous multi-part based uploading mechanism

2018-01-24 Thread Genmao Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Genmao Yu updated HADOOP-14999:
---
Attachment: HADOOP-14999.007.patch

> AliyunOSS: provide one asynchronous multi-part based uploading mechanism
> 
>
> Key: HADOOP-14999
> URL: https://issues.apache.org/jira/browse/HADOOP-14999
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0-beta1
>Reporter: Genmao Yu
>Assignee: Genmao Yu
>Priority: Major
> Attachments: HADOOP-14999.001.patch, HADOOP-14999.002.patch, 
> HADOOP-14999.003.patch, HADOOP-14999.004.patch, HADOOP-14999.005.patch, 
> HADOOP-14999.006.patch, HADOOP-14999.007.patch, 
> asynchronous_file_uploading.pdf
>
>
> This mechanism is designed for uploading file in parallel and asynchronously: 
> - improve the performance of uploading file to OSS server. Firstly, this 
> mechanism splits result to multiple small blocks and upload them in parallel. 
> Then, getting result and uploading blocks are asynchronous.
> - avoid buffering too large result into local disk. To cite an extreme 
> example, there is a task which will output 100GB or even larger, we may need 
> to output this 100GB to local disk and then upload it. Sometimes, it is 
> inefficient and limited to disk space.
> This patch reuse {{SemaphoredDelegatingExecutor}} as executor service and 
> depends on HADOOP-15039. 
> Attached {{asynchronous_file_uploading.pdf}} illustrated the difference 
> between previous {{AliyunOSSOutputStream}} and 
> {{AliyunOSSBlockOutputStream}}, i.e. this asynchronous multi-part based 
> uploading mechanism.
> 1. {{AliyunOSSOutputStream}}: we need to output the whole result to local 
> disk before we can upload it to OSS. This will poses two problems:
> - if the output file is too large, it will run out of the local disk.
> - if the output file is too large, task will wait long time to upload 
> result to OSS before finish, wasting much compute resource.
> 2. {{AliyunOSSBlockOutputStream}}: we cut the task output into small blocks, 
> i.e. some small local file, and each block will be packaged into a uploading 
> task. These tasks will be submitted into {{SemaphoredDelegatingExecutor}}.  
> {{SemaphoredDelegatingExecutor}} will upload this blocks in parallel, this 
> will improve performance greatly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15189) backport HADOOP-15039 to branch-2 and branch-3

2018-01-24 Thread Genmao Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Genmao Yu updated HADOOP-15189:
---
Attachment: HADOOP-15189-branch-2.001.patch

> backport HADOOP-15039 to branch-2 and branch-3
> --
>
> Key: HADOOP-15189
> URL: https://issues.apache.org/jira/browse/HADOOP-15189
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Genmao Yu
>Assignee: Genmao Yu
>Priority: Blocker
> Attachments: HADOOP-15189-branch-2.001.patch, 
> HADOOP-15189-branch-2.9.001.patch, HADOOP-15189-branch-3.0.001.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15189) backport HADOOP-15039 to branch-2 and branch-3

2018-01-24 Thread Genmao Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Genmao Yu updated HADOOP-15189:
---
Attachment: (was: HADOOP-15189-branch-2.001.patch)

> backport HADOOP-15039 to branch-2 and branch-3
> --
>
> Key: HADOOP-15189
> URL: https://issues.apache.org/jira/browse/HADOOP-15189
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Genmao Yu
>Assignee: Genmao Yu
>Priority: Blocker
> Attachments: HADOOP-15189-branch-2.001.patch, 
> HADOOP-15189-branch-2.9.001.patch, HADOOP-15189-branch-3.0.001.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-14999) AliyunOSS: provide one asynchronous multi-part based uploading mechanism

2018-01-24 Thread Genmao Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16337667#comment-16337667
 ] 

Genmao Yu commented on HADOOP-14999:


resubmit patch.006 to trigger PreCommit build

> AliyunOSS: provide one asynchronous multi-part based uploading mechanism
> 
>
> Key: HADOOP-14999
> URL: https://issues.apache.org/jira/browse/HADOOP-14999
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0-beta1
>Reporter: Genmao Yu
>Assignee: Genmao Yu
>Priority: Major
> Attachments: HADOOP-14999.001.patch, HADOOP-14999.002.patch, 
> HADOOP-14999.003.patch, HADOOP-14999.004.patch, HADOOP-14999.005.patch, 
> HADOOP-14999.006.patch, asynchronous_file_uploading.pdf
>
>
> This mechanism is designed for uploading file in parallel and asynchronously: 
> - improve the performance of uploading file to OSS server. Firstly, this 
> mechanism splits result to multiple small blocks and upload them in parallel. 
> Then, getting result and uploading blocks are asynchronous.
> - avoid buffering too large result into local disk. To cite an extreme 
> example, there is a task which will output 100GB or even larger, we may need 
> to output this 100GB to local disk and then upload it. Sometimes, it is 
> inefficient and limited to disk space.
> This patch reuse {{SemaphoredDelegatingExecutor}} as executor service and 
> depends on HADOOP-15039. 
> Attached {{asynchronous_file_uploading.pdf}} illustrated the difference 
> between previous {{AliyunOSSOutputStream}} and 
> {{AliyunOSSBlockOutputStream}}, i.e. this asynchronous multi-part based 
> uploading mechanism.
> 1. {{AliyunOSSOutputStream}}: we need to output the whole result to local 
> disk before we can upload it to OSS. This will poses two problems:
> - if the output file is too large, it will run out of the local disk.
> - if the output file is too large, task will wait long time to upload 
> result to OSS before finish, wasting much compute resource.
> 2. {{AliyunOSSBlockOutputStream}}: we cut the task output into small blocks, 
> i.e. some small local file, and each block will be packaged into a uploading 
> task. These tasks will be submitted into {{SemaphoredDelegatingExecutor}}.  
> {{SemaphoredDelegatingExecutor}} will upload this blocks in parallel, this 
> will improve performance greatly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-14999) AliyunOSS: provide one asynchronous multi-part based uploading mechanism

2018-01-24 Thread Genmao Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Genmao Yu updated HADOOP-14999:
---
Attachment: HADOOP-14999.006.patch

> AliyunOSS: provide one asynchronous multi-part based uploading mechanism
> 
>
> Key: HADOOP-14999
> URL: https://issues.apache.org/jira/browse/HADOOP-14999
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0-beta1
>Reporter: Genmao Yu
>Assignee: Genmao Yu
>Priority: Major
> Attachments: HADOOP-14999.001.patch, HADOOP-14999.002.patch, 
> HADOOP-14999.003.patch, HADOOP-14999.004.patch, HADOOP-14999.005.patch, 
> HADOOP-14999.006.patch, asynchronous_file_uploading.pdf
>
>
> This mechanism is designed for uploading file in parallel and asynchronously: 
> - improve the performance of uploading file to OSS server. Firstly, this 
> mechanism splits result to multiple small blocks and upload them in parallel. 
> Then, getting result and uploading blocks are asynchronous.
> - avoid buffering too large result into local disk. To cite an extreme 
> example, there is a task which will output 100GB or even larger, we may need 
> to output this 100GB to local disk and then upload it. Sometimes, it is 
> inefficient and limited to disk space.
> This patch reuse {{SemaphoredDelegatingExecutor}} as executor service and 
> depends on HADOOP-15039. 
> Attached {{asynchronous_file_uploading.pdf}} illustrated the difference 
> between previous {{AliyunOSSOutputStream}} and 
> {{AliyunOSSBlockOutputStream}}, i.e. this asynchronous multi-part based 
> uploading mechanism.
> 1. {{AliyunOSSOutputStream}}: we need to output the whole result to local 
> disk before we can upload it to OSS. This will poses two problems:
> - if the output file is too large, it will run out of the local disk.
> - if the output file is too large, task will wait long time to upload 
> result to OSS before finish, wasting much compute resource.
> 2. {{AliyunOSSBlockOutputStream}}: we cut the task output into small blocks, 
> i.e. some small local file, and each block will be packaged into a uploading 
> task. These tasks will be submitted into {{SemaphoredDelegatingExecutor}}.  
> {{SemaphoredDelegatingExecutor}} will upload this blocks in parallel, this 
> will improve performance greatly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-14999) AliyunOSS: provide one asynchronous multi-part based uploading mechanism

2018-01-24 Thread Genmao Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Genmao Yu updated HADOOP-14999:
---
Attachment: (was: HADOOP-14999.006.patch)

> AliyunOSS: provide one asynchronous multi-part based uploading mechanism
> 
>
> Key: HADOOP-14999
> URL: https://issues.apache.org/jira/browse/HADOOP-14999
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0-beta1
>Reporter: Genmao Yu
>Assignee: Genmao Yu
>Priority: Major
> Attachments: HADOOP-14999.001.patch, HADOOP-14999.002.patch, 
> HADOOP-14999.003.patch, HADOOP-14999.004.patch, HADOOP-14999.005.patch, 
> asynchronous_file_uploading.pdf
>
>
> This mechanism is designed for uploading file in parallel and asynchronously: 
> - improve the performance of uploading file to OSS server. Firstly, this 
> mechanism splits result to multiple small blocks and upload them in parallel. 
> Then, getting result and uploading blocks are asynchronous.
> - avoid buffering too large result into local disk. To cite an extreme 
> example, there is a task which will output 100GB or even larger, we may need 
> to output this 100GB to local disk and then upload it. Sometimes, it is 
> inefficient and limited to disk space.
> This patch reuse {{SemaphoredDelegatingExecutor}} as executor service and 
> depends on HADOOP-15039. 
> Attached {{asynchronous_file_uploading.pdf}} illustrated the difference 
> between previous {{AliyunOSSOutputStream}} and 
> {{AliyunOSSBlockOutputStream}}, i.e. this asynchronous multi-part based 
> uploading mechanism.
> 1. {{AliyunOSSOutputStream}}: we need to output the whole result to local 
> disk before we can upload it to OSS. This will poses two problems:
> - if the output file is too large, it will run out of the local disk.
> - if the output file is too large, task will wait long time to upload 
> result to OSS before finish, wasting much compute resource.
> 2. {{AliyunOSSBlockOutputStream}}: we cut the task output into small blocks, 
> i.e. some small local file, and each block will be packaged into a uploading 
> task. These tasks will be submitted into {{SemaphoredDelegatingExecutor}}.  
> {{SemaphoredDelegatingExecutor}} will upload this blocks in parallel, this 
> will improve performance greatly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15189) backport HADOOP-15039 to branch-2 and branch-3

2018-01-24 Thread Genmao Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Genmao Yu updated HADOOP-15189:
---
Attachment: HADOOP-15189-branch-3.0.001.patch

> backport HADOOP-15039 to branch-2 and branch-3
> --
>
> Key: HADOOP-15189
> URL: https://issues.apache.org/jira/browse/HADOOP-15189
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Genmao Yu
>Assignee: Genmao Yu
>Priority: Blocker
> Attachments: HADOOP-15189-branch-2.001.patch, 
> HADOOP-15189-branch-2.9.001.patch, HADOOP-15189-branch-3.0.001.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15189) backport HADOOP-15039 to branch-2 and branch-3

2018-01-24 Thread Genmao Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Genmao Yu updated HADOOP-15189:
---
Attachment: HADOOP-15189-branch-2.9.001.patch

> backport HADOOP-15039 to branch-2 and branch-3
> --
>
> Key: HADOOP-15189
> URL: https://issues.apache.org/jira/browse/HADOOP-15189
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Genmao Yu
>Assignee: Genmao Yu
>Priority: Blocker
> Attachments: HADOOP-15189-branch-2.001.patch, 
> HADOOP-15189-branch-2.9.001.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15189) backport HADOOP-15039 to branch-2 and branch-3

2018-01-24 Thread Genmao Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Genmao Yu updated HADOOP-15189:
---
Attachment: HADOOP-15189-branch-2.001.patch

> backport HADOOP-15039 to branch-2 and branch-3
> --
>
> Key: HADOOP-15189
> URL: https://issues.apache.org/jira/browse/HADOOP-15189
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Genmao Yu
>Assignee: Genmao Yu
>Priority: Blocker
> Attachments: HADOOP-15189-branch-2.001.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15189) backport HADOOP-15039 to branch-2 and branch-3

2018-01-24 Thread Genmao Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Genmao Yu updated HADOOP-15189:
---
Attachment: (was: HADOOP-15189-branch-2.9.001.patch)

> backport HADOOP-15039 to branch-2 and branch-3
> --
>
> Key: HADOOP-15189
> URL: https://issues.apache.org/jira/browse/HADOOP-15189
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Genmao Yu
>Assignee: Genmao Yu
>Priority: Blocker
> Attachments: HADOOP-15189-branch-2.001.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15189) backport HADOOP-15039 to branch-2 and branch-3

2018-01-24 Thread Genmao Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Genmao Yu updated HADOOP-15189:
---
Attachment: (was: HADOOP-15189-branch-2.001.patch)

> backport HADOOP-15039 to branch-2 and branch-3
> --
>
> Key: HADOOP-15189
> URL: https://issues.apache.org/jira/browse/HADOOP-15189
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Genmao Yu
>Assignee: Genmao Yu
>Priority: Blocker
> Attachments: HADOOP-15189-branch-2.001.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15189) backport HADOOP-15039 to branch-2 and branch-3

2018-01-24 Thread Genmao Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Genmao Yu updated HADOOP-15189:
---
Status: Patch Available  (was: In Progress)

> backport HADOOP-15039 to branch-2 and branch-3
> --
>
> Key: HADOOP-15189
> URL: https://issues.apache.org/jira/browse/HADOOP-15189
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Genmao Yu
>Assignee: Genmao Yu
>Priority: Blocker
> Attachments: HADOOP-15189-branch-2.001.patch, 
> HADOOP-15189-branch-2.9.001.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15189) backport HADOOP-15039 to branch-2 and branch-3

2018-01-24 Thread Genmao Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Genmao Yu updated HADOOP-15189:
---
Attachment: HADOOP-15189-branch-2.9.001.patch

> backport HADOOP-15039 to branch-2 and branch-3
> --
>
> Key: HADOOP-15189
> URL: https://issues.apache.org/jira/browse/HADOOP-15189
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Genmao Yu
>Assignee: Genmao Yu
>Priority: Blocker
> Attachments: HADOOP-15189-branch-2.001.patch, 
> HADOOP-15189-branch-2.9.001.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15189) backport HADOOP-15039 to branch-2 and branch-3

2018-01-24 Thread Genmao Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Genmao Yu updated HADOOP-15189:
---
Attachment: HADOOP-15189-branch-2.001.patch

> backport HADOOP-15039 to branch-2 and branch-3
> --
>
> Key: HADOOP-15189
> URL: https://issues.apache.org/jira/browse/HADOOP-15189
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Genmao Yu
>Assignee: Genmao Yu
>Priority: Blocker
> Attachments: HADOOP-15189-branch-2.001.patch, 
> HADOOP-15189-branch-2.9.001.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Work started] (HADOOP-15189) backport HADOOP-15039 to branch-2 and branch-3

2018-01-24 Thread Genmao Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HADOOP-15189 started by Genmao Yu.
--
> backport HADOOP-15039 to branch-2 and branch-3
> --
>
> Key: HADOOP-15189
> URL: https://issues.apache.org/jira/browse/HADOOP-15189
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Genmao Yu
>Assignee: Genmao Yu
>Priority: Blocker
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Created] (HADOOP-15189) backport HADOOP-15039 to branch-2 and branch-3

2018-01-24 Thread Genmao Yu (JIRA)
Genmao Yu created HADOOP-15189:
--

 Summary: backport HADOOP-15039 to branch-2 and branch-3
 Key: HADOOP-15189
 URL: https://issues.apache.org/jira/browse/HADOOP-15189
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Genmao Yu
Assignee: Genmao Yu






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-14999) AliyunOSS: provide one asynchronous multi-part based uploading mechanism

2018-01-24 Thread Genmao Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Genmao Yu updated HADOOP-14999:
---
Attachment: HADOOP-14999.006.patch

> AliyunOSS: provide one asynchronous multi-part based uploading mechanism
> 
>
> Key: HADOOP-14999
> URL: https://issues.apache.org/jira/browse/HADOOP-14999
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0-beta1
>Reporter: Genmao Yu
>Assignee: Genmao Yu
>Priority: Major
> Attachments: HADOOP-14999.001.patch, HADOOP-14999.002.patch, 
> HADOOP-14999.003.patch, HADOOP-14999.004.patch, HADOOP-14999.005.patch, 
> HADOOP-14999.006.patch, asynchronous_file_uploading.pdf
>
>
> This mechanism is designed for uploading file in parallel and asynchronously: 
> - improve the performance of uploading file to OSS server. Firstly, this 
> mechanism splits result to multiple small blocks and upload them in parallel. 
> Then, getting result and uploading blocks are asynchronous.
> - avoid buffering too large result into local disk. To cite an extreme 
> example, there is a task which will output 100GB or even larger, we may need 
> to output this 100GB to local disk and then upload it. Sometimes, it is 
> inefficient and limited to disk space.
> This patch reuse {{SemaphoredDelegatingExecutor}} as executor service and 
> depends on HADOOP-15039. 
> Attached {{asynchronous_file_uploading.pdf}} illustrated the difference 
> between previous {{AliyunOSSOutputStream}} and 
> {{AliyunOSSBlockOutputStream}}, i.e. this asynchronous multi-part based 
> uploading mechanism.
> 1. {{AliyunOSSOutputStream}}: we need to output the whole result to local 
> disk before we can upload it to OSS. This will poses two problems:
> - if the output file is too large, it will run out of the local disk.
> - if the output file is too large, task will wait long time to upload 
> result to OSS before finish, wasting much compute resource.
> 2. {{AliyunOSSBlockOutputStream}}: we cut the task output into small blocks, 
> i.e. some small local file, and each block will be packaged into a uploading 
> task. These tasks will be submitted into {{SemaphoredDelegatingExecutor}}.  
> {{SemaphoredDelegatingExecutor}} will upload this blocks in parallel, this 
> will improve performance greatly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15039) Move SemaphoredDelegatingExecutor to hadoop-common

2018-01-21 Thread Genmao Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16333796#comment-16333796
 ] 

Genmao Yu commented on HADOOP-15039:


[~Sammi] got it

> Move SemaphoredDelegatingExecutor to hadoop-common
> --
>
> Key: HADOOP-15039
> URL: https://issues.apache.org/jira/browse/HADOOP-15039
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs, fs/oss, fs/s3
>Affects Versions: 3.0.0-beta1
>Reporter: Genmao Yu
>Assignee: Genmao Yu
>Priority: Minor
> Fix For: 3.1.0
>
> Attachments: HADOOP-15039.001.patch, HADOOP-15039.002.patch, 
> HADOOP-15039.003.patch, HADOOP-15039.004.patch, HADOOP-15039.005.patch
>
>
> Detailed discussions in HADOOP-14999 and HADOOP-15027.
> share {{SemaphoredDelegatingExecutor}} and move it to {{hadoop-common}}.
> cc [~ste...@apache.org] 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-14999) AliyunOSS: provide one asynchronous multi-part based uploading mechanism

2018-01-21 Thread Genmao Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16333781#comment-16333781
 ] 

Genmao Yu commented on HADOOP-14999:


resubmit patch.005 to trigger once PreCommit build

> AliyunOSS: provide one asynchronous multi-part based uploading mechanism
> 
>
> Key: HADOOP-14999
> URL: https://issues.apache.org/jira/browse/HADOOP-14999
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0-beta1
>Reporter: Genmao Yu
>Assignee: Genmao Yu
>Priority: Major
> Attachments: HADOOP-14999.001.patch, HADOOP-14999.002.patch, 
> HADOOP-14999.003.patch, HADOOP-14999.004.patch, HADOOP-14999.005.patch, 
> asynchronous_file_uploading.pdf
>
>
> This mechanism is designed for uploading file in parallel and asynchronously: 
> - improve the performance of uploading file to OSS server. Firstly, this 
> mechanism splits result to multiple small blocks and upload them in parallel. 
> Then, getting result and uploading blocks are asynchronous.
> - avoid buffering too large result into local disk. To cite an extreme 
> example, there is a task which will output 100GB or even larger, we may need 
> to output this 100GB to local disk and then upload it. Sometimes, it is 
> inefficient and limited to disk space.
> This patch reuse {{SemaphoredDelegatingExecutor}} as executor service and 
> depends on HADOOP-15039. 
> Attached {{asynchronous_file_uploading.pdf}} illustrated the difference 
> between previous {{AliyunOSSOutputStream}} and 
> {{AliyunOSSBlockOutputStream}}, i.e. this asynchronous multi-part based 
> uploading mechanism.
> 1. {{AliyunOSSOutputStream}}: we need to output the whole result to local 
> disk before we can upload it to OSS. This will poses two problems:
> - if the output file is too large, it will run out of the local disk.
> - if the output file is too large, task will wait long time to upload 
> result to OSS before finish, wasting much compute resource.
> 2. {{AliyunOSSBlockOutputStream}}: we cut the task output into small blocks, 
> i.e. some small local file, and each block will be packaged into a uploading 
> task. These tasks will be submitted into {{SemaphoredDelegatingExecutor}}.  
> {{SemaphoredDelegatingExecutor}} will upload this blocks in parallel, this 
> will improve performance greatly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-14999) AliyunOSS: provide one asynchronous multi-part based uploading mechanism

2018-01-21 Thread Genmao Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Genmao Yu updated HADOOP-14999:
---
Attachment: HADOOP-14999.005.patch

> AliyunOSS: provide one asynchronous multi-part based uploading mechanism
> 
>
> Key: HADOOP-14999
> URL: https://issues.apache.org/jira/browse/HADOOP-14999
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0-beta1
>Reporter: Genmao Yu
>Assignee: Genmao Yu
>Priority: Major
> Attachments: HADOOP-14999.001.patch, HADOOP-14999.002.patch, 
> HADOOP-14999.003.patch, HADOOP-14999.004.patch, HADOOP-14999.005.patch, 
> asynchronous_file_uploading.pdf
>
>
> This mechanism is designed for uploading file in parallel and asynchronously: 
> - improve the performance of uploading file to OSS server. Firstly, this 
> mechanism splits result to multiple small blocks and upload them in parallel. 
> Then, getting result and uploading blocks are asynchronous.
> - avoid buffering too large result into local disk. To cite an extreme 
> example, there is a task which will output 100GB or even larger, we may need 
> to output this 100GB to local disk and then upload it. Sometimes, it is 
> inefficient and limited to disk space.
> This patch reuse {{SemaphoredDelegatingExecutor}} as executor service and 
> depends on HADOOP-15039. 
> Attached {{asynchronous_file_uploading.pdf}} illustrated the difference 
> between previous {{AliyunOSSOutputStream}} and 
> {{AliyunOSSBlockOutputStream}}, i.e. this asynchronous multi-part based 
> uploading mechanism.
> 1. {{AliyunOSSOutputStream}}: we need to output the whole result to local 
> disk before we can upload it to OSS. This will poses two problems:
> - if the output file is too large, it will run out of the local disk.
> - if the output file is too large, task will wait long time to upload 
> result to OSS before finish, wasting much compute resource.
> 2. {{AliyunOSSBlockOutputStream}}: we cut the task output into small blocks, 
> i.e. some small local file, and each block will be packaged into a uploading 
> task. These tasks will be submitted into {{SemaphoredDelegatingExecutor}}.  
> {{SemaphoredDelegatingExecutor}} will upload this blocks in parallel, this 
> will improve performance greatly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-14999) AliyunOSS: provide one asynchronous multi-part based uploading mechanism

2018-01-21 Thread Genmao Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Genmao Yu updated HADOOP-14999:
---
Attachment: (was: HADOOP-14999.005.patch)

> AliyunOSS: provide one asynchronous multi-part based uploading mechanism
> 
>
> Key: HADOOP-14999
> URL: https://issues.apache.org/jira/browse/HADOOP-14999
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0-beta1
>Reporter: Genmao Yu
>Assignee: Genmao Yu
>Priority: Major
> Attachments: HADOOP-14999.001.patch, HADOOP-14999.002.patch, 
> HADOOP-14999.003.patch, HADOOP-14999.004.patch, 
> asynchronous_file_uploading.pdf
>
>
> This mechanism is designed for uploading file in parallel and asynchronously: 
> - improve the performance of uploading file to OSS server. Firstly, this 
> mechanism splits result to multiple small blocks and upload them in parallel. 
> Then, getting result and uploading blocks are asynchronous.
> - avoid buffering too large result into local disk. To cite an extreme 
> example, there is a task which will output 100GB or even larger, we may need 
> to output this 100GB to local disk and then upload it. Sometimes, it is 
> inefficient and limited to disk space.
> This patch reuse {{SemaphoredDelegatingExecutor}} as executor service and 
> depends on HADOOP-15039. 
> Attached {{asynchronous_file_uploading.pdf}} illustrated the difference 
> between previous {{AliyunOSSOutputStream}} and 
> {{AliyunOSSBlockOutputStream}}, i.e. this asynchronous multi-part based 
> uploading mechanism.
> 1. {{AliyunOSSOutputStream}}: we need to output the whole result to local 
> disk before we can upload it to OSS. This will poses two problems:
> - if the output file is too large, it will run out of the local disk.
> - if the output file is too large, task will wait long time to upload 
> result to OSS before finish, wasting much compute resource.
> 2. {{AliyunOSSBlockOutputStream}}: we cut the task output into small blocks, 
> i.e. some small local file, and each block will be packaged into a uploading 
> task. These tasks will be submitted into {{SemaphoredDelegatingExecutor}}.  
> {{SemaphoredDelegatingExecutor}} will upload this blocks in parallel, this 
> will improve performance greatly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-14999) AliyunOSS: provide one asynchronous multi-part based uploading mechanism

2018-01-16 Thread Genmao Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Genmao Yu updated HADOOP-14999:
---
Attachment: HADOOP-14999.005.patch

> AliyunOSS: provide one asynchronous multi-part based uploading mechanism
> 
>
> Key: HADOOP-14999
> URL: https://issues.apache.org/jira/browse/HADOOP-14999
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0-beta1
>Reporter: Genmao Yu
>Assignee: Genmao Yu
>Priority: Major
> Attachments: HADOOP-14999.001.patch, HADOOP-14999.002.patch, 
> HADOOP-14999.003.patch, HADOOP-14999.004.patch, HADOOP-14999.005.patch, 
> asynchronous_file_uploading.pdf
>
>
> This mechanism is designed for uploading file in parallel and asynchronously: 
> - improve the performance of uploading file to OSS server. Firstly, this 
> mechanism splits result to multiple small blocks and upload them in parallel. 
> Then, getting result and uploading blocks are asynchronous.
> - avoid buffering too large result into local disk. To cite an extreme 
> example, there is a task which will output 100GB or even larger, we may need 
> to output this 100GB to local disk and then upload it. Sometimes, it is 
> inefficient and limited to disk space.
> This patch reuse {{SemaphoredDelegatingExecutor}} as executor service and 
> depends on HADOOP-15039. 
> Attached {{asynchronous_file_uploading.pdf}} illustrated the difference 
> between previous {{AliyunOSSOutputStream}} and 
> {{AliyunOSSBlockOutputStream}}, i.e. this asynchronous multi-part based 
> uploading mechanism.
> 1. {{AliyunOSSOutputStream}}: we need to output the whole result to local 
> disk before we can upload it to OSS. This will poses two problems:
> - if the output file is too large, it will run out of the local disk.
> - if the output file is too large, task will wait long time to upload 
> result to OSS before finish, wasting much compute resource.
> 2. {{AliyunOSSBlockOutputStream}}: we cut the task output into small blocks, 
> i.e. some small local file, and each block will be packaged into a uploading 
> task. These tasks will be submitted into {{SemaphoredDelegatingExecutor}}.  
> {{SemaphoredDelegatingExecutor}} will upload this blocks in parallel, this 
> will improve performance greatly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-14999) AliyunOSS: provide one asynchronous multi-part based uploading mechanism

2018-01-16 Thread Genmao Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Genmao Yu updated HADOOP-14999:
---
Attachment: (was: HADOOP-14999.005.patch)

> AliyunOSS: provide one asynchronous multi-part based uploading mechanism
> 
>
> Key: HADOOP-14999
> URL: https://issues.apache.org/jira/browse/HADOOP-14999
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0-beta1
>Reporter: Genmao Yu
>Assignee: Genmao Yu
>Priority: Major
> Attachments: HADOOP-14999.001.patch, HADOOP-14999.002.patch, 
> HADOOP-14999.003.patch, HADOOP-14999.004.patch, HADOOP-14999.005.patch, 
> asynchronous_file_uploading.pdf
>
>
> This mechanism is designed for uploading file in parallel and asynchronously: 
> - improve the performance of uploading file to OSS server. Firstly, this 
> mechanism splits result to multiple small blocks and upload them in parallel. 
> Then, getting result and uploading blocks are asynchronous.
> - avoid buffering too large result into local disk. To cite an extreme 
> example, there is a task which will output 100GB or even larger, we may need 
> to output this 100GB to local disk and then upload it. Sometimes, it is 
> inefficient and limited to disk space.
> This patch reuse {{SemaphoredDelegatingExecutor}} as executor service and 
> depends on HADOOP-15039. 
> Attached {{asynchronous_file_uploading.pdf}} illustrated the difference 
> between previous {{AliyunOSSOutputStream}} and 
> {{AliyunOSSBlockOutputStream}}, i.e. this asynchronous multi-part based 
> uploading mechanism.
> 1. {{AliyunOSSOutputStream}}: we need to output the whole result to local 
> disk before we can upload it to OSS. This will poses two problems:
> - if the output file is too large, it will run out of the local disk.
> - if the output file is too large, task will wait long time to upload 
> result to OSS before finish, wasting much compute resource.
> 2. {{AliyunOSSBlockOutputStream}}: we cut the task output into small blocks, 
> i.e. some small local file, and each block will be packaged into a uploading 
> task. These tasks will be submitted into {{SemaphoredDelegatingExecutor}}.  
> {{SemaphoredDelegatingExecutor}} will upload this blocks in parallel, this 
> will improve performance greatly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-14999) AliyunOSS: provide one asynchronous multi-part based uploading mechanism

2018-01-15 Thread Genmao Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Genmao Yu updated HADOOP-14999:
---
Attachment: HADOOP-14999.005.patch

> AliyunOSS: provide one asynchronous multi-part based uploading mechanism
> 
>
> Key: HADOOP-14999
> URL: https://issues.apache.org/jira/browse/HADOOP-14999
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0-beta1
>Reporter: Genmao Yu
>Assignee: Genmao Yu
>Priority: Major
> Attachments: HADOOP-14999.001.patch, HADOOP-14999.002.patch, 
> HADOOP-14999.003.patch, HADOOP-14999.004.patch, HADOOP-14999.005.patch, 
> asynchronous_file_uploading.pdf
>
>
> This mechanism is designed for uploading file in parallel and asynchronously: 
> - improve the performance of uploading file to OSS server. Firstly, this 
> mechanism splits result to multiple small blocks and upload them in parallel. 
> Then, getting result and uploading blocks are asynchronous.
> - avoid buffering too large result into local disk. To cite an extreme 
> example, there is a task which will output 100GB or even larger, we may need 
> to output this 100GB to local disk and then upload it. Sometimes, it is 
> inefficient and limited to disk space.
> This patch reuse {{SemaphoredDelegatingExecutor}} as executor service and 
> depends on HADOOP-15039. 
> Attached {{asynchronous_file_uploading.pdf}} illustrated the difference 
> between previous {{AliyunOSSOutputStream}} and 
> {{AliyunOSSBlockOutputStream}}, i.e. this asynchronous multi-part based 
> uploading mechanism.
> 1. {{AliyunOSSOutputStream}}: we need to output the whole result to local 
> disk before we can upload it to OSS. This will poses two problems:
> - if the output file is too large, it will run out of the local disk.
> - if the output file is too large, task will wait long time to upload 
> result to OSS before finish, wasting much compute resource.
> 2. {{AliyunOSSBlockOutputStream}}: we cut the task output into small blocks, 
> i.e. some small local file, and each block will be packaged into a uploading 
> task. These tasks will be submitted into {{SemaphoredDelegatingExecutor}}.  
> {{SemaphoredDelegatingExecutor}} will upload this blocks in parallel, this 
> will improve performance greatly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-14999) AliyunOSS: provide one asynchronous multi-part based uploading mechanism

2018-01-15 Thread Genmao Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Genmao Yu updated HADOOP-14999:
---
Attachment: HADOOP-14999.004.patch

> AliyunOSS: provide one asynchronous multi-part based uploading mechanism
> 
>
> Key: HADOOP-14999
> URL: https://issues.apache.org/jira/browse/HADOOP-14999
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0-beta1
>Reporter: Genmao Yu
>Assignee: Genmao Yu
>Priority: Major
> Attachments: HADOOP-14999.001.patch, HADOOP-14999.002.patch, 
> HADOOP-14999.003.patch, HADOOP-14999.004.patch, 
> asynchronous_file_uploading.pdf
>
>
> This mechanism is designed for uploading file in parallel and asynchronously: 
> - improve the performance of uploading file to OSS server. Firstly, this 
> mechanism splits result to multiple small blocks and upload them in parallel. 
> Then, getting result and uploading blocks are asynchronous.
> - avoid buffering too large result into local disk. To cite an extreme 
> example, there is a task which will output 100GB or even larger, we may need 
> to output this 100GB to local disk and then upload it. Sometimes, it is 
> inefficient and limited to disk space.
> This patch reuse {{SemaphoredDelegatingExecutor}} as executor service and 
> depends on HADOOP-15039. 
> Attached {{asynchronous_file_uploading.pdf}} illustrated the difference 
> between previous {{AliyunOSSOutputStream}} and 
> {{AliyunOSSBlockOutputStream}}, i.e. this asynchronous multi-part based 
> uploading mechanism.
> 1. {{AliyunOSSOutputStream}}: we need to output the whole result to local 
> disk before we can upload it to OSS. This will poses two problems:
> - if the output file is too large, it will run out of the local disk.
> - if the output file is too large, task will wait long time to upload 
> result to OSS before finish, wasting much compute resource.
> 2. {{AliyunOSSBlockOutputStream}}: we cut the task output into small blocks, 
> i.e. some small local file, and each block will be packaged into a uploading 
> task. These tasks will be submitted into {{SemaphoredDelegatingExecutor}}.  
> {{SemaphoredDelegatingExecutor}} will upload this blocks in parallel, this 
> will improve performance greatly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-14999) AliyunOSS: provide one asynchronous multi-part based uploading mechanism

2018-01-13 Thread Genmao Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Genmao Yu updated HADOOP-14999:
---
Description: 
This mechanism is designed for uploading file in parallel and asynchronously: 

- improve the performance of uploading file to OSS server. Firstly, this 
mechanism splits result to multiple small blocks and upload them in parallel. 
Then, getting result and uploading blocks are asynchronous.
- avoid buffering too large result into local disk. To cite an extreme example, 
there is a task which will output 100GB or even larger, we may need to output 
this 100GB to local disk and then upload it. Sometimes, it is inefficient and 
limited to disk space.

This patch reuse {{SemaphoredDelegatingExecutor}} as executor service and 
depends on HADOOP-15039. 

Attached {{asynchronous_file_uploading.pdf}} illustrated the difference between 
previous {{AliyunOSSOutputStream}} and {{AliyunOSSBlockOutputStream}}, i.e. 
this asynchronous multi-part based uploading mechanism.

1. {{AliyunOSSOutputStream}}: we need to output the whole result to local disk 
before we can upload it to OSS. This will poses two problems:
- if the output file is too large, it will run out of the local disk.
- if the output file is too large, task will wait long time to upload 
result to OSS before finish, wasting much compute resource.

2. {{AliyunOSSBlockOutputStream}}: we cut the task output into small blocks, 
i.e. some small local file, and each block will be packaged into a uploading 
task. These tasks will be submitted into {{SemaphoredDelegatingExecutor}}.  
{{SemaphoredDelegatingExecutor}} will upload this blocks in parallel, this will 
improve performance greatly.

  was:
This mechanism is designed for uploading file in parallel and asynchronously: 

- improve the performance of uploading file to OSS server. Firstly, this 
mechanism splits result to multiple small blocks and upload them in parallel. 
Then, getting result and uploading blocks are asynchronous.
- avoid buffering too large result into local disk. To cite an extreme example, 
there is a task which will output 100GB or even larger, we may need to output 
this 100GB to local disk and then upload it. Sometimes, it is inefficient and 
limited to disk space.

This patch reuse {{SemaphoredDelegatingExecutor}} as executor service and 
depends on HADOOP-15039. 

Attached {{asynchronous_file_uploading.pdf}} illustrated the difference between 
previous {{AliyunOSSOutputStream}} and {{AliyunOSSBlockOutputStream}}, i.e. 
this asynchronous multi-part based uploading mechanism.

1. {{AliyunOSSOutputStream}}: we need to output the whole result to local disk 
before we can upload it to OSS. This will poses two problems:
- if the output file is too large, it will run out of the local disk.
- if the output file is too large, task will wait long time to upload 
result to OSS before finish, wasting much compute resource.
2. {{AliyunOSSBlockOutputStream}}: we cut the task output into small blocks, 
i.e. some small local file, and each block will be packaged into a uploading 
task. These tasks will be submitted into {{SemaphoredDelegatingExecutor}}.  
{{SemaphoredDelegatingExecutor}} will upload this blocks in in parallel, this 
will improve performance greatly.


> AliyunOSS: provide one asynchronous multi-part based uploading mechanism
> 
>
> Key: HADOOP-14999
> URL: https://issues.apache.org/jira/browse/HADOOP-14999
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0-beta1
>Reporter: Genmao Yu
>Assignee: Genmao Yu
> Attachments: HADOOP-14999.001.patch, HADOOP-14999.002.patch, 
> HADOOP-14999.003.patch, asynchronous_file_uploading.pdf
>
>
> This mechanism is designed for uploading file in parallel and asynchronously: 
> - improve the performance of uploading file to OSS server. Firstly, this 
> mechanism splits result to multiple small blocks and upload them in parallel. 
> Then, getting result and uploading blocks are asynchronous.
> - avoid buffering too large result into local disk. To cite an extreme 
> example, there is a task which will output 100GB or even larger, we may need 
> to output this 100GB to local disk and then upload it. Sometimes, it is 
> inefficient and limited to disk space.
> This patch reuse {{SemaphoredDelegatingExecutor}} as executor service and 
> depends on HADOOP-15039. 
> Attached {{asynchronous_file_uploading.pdf}} illustrated the difference 
> between previous {{AliyunOSSOutputStream}} and 
> {{AliyunOSSBlockOutputStream}}, i.e. this asynchronous multi-part based 
> uploading mechanism.
> 1. {{AliyunOSSOutputStream}}: we need to output the whole result to local 
> disk before we can upload it to OSS. This will poses two 

[jira] [Updated] (HADOOP-14999) AliyunOSS: provide one asynchronous multi-part based uploading mechanism

2018-01-13 Thread Genmao Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Genmao Yu updated HADOOP-14999:
---
Description: 
This mechanism is designed for uploading file in parallel and asynchronously: 

- improve the performance of uploading file to OSS server. Firstly, this 
mechanism splits result to multiple small blocks and upload them in parallel. 
Then, getting result and uploading blocks are asynchronous.
- avoid buffering too large result into local disk. To cite an extreme example, 
there is a task which will output 100GB or even larger, we may need to output 
this 100GB to local disk and then upload it. Sometimes, it is inefficient and 
limited to disk space.

This patch reuse {{SemaphoredDelegatingExecutor}} as executor service and 
depends on HADOOP-15039. 

Attached {{asynchronous_file_uploading.pdf}} illustrated the difference between 
previous {{AliyunOSSOutputStream}} and {{AliyunOSSBlockOutputStream}}, i.e. 
this asynchronous multi-part based uploading mechanism.

1. {{AliyunOSSOutputStream}}: we need to output the whole result to local disk 
before we can upload it to OSS. This will poses two problems:
- if the output file is too large, it will run out of the local disk.
- if the output file is too large, task will wait long time to upload 
result to OSS before finish, wasting much compute resource.
2. {{AliyunOSSBlockOutputStream}}: we cut the task output into small blocks, 
i.e. some small local file, and each block will be packaged into a uploading 
task. These tasks will be submitted into {{SemaphoredDelegatingExecutor}}.  
{{SemaphoredDelegatingExecutor}} will upload this blocks in in parallel, this 
will improve performance greatly.

  was:
This mechanism is designed for uploading file in parallel and asynchronously: 

- improve the performance of uploading file to OSS server. Firstly, this 
mechanism splits result to multiple small blocks and upload them in parallel. 
Then, getting result and uploading blocks are asynchronous.
- avoid buffering too large result into local disk. To cite an extreme example, 
there is a task which will output 100GB or even larger, we may need to output 
this 100GB to local disk and then upload it. Sometimes, it is inefficient and 
limited to disk space.

This patch reuse {{SemaphoredDelegatingExecutor}} as executor service and 
depends on HADOOP-15039. 

Attached {{asynchronous_file_uploading.pdf}} illustrated the difference between 
previous {{AliyunOSSOutputStream}} and {{AliyunOSSBlockOutputStream}}, i.e. 
this asynchronous multi-part based uploading mechanism.

- {{AliyunOSSOutputStream}}: we need to output the whole result to local disk 
before we can upload it to OSS. This will poses two problems:
- if the output file is too large, it will run out of the local disk.
- if the output file is too large, task will wait long time to upload 
result to OSS before finish, wasting much compute resource.
- {{AliyunOSSBlockOutputStream}}: we cut the task output into small blocks, 
i.e. some small local file, and each block will be packaged into a uploading 
task. These tasks will be submitted into {{SemaphoredDelegatingExecutor}}.  
{{SemaphoredDelegatingExecutor}} will upload this blocks in in parallel, this 
will improve performance greatly.


> AliyunOSS: provide one asynchronous multi-part based uploading mechanism
> 
>
> Key: HADOOP-14999
> URL: https://issues.apache.org/jira/browse/HADOOP-14999
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0-beta1
>Reporter: Genmao Yu
>Assignee: Genmao Yu
> Attachments: HADOOP-14999.001.patch, HADOOP-14999.002.patch, 
> HADOOP-14999.003.patch, asynchronous_file_uploading.pdf
>
>
> This mechanism is designed for uploading file in parallel and asynchronously: 
> - improve the performance of uploading file to OSS server. Firstly, this 
> mechanism splits result to multiple small blocks and upload them in parallel. 
> Then, getting result and uploading blocks are asynchronous.
> - avoid buffering too large result into local disk. To cite an extreme 
> example, there is a task which will output 100GB or even larger, we may need 
> to output this 100GB to local disk and then upload it. Sometimes, it is 
> inefficient and limited to disk space.
> This patch reuse {{SemaphoredDelegatingExecutor}} as executor service and 
> depends on HADOOP-15039. 
> Attached {{asynchronous_file_uploading.pdf}} illustrated the difference 
> between previous {{AliyunOSSOutputStream}} and 
> {{AliyunOSSBlockOutputStream}}, i.e. this asynchronous multi-part based 
> uploading mechanism.
> 1. {{AliyunOSSOutputStream}}: we need to output the whole result to local 
> disk before we can upload it to OSS. This will poses two 

[jira] [Updated] (HADOOP-14999) AliyunOSS: provide one asynchronous multi-part based uploading mechanism

2018-01-13 Thread Genmao Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Genmao Yu updated HADOOP-14999:
---
Description: 
This mechanism is designed for uploading file in parallel and asynchronously: 

- improve the performance of uploading file to OSS server. Firstly, this 
mechanism splits result to multiple small blocks and upload them in parallel. 
Then, getting result and uploading blocks are asynchronous.
- avoid buffering too large result into local disk. To cite an extreme example, 
there is a task which will output 100GB or even larger, we may need to output 
this 100GB to local disk and then upload it. Sometimes, it is inefficient and 
limited to disk space.

This patch reuse {{SemaphoredDelegatingExecutor}} as executor service and 
depends on HADOOP-15039. 

Attached {{asynchronous_file_uploading.pdf}} illustrated the difference between 
previous {{AliyunOSSOutputStream}} and {{AliyunOSSBlockOutputStream}}, i.e. 
this asynchronous multi-part based uploading mechanism.

- {{AliyunOSSOutputStream}}: we need to output the whole result to local disk 
before we can upload it to OSS. This will poses two problems:
- if the output file is too large, it will run out of the local disk.
- if the output file is too large, task will wait long time to upload 
result to OSS before finish, wasting much compute resource.
- {{AliyunOSSBlockOutputStream}}: we cut the task output into small blocks, 
i.e. some small local file, and each block will be packaged into a uploading 
task. These tasks will be submitted into {{SemaphoredDelegatingExecutor}}.  
{{SemaphoredDelegatingExecutor}} will upload this blocks in in parallel, this 
will improve performance greatly.

  was:
This mechanism is designed for uploading file in parallel and asynchronously: 

- improve the performance of uploading file to OSS server. Firstly, this 
mechanism splits result to multiple small blocks and upload them in parallel. 
Then, getting result and uploading blocks are asynchronous.
- avoid buffering too large result into local disk. To cite an extreme example, 
there is a task which will output 100GB or even larger, we may need to output 
this 100GB to local disk and then upload it. Sometimes, it is inefficient and 
limited to disk space.

This patch reuse {{SemaphoredDelegatingExecutor}} as executor service and 
depends on HADOOP-15039. 

Attached {{asynchronous_file_uploading.pdf}} illustrated the difference between 
previous {{AliyunOSSOutputStream}} and {{AliyunOSSBlockOutputStream}}, i.e. 
this asynchronous multi-part based uploading mechanism.


> AliyunOSS: provide one asynchronous multi-part based uploading mechanism
> 
>
> Key: HADOOP-14999
> URL: https://issues.apache.org/jira/browse/HADOOP-14999
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0-beta1
>Reporter: Genmao Yu
>Assignee: Genmao Yu
> Attachments: HADOOP-14999.001.patch, HADOOP-14999.002.patch, 
> HADOOP-14999.003.patch, asynchronous_file_uploading.pdf
>
>
> This mechanism is designed for uploading file in parallel and asynchronously: 
> - improve the performance of uploading file to OSS server. Firstly, this 
> mechanism splits result to multiple small blocks and upload them in parallel. 
> Then, getting result and uploading blocks are asynchronous.
> - avoid buffering too large result into local disk. To cite an extreme 
> example, there is a task which will output 100GB or even larger, we may need 
> to output this 100GB to local disk and then upload it. Sometimes, it is 
> inefficient and limited to disk space.
> This patch reuse {{SemaphoredDelegatingExecutor}} as executor service and 
> depends on HADOOP-15039. 
> Attached {{asynchronous_file_uploading.pdf}} illustrated the difference 
> between previous {{AliyunOSSOutputStream}} and 
> {{AliyunOSSBlockOutputStream}}, i.e. this asynchronous multi-part based 
> uploading mechanism.
> - {{AliyunOSSOutputStream}}: we need to output the whole result to local disk 
> before we can upload it to OSS. This will poses two problems:
> - if the output file is too large, it will run out of the local disk.
> - if the output file is too large, task will wait long time to upload 
> result to OSS before finish, wasting much compute resource.
> - {{AliyunOSSBlockOutputStream}}: we cut the task output into small blocks, 
> i.e. some small local file, and each block will be packaged into a uploading 
> task. These tasks will be submitted into {{SemaphoredDelegatingExecutor}}.  
> {{SemaphoredDelegatingExecutor}} will upload this blocks in in parallel, this 
> will improve performance greatly.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HADOOP-14999) AliyunOSS: provide one asynchronous multi-part based uploading mechanism

2018-01-13 Thread Genmao Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Genmao Yu updated HADOOP-14999:
---
Description: 
This mechanism is designed for uploading file in parallel and asynchronously: 

- improve the performance of uploading file to OSS server. Firstly, this 
mechanism splits result to multiple small blocks and upload them in parallel. 
Then, getting result and uploading blocks are asynchronous.
- avoid buffering too large result into local disk. To cite an extreme example, 
there is a task which will output 100GB or even larger, we may need to output 
this 100GB to local disk and then upload it. Sometimes, it is inefficient and 
limited to disk space.

This patch reuse {{SemaphoredDelegatingExecutor}} as executor service and 
depends on HADOOP-15039. 

Attached {{asynchronous_file_uploading.pdf}} illustrated the difference between 
previous {{AliyunOSSOutputStream}} and {{AliyunOSSBlockOutputStream}}, i.e. 
this asynchronous multi-part based uploading mechanism.

  was:
This mechanism is designed for uploading file in parallel and asynchronously: 

- improve the performance of uploading file to OSS server. Firstly, this 
mechanism splits result to multiple small blocks and upload them in parallel. 
Then, getting result and uploading blocks are asynchronous.
- avoid buffering too large result into local disk. To cite an extreme example, 
there is a task which will output 100GB or even larger, we may need to output 
this 100GB to local disk and then upload it. Sometimes, it is inefficient and 
limited to disk space.

This patch reuse {{SemaphoredDelegatingExecutor}} as executor service and 
depends on HADOOP-15039. 

Attached {{asynchronous_file_uploading.pdf}} illustrated the difference between 
previous {{AliyunOSSOutputStream}} and this 


> AliyunOSS: provide one asynchronous multi-part based uploading mechanism
> 
>
> Key: HADOOP-14999
> URL: https://issues.apache.org/jira/browse/HADOOP-14999
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0-beta1
>Reporter: Genmao Yu
>Assignee: Genmao Yu
> Attachments: HADOOP-14999.001.patch, HADOOP-14999.002.patch, 
> HADOOP-14999.003.patch, asynchronous_file_uploading.pdf
>
>
> This mechanism is designed for uploading file in parallel and asynchronously: 
> - improve the performance of uploading file to OSS server. Firstly, this 
> mechanism splits result to multiple small blocks and upload them in parallel. 
> Then, getting result and uploading blocks are asynchronous.
> - avoid buffering too large result into local disk. To cite an extreme 
> example, there is a task which will output 100GB or even larger, we may need 
> to output this 100GB to local disk and then upload it. Sometimes, it is 
> inefficient and limited to disk space.
> This patch reuse {{SemaphoredDelegatingExecutor}} as executor service and 
> depends on HADOOP-15039. 
> Attached {{asynchronous_file_uploading.pdf}} illustrated the difference 
> between previous {{AliyunOSSOutputStream}} and 
> {{AliyunOSSBlockOutputStream}}, i.e. this asynchronous multi-part based 
> uploading mechanism.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-14999) AliyunOSS: provide one asynchronous multi-part based uploading mechanism

2018-01-13 Thread Genmao Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Genmao Yu updated HADOOP-14999:
---
Description: 
This mechanism is designed for uploading file in parallel and asynchronously: 

- improve the performance of uploading file to OSS server. Firstly, this 
mechanism splits result to multiple small blocks and upload them in parallel. 
Then, getting result and uploading blocks are asynchronous.
- avoid buffering too large result into local disk. To cite an extreme example, 
there is a task which will output 100GB or even larger, we may need to output 
this 100GB to local disk and then upload it. Sometimes, it is inefficient and 
limited to disk space.

This patch reuse {{SemaphoredDelegatingExecutor}} as executor service and 
depends on HADOOP-15039. 

Attached {{asynchronous_file_uploading.pdf}} illustrated the difference between 
previous {{AliyunOSSOutputStream}} and this 

  was:
This mechanism is designed for uploading file in parallel and asynchronously: 

- improve the performance of uploading file to OSS server. Firstly, this 
mechanism splits result to multiple small blocks and upload them in parallel. 
Then, getting result and uploading blocks are asynchronous.
- avoid buffering too large result into local disk. To cite an extreme example, 
there is a task which will output 100GB or even larger, we may need to output 
this 100GB to local disk and then upload it. Sometimes, it is inefficient and 
limited to disk space.

This patch reuse {{SemaphoredDelegatingExecutor}} as executor service and 
depends on HADOOP-15039. 


> AliyunOSS: provide one asynchronous multi-part based uploading mechanism
> 
>
> Key: HADOOP-14999
> URL: https://issues.apache.org/jira/browse/HADOOP-14999
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0-beta1
>Reporter: Genmao Yu
>Assignee: Genmao Yu
> Attachments: HADOOP-14999.001.patch, HADOOP-14999.002.patch, 
> HADOOP-14999.003.patch, asynchronous_file_uploading.pdf
>
>
> This mechanism is designed for uploading file in parallel and asynchronously: 
> - improve the performance of uploading file to OSS server. Firstly, this 
> mechanism splits result to multiple small blocks and upload them in parallel. 
> Then, getting result and uploading blocks are asynchronous.
> - avoid buffering too large result into local disk. To cite an extreme 
> example, there is a task which will output 100GB or even larger, we may need 
> to output this 100GB to local disk and then upload it. Sometimes, it is 
> inefficient and limited to disk space.
> This patch reuse {{SemaphoredDelegatingExecutor}} as executor service and 
> depends on HADOOP-15039. 
> Attached {{asynchronous_file_uploading.pdf}} illustrated the difference 
> between previous {{AliyunOSSOutputStream}} and this 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-14999) AliyunOSS: provide one asynchronous multi-part based uploading mechanism

2018-01-13 Thread Genmao Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Genmao Yu updated HADOOP-14999:
---
Attachment: HADOOP-14999.003.patch

> AliyunOSS: provide one asynchronous multi-part based uploading mechanism
> 
>
> Key: HADOOP-14999
> URL: https://issues.apache.org/jira/browse/HADOOP-14999
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0-beta1
>Reporter: Genmao Yu
>Assignee: Genmao Yu
> Attachments: HADOOP-14999.001.patch, HADOOP-14999.002.patch, 
> HADOOP-14999.003.patch, asynchronous_file_uploading.pdf
>
>
> This mechanism is designed for uploading file in parallel and asynchronously: 
> - improve the performance of uploading file to OSS server. Firstly, this 
> mechanism splits result to multiple small blocks and upload them in parallel. 
> Then, getting result and uploading blocks are asynchronous.
> - avoid buffering too large result into local disk. To cite an extreme 
> example, there is a task which will output 100GB or even larger, we may need 
> to output this 100GB to local disk and then upload it. Sometimes, it is 
> inefficient and limited to disk space.
> This patch reuse {{SemaphoredDelegatingExecutor}} as executor service and 
> depends on HADOOP-15039. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-14999) AliyunOSS: provide one asynchronous multi-part based uploading mechanism

2018-01-13 Thread Genmao Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Genmao Yu updated HADOOP-14999:
---
Description: 
This mechanism is designed for uploading file in parallel and asynchronously: 

- improve the performance of uploading file to OSS server. Firstly, this 
mechanism splits result to multiple small blocks and upload them in parallel. 
Then, getting result and uploading blocks are asynchronous.
- avoid buffering too large result into local disk. To cite an extreme example, 
there is a task which will output 100GB or even larger, we may need to output 
this 100GB to local disk and then upload it. Sometimes, it is inefficient and 
limited to disk space.

This patch reuse {{SemaphoredDelegatingExecutor}} as executor service and 
depends on HADOOP-15039. 

  was:
This mechanism is designed for uploading file in parallel and asynchronously: 

- improve the performance of uploading file to OSS server. Firstly, this 
mechanism splits result to multiple small blocks and upload them in parallel. 
Then, getting result and uploading blocks are asynchronous.
- avoid buffering too large result into local disk. To cite an extreme example, 
there is a task which will output 100GB or even larger, we may need to output 
this 100GB to local disk and then upload it. Sometimes, it is inefficient and 
limited to disk space.

This patch reuse {{SemaphoredDelegatingExecutor}} as executor service


> AliyunOSS: provide one asynchronous multi-part based uploading mechanism
> 
>
> Key: HADOOP-14999
> URL: https://issues.apache.org/jira/browse/HADOOP-14999
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0-beta1
>Reporter: Genmao Yu
>Assignee: Genmao Yu
> Attachments: HADOOP-14999.001.patch, HADOOP-14999.002.patch, 
> asynchronous_file_uploading.pdf
>
>
> This mechanism is designed for uploading file in parallel and asynchronously: 
> - improve the performance of uploading file to OSS server. Firstly, this 
> mechanism splits result to multiple small blocks and upload them in parallel. 
> Then, getting result and uploading blocks are asynchronous.
> - avoid buffering too large result into local disk. To cite an extreme 
> example, there is a task which will output 100GB or even larger, we may need 
> to output this 100GB to local disk and then upload it. Sometimes, it is 
> inefficient and limited to disk space.
> This patch reuse {{SemaphoredDelegatingExecutor}} as executor service and 
> depends on HADOOP-15039. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-14999) AliyunOSS: provide one asynchronous multi-part based uploading mechanism

2018-01-13 Thread Genmao Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Genmao Yu updated HADOOP-14999:
---
Description: 
This mechanism is designed for uploading file in parallel and asynchronously: 

- improve the performance of uploading file to OSS server. Firstly, this 
mechanism splits result to multiple small blocks and upload them in parallel. 
Then, getting result and uploading blocks are asynchronous.
- avoid buffering too large result into local disk. To cite an extreme example, 
there is a task which will output 100GB or even larger, we may need to output 
this 100GB to local disk and then upload it. Sometimes, it is inefficient and 
limited to disk space.

This patch reuse {{SemaphoredDelegatingExecutor}} as executor service

  was:
This mechanism is designed for uploading file in parallel and asynchronously: 

- improve the performance of uploading file to OSS server. Firstly, this 
mechanism splits result to multiple small blocks and upload them in parallel. 
Then, getting result and uploading blocks are asynchronous.
- avoid buffering too large result into local disk. To cite an extreme example, 
there is a task which will output 100GB or even larger, we may need to output 
this 100GB to local disk and then upload it. Sometimes, it is inefficient and 
limited to disk space.


> AliyunOSS: provide one asynchronous multi-part based uploading mechanism
> 
>
> Key: HADOOP-14999
> URL: https://issues.apache.org/jira/browse/HADOOP-14999
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0-beta1
>Reporter: Genmao Yu
>Assignee: Genmao Yu
> Attachments: HADOOP-14999.001.patch, HADOOP-14999.002.patch, 
> asynchronous_file_uploading.pdf
>
>
> This mechanism is designed for uploading file in parallel and asynchronously: 
> - improve the performance of uploading file to OSS server. Firstly, this 
> mechanism splits result to multiple small blocks and upload them in parallel. 
> Then, getting result and uploading blocks are asynchronous.
> - avoid buffering too large result into local disk. To cite an extreme 
> example, there is a task which will output 100GB or even larger, we may need 
> to output this 100GB to local disk and then upload it. Sometimes, it is 
> inefficient and limited to disk space.
> This patch reuse {{SemaphoredDelegatingExecutor}} as executor service



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-14999) AliyunOSS: provide one asynchronous multi-part based uploading mechanism

2018-01-13 Thread Genmao Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Genmao Yu updated HADOOP-14999:
---
Attachment: asynchronous_file_uploading.pdf

> AliyunOSS: provide one asynchronous multi-part based uploading mechanism
> 
>
> Key: HADOOP-14999
> URL: https://issues.apache.org/jira/browse/HADOOP-14999
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0-beta1
>Reporter: Genmao Yu
>Assignee: Genmao Yu
> Attachments: HADOOP-14999.001.patch, HADOOP-14999.002.patch, 
> asynchronous_file_uploading.pdf
>
>
> This mechanism is designed for uploading file in parallel and asynchronously: 
> - improve the performance of uploading file to OSS server. Firstly, this 
> mechanism splits result to multiple small blocks and upload them in parallel. 
> Then, getting result and uploading blocks are asynchronous.
> - avoid buffering too large result into local disk. To cite an extreme 
> example, there is a task which will output 100GB or even larger, we may need 
> to output this 100GB to local disk and then upload it. Sometimes, it is 
> inefficient and limited to disk space.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-14993) AliyunOSS: Override listFiles and listLocatedStatus

2017-12-14 Thread Genmao Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16292001#comment-16292001
 ] 

Genmao Yu commented on HADOOP-14993:


@SammiChen attach a new patch at HDOOP-15111

> AliyunOSS: Override listFiles and listLocatedStatus 
> 
>
> Key: HADOOP-14993
> URL: https://issues.apache.org/jira/browse/HADOOP-14993
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0-beta1
>Reporter: Genmao Yu
>Assignee: Genmao Yu
> Fix For: 3.0.0, 3.1.0, 3.0.1
>
> Attachments: HADOOP-14993.001.patch, HADOOP-14993.002.patch, 
> HADOOP-14993.003.patch
>
>
> Do a bulk listing off all entries under a path in one single operation, there 
> is no need to recursively walk the directory tree.
> Updates:
> - override listFiles and listLocatedStatus by using bulk listing
> - some minor updates in hadoop-aliyun index.md



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HADOOP-14993) AliyunOSS: Override listFiles and listLocatedStatus

2017-12-14 Thread Genmao Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16292001#comment-16292001
 ] 

Genmao Yu edited comment on HADOOP-14993 at 12/15/17 3:55 AM:
--

[~Sammi] attach a new patch at HADOOP-15111


was (Author: unclegen):
@SammiChen attach a new patch at HDOOP-15111

> AliyunOSS: Override listFiles and listLocatedStatus 
> 
>
> Key: HADOOP-14993
> URL: https://issues.apache.org/jira/browse/HADOOP-14993
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0-beta1
>Reporter: Genmao Yu
>Assignee: Genmao Yu
> Fix For: 3.0.0, 3.1.0, 3.0.1
>
> Attachments: HADOOP-14993.001.patch, HADOOP-14993.002.patch, 
> HADOOP-14993.003.patch
>
>
> Do a bulk listing off all entries under a path in one single operation, there 
> is no need to recursively walk the directory tree.
> Updates:
> - override listFiles and listLocatedStatus by using bulk listing
> - some minor updates in hadoop-aliyun index.md



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15111) AliyunOSS: backport HADOOP-14993 to branch-2

2017-12-12 Thread Genmao Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Genmao Yu updated HADOOP-15111:
---
Attachment: (was: HADOOP-15111-branch-2.001.patch)

> AliyunOSS: backport HADOOP-14993 to branch-2
> 
>
> Key: HADOOP-15111
> URL: https://issues.apache.org/jira/browse/HADOOP-15111
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/oss
>Reporter: Genmao Yu
> Attachments: HADOOP-15111-branch-2.001.patch
>
>
> Do a bulk listing off all entries under a path in one single operation, there 
> is no need to recursively walk the directory tree.
> Updates:
> - override listFiles and listLocatedStatus by using bulk listing
> - some minor updates in hadoop-aliyun index.md



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15111) AliyunOSS: backport HADOOP-14993 to branch-2

2017-12-12 Thread Genmao Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Genmao Yu updated HADOOP-15111:
---
Attachment: HADOOP-15111-branch-2.001.patch

> AliyunOSS: backport HADOOP-14993 to branch-2
> 
>
> Key: HADOOP-15111
> URL: https://issues.apache.org/jira/browse/HADOOP-15111
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/oss
>Reporter: Genmao Yu
> Attachments: HADOOP-15111-branch-2.001.patch
>
>
> Do a bulk listing off all entries under a path in one single operation, there 
> is no need to recursively walk the directory tree.
> Updates:
> - override listFiles and listLocatedStatus by using bulk listing
> - some minor updates in hadoop-aliyun index.md



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15111) AliyunOSS: backport HADOOP-14993 to branch-2

2017-12-12 Thread Genmao Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16287610#comment-16287610
 ] 

Genmao Yu commented on HADOOP-15111:


attach 001 patch

> AliyunOSS: backport HADOOP-14993 to branch-2
> 
>
> Key: HADOOP-15111
> URL: https://issues.apache.org/jira/browse/HADOOP-15111
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/oss
>Reporter: Genmao Yu
> Attachments: HADOOP-15111-branch-2.001.patch
>
>
> Do a bulk listing off all entries under a path in one single operation, there 
> is no need to recursively walk the directory tree.
> Updates:
> - override listFiles and listLocatedStatus by using bulk listing
> - some minor updates in hadoop-aliyun index.md



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15111) AliyunOSS: backport HADOOP-14993 to branch-2

2017-12-12 Thread Genmao Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Genmao Yu updated HADOOP-15111:
---
Attachment: HADOOP-15111-branch-2.001.patch

> AliyunOSS: backport HADOOP-14993 to branch-2
> 
>
> Key: HADOOP-15111
> URL: https://issues.apache.org/jira/browse/HADOOP-15111
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/oss
>Reporter: Genmao Yu
> Attachments: HADOOP-15111-branch-2.001.patch
>
>
> Do a bulk listing off all entries under a path in one single operation, there 
> is no need to recursively walk the directory tree.
> Updates:
> - override listFiles and listLocatedStatus by using bulk listing
> - some minor updates in hadoop-aliyun index.md



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15111) AliyunOSS: backport HADOOP-14993 to branch-2

2017-12-12 Thread Genmao Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Genmao Yu updated HADOOP-15111:
---
Attachment: (was: HADOOP-15111-branch-2.000.patch)

> AliyunOSS: backport HADOOP-14993 to branch-2
> 
>
> Key: HADOOP-15111
> URL: https://issues.apache.org/jira/browse/HADOOP-15111
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/oss
>Reporter: Genmao Yu
>
> Do a bulk listing off all entries under a path in one single operation, there 
> is no need to recursively walk the directory tree.
> Updates:
> - override listFiles and listLocatedStatus by using bulk listing
> - some minor updates in hadoop-aliyun index.md



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15111) AliyunOSS: backport HADOOP-14993 to branch-2

2017-12-12 Thread Genmao Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Genmao Yu updated HADOOP-15111:
---
Issue Type: Improvement  (was: Bug)

> AliyunOSS: backport HADOOP-14993 to branch-2
> 
>
> Key: HADOOP-15111
> URL: https://issues.apache.org/jira/browse/HADOOP-15111
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/oss
>Reporter: Genmao Yu
> Attachments: HADOOP-15111-branch-2.000.patch
>
>
> Do a bulk listing off all entries under a path in one single operation, there 
> is no need to recursively walk the directory tree.
> Updates:
> - override listFiles and listLocatedStatus by using bulk listing
> - some minor updates in hadoop-aliyun index.md



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15111) AliyunOSS: backport HADOOP-14993 to branch-2

2017-12-12 Thread Genmao Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16287570#comment-16287570
 ] 

Genmao Yu commented on HADOOP-15111:


All test passed against "oss-cn-shanghai.aliyuncs.com".

cc [~Sammi]

> AliyunOSS: backport HADOOP-14993 to branch-2
> 
>
> Key: HADOOP-15111
> URL: https://issues.apache.org/jira/browse/HADOOP-15111
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/oss
>Reporter: Genmao Yu
> Attachments: HADOOP-15111-branch-2.000.patch
>
>
> Do a bulk listing off all entries under a path in one single operation, there 
> is no need to recursively walk the directory tree.
> Updates:
> - override listFiles and listLocatedStatus by using bulk listing
> - some minor updates in hadoop-aliyun index.md



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15111) AliyunOSS: backport HADOOP-14993 to branch-2

2017-12-12 Thread Genmao Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Genmao Yu updated HADOOP-15111:
---
Attachment: HADOOP-15111-branch-2.000.patch

> AliyunOSS: backport HADOOP-14993 to branch-2
> 
>
> Key: HADOOP-15111
> URL: https://issues.apache.org/jira/browse/HADOOP-15111
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/oss
>Reporter: Genmao Yu
> Attachments: HADOOP-15111-branch-2.000.patch
>
>
> Do a bulk listing off all entries under a path in one single operation, there 
> is no need to recursively walk the directory tree.
> Updates:
> - override listFiles and listLocatedStatus by using bulk listing
> - some minor updates in hadoop-aliyun index.md



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15111) AliyunOSS: backport HADOOP-14993 to branch-2

2017-12-12 Thread Genmao Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Genmao Yu updated HADOOP-15111:
---
Status: Patch Available  (was: Open)

> AliyunOSS: backport HADOOP-14993 to branch-2
> 
>
> Key: HADOOP-15111
> URL: https://issues.apache.org/jira/browse/HADOOP-15111
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/oss
>Reporter: Genmao Yu
>
> Do a bulk listing off all entries under a path in one single operation, there 
> is no need to recursively walk the directory tree.
> Updates:
> - override listFiles and listLocatedStatus by using bulk listing
> - some minor updates in hadoop-aliyun index.md



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Created] (HADOOP-15111) AliyunOSS: backport HADOOP-14993 to branch-2

2017-12-12 Thread Genmao Yu (JIRA)
Genmao Yu created HADOOP-15111:
--

 Summary: AliyunOSS: backport HADOOP-14993 to branch-2
 Key: HADOOP-15111
 URL: https://issues.apache.org/jira/browse/HADOOP-15111
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs/oss
Reporter: Genmao Yu


Do a bulk listing off all entries under a path in one single operation, there 
is no need to recursively walk the directory tree.

Updates:

- override listFiles and listLocatedStatus by using bulk listing
- some minor updates in hadoop-aliyun index.md



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-14993) AliyunOSS: Override listFiles and listLocatedStatus

2017-12-11 Thread Genmao Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16286948#comment-16286948
 ] 

Genmao Yu commented on HADOOP-14993:


[~Sammi] OK, let me have a test.

> AliyunOSS: Override listFiles and listLocatedStatus 
> 
>
> Key: HADOOP-14993
> URL: https://issues.apache.org/jira/browse/HADOOP-14993
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0-beta1
>Reporter: Genmao Yu
>Assignee: Genmao Yu
> Fix For: 3.0.0, 3.1.0, 3.0.1
>
> Attachments: HADOOP-14993.001.patch, HADOOP-14993.002.patch, 
> HADOOP-14993.003.patch
>
>
> Do a bulk listing off all entries under a path in one single operation, there 
> is no need to recursively walk the directory tree.
> Updates:
> - override listFiles and listLocatedStatus by using bulk listing
> - some minor updates in hadoop-aliyun index.md



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15027) AliyunOSS: Improvements for Hadoop read from AliyunOSS

2017-12-05 Thread Genmao Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16279716#comment-16279716
 ] 

Genmao Yu commented on HADOOP-15027:


[~wujinhu] HADOOP-15039 merged, you can continue this work.

> AliyunOSS: Improvements for Hadoop read from AliyunOSS
> --
>
> Key: HADOOP-15027
> URL: https://issues.apache.org/jira/browse/HADOOP-15027
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0
>Reporter: wujinhu
>Assignee: wujinhu
> Attachments: HADOOP-15027.001.patch, HADOOP-15027.002.patch, 
> HADOOP-15027.003.patch
>
>
> Currently, read performance is poor when Hadoop reads from AliyunOSS. It 
> needs about 1min to read 1GB from OSS.
> Class AliyunOSSInputStream uses single thread to read data from AliyunOSS,  
> so we can refactor this by using multi-thread pre read to improve this.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15039) Move SemaphoredDelegatingExecutor to hadoop-common

2017-12-05 Thread Genmao Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16279601#comment-16279601
 ] 

Genmao Yu commented on HADOOP-15039:


[~drankye]  Take a review again please! 

> Move SemaphoredDelegatingExecutor to hadoop-common
> --
>
> Key: HADOOP-15039
> URL: https://issues.apache.org/jira/browse/HADOOP-15039
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs, fs/oss, fs/s3
>Affects Versions: 3.0.0-beta1
>Reporter: Genmao Yu
>Assignee: Genmao Yu
>Priority: Minor
> Attachments: HADOOP-15039.001.patch, HADOOP-15039.002.patch, 
> HADOOP-15039.003.patch, HADOOP-15039.004.patch, HADOOP-15039.005.patch
>
>
> Detailed discussions in HADOOP-14999 and HADOOP-15027.
> share {{SemaphoredDelegatingExecutor}} and move it to {{hadoop-common}}.
> cc [~ste...@apache.org] 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15039) Move SemaphoredDelegatingExecutor to hadoop-common

2017-12-05 Thread Genmao Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Genmao Yu updated HADOOP-15039:
---
Attachment: HADOOP-15039.005.patch

> Move SemaphoredDelegatingExecutor to hadoop-common
> --
>
> Key: HADOOP-15039
> URL: https://issues.apache.org/jira/browse/HADOOP-15039
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs, fs/oss, fs/s3
>Affects Versions: 3.0.0-beta1
>Reporter: Genmao Yu
>Assignee: Genmao Yu
>Priority: Minor
> Attachments: HADOOP-15039.001.patch, HADOOP-15039.002.patch, 
> HADOOP-15039.003.patch, HADOOP-15039.004.patch, HADOOP-15039.005.patch
>
>
> Detailed discussions in HADOOP-14999 and HADOOP-15027.
> share {{SemaphoredDelegatingExecutor}} and move it to {{hadoop-common}}.
> cc [~ste...@apache.org] 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15039) Move SemaphoredDelegatingExecutor to hadoop-common

2017-12-05 Thread Genmao Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Genmao Yu updated HADOOP-15039:
---
Attachment: HADOOP-15039.004.patch

> Move SemaphoredDelegatingExecutor to hadoop-common
> --
>
> Key: HADOOP-15039
> URL: https://issues.apache.org/jira/browse/HADOOP-15039
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs, fs/oss, fs/s3
>Affects Versions: 3.0.0-beta1
>Reporter: Genmao Yu
>Assignee: Genmao Yu
>Priority: Minor
> Attachments: HADOOP-15039.001.patch, HADOOP-15039.002.patch, 
> HADOOP-15039.003.patch, HADOOP-15039.004.patch
>
>
> Detailed discussions in HADOOP-14999 and HADOOP-15027.
> share {{SemaphoredDelegatingExecutor}} and move it to {{hadoop-common}}.
> cc [~ste...@apache.org] 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15039) Move SemaphoredDelegatingExecutor to hadoop-common

2017-12-05 Thread Genmao Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Genmao Yu updated HADOOP-15039:
---
Attachment: (was: HADOOP-15039.004.patch)

> Move SemaphoredDelegatingExecutor to hadoop-common
> --
>
> Key: HADOOP-15039
> URL: https://issues.apache.org/jira/browse/HADOOP-15039
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs, fs/oss, fs/s3
>Affects Versions: 3.0.0-beta1
>Reporter: Genmao Yu
>Assignee: Genmao Yu
>Priority: Minor
> Attachments: HADOOP-15039.001.patch, HADOOP-15039.002.patch, 
> HADOOP-15039.003.patch, HADOOP-15039.004.patch
>
>
> Detailed discussions in HADOOP-14999 and HADOOP-15027.
> share {{SemaphoredDelegatingExecutor}} and move it to {{hadoop-common}}.
> cc [~ste...@apache.org] 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15039) Move SemaphoredDelegatingExecutor to hadoop-common

2017-12-05 Thread Genmao Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Genmao Yu updated HADOOP-15039:
---
Attachment: (was: HADOOP-15039.005.patch)

> Move SemaphoredDelegatingExecutor to hadoop-common
> --
>
> Key: HADOOP-15039
> URL: https://issues.apache.org/jira/browse/HADOOP-15039
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs, fs/oss, fs/s3
>Affects Versions: 3.0.0-beta1
>Reporter: Genmao Yu
>Assignee: Genmao Yu
>Priority: Minor
> Attachments: HADOOP-15039.001.patch, HADOOP-15039.002.patch, 
> HADOOP-15039.003.patch, HADOOP-15039.004.patch
>
>
> Detailed discussions in HADOOP-14999 and HADOOP-15027.
> share {{SemaphoredDelegatingExecutor}} and move it to {{hadoop-common}}.
> cc [~ste...@apache.org] 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15039) Move SemaphoredDelegatingExecutor to hadoop-common

2017-12-05 Thread Genmao Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Genmao Yu updated HADOOP-15039:
---
Attachment: HADOOP-15039.005.patch

> Move SemaphoredDelegatingExecutor to hadoop-common
> --
>
> Key: HADOOP-15039
> URL: https://issues.apache.org/jira/browse/HADOOP-15039
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs, fs/oss, fs/s3
>Affects Versions: 3.0.0-beta1
>Reporter: Genmao Yu
>Assignee: Genmao Yu
>Priority: Minor
> Attachments: HADOOP-15039.001.patch, HADOOP-15039.002.patch, 
> HADOOP-15039.003.patch, HADOOP-15039.004.patch, HADOOP-15039.005.patch
>
>
> Detailed discussions in HADOOP-14999 and HADOOP-15027.
> share {{SemaphoredDelegatingExecutor}} and move it to {{hadoop-common}}.
> cc [~ste...@apache.org] 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15039) Move SemaphoredDelegatingExecutor to hadoop-common

2017-12-04 Thread Genmao Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Genmao Yu updated HADOOP-15039:
---
Attachment: HADOOP-15039.004.patch

> Move SemaphoredDelegatingExecutor to hadoop-common
> --
>
> Key: HADOOP-15039
> URL: https://issues.apache.org/jira/browse/HADOOP-15039
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs, fs/oss, fs/s3
>Affects Versions: 3.0.0-beta1
>Reporter: Genmao Yu
>Assignee: Genmao Yu
>Priority: Minor
> Attachments: HADOOP-15039.001.patch, HADOOP-15039.002.patch, 
> HADOOP-15039.003.patch, HADOOP-15039.004.patch
>
>
> Detailed discussions in HADOOP-14999 and HADOOP-15027.
> share {{SemaphoredDelegatingExecutor}} and move it to {{hadoop-common}}.
> cc [~ste...@apache.org] 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



  1   2   3   4   5   6   >