from:"wujinhu \(JIRA\)"

[jira] [Comment Edited] (HADOOP-15027) Improvements for Hadoop read from AliyunOSS

2017-11-12 Thread wujinhu (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16249072#comment-16249072
 ] 

wujinhu edited comment on HADOOP-15027 at 11/13/17 3:29 AM:


Updates (HADOOP-15027.002.patch):
1. I have moved the thread pool from InputStream to FileSystem.
2. I have disabled pre-fetch for random IO.

So far, I have tested sequential read and aggressive random read performance:
oss.RandomSeek: file length 109002
oss.RandomSeek: sequential read used 14.964
oss.RandomSeek: random read used 61.353

For the aggressive random read test, I divided the file into parts of 1MB 
each and read the parts in shuffled order.

I plan to keep improving random reads.
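
The shuffled-part access pattern described above can be sketched as follows. This is a minimal illustration, not the oss.RandomSeek test itself: the class name ShuffledPartsSketch, the fixed seed, and the helper names are assumptions, and the actual read (for example a positional read on the input stream) is left out so that the block runs without Hadoop.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical helper: builds the shuffled read plan only, it does not touch OSS.
final class ShuffledPartsSketch {
  static final long PART_SIZE = 1024L * 1024L; // 1MB parts, as in the comment

  /** Returns the start offset of every part of the file, in shuffled order. */
  static List<Long> shuffledPartOffsets(long fileLength, long seed) {
    List<Long> offsets = new ArrayList<>();
    for (long offset = 0; offset < fileLength; offset += PART_SIZE) {
      offsets.add(offset);
    }
    // A fixed seed keeps the order reproducible between test runs.
    Collections.shuffle(offsets, new Random(seed));
    return offsets;
  }

  /** Length of the part starting at offset; the last part may be shorter. */
  static long partLength(long fileLength, long offset) {
    return Math.min(PART_SIZE, fileLength - offset);
  }

  public static void main(String[] args) {
    long fileLength = 10L * 1024 * 1024 + 1; // hypothetical: ten full parts plus one byte
    for (long offset : shuffledPartOffsets(fileLength, 42L)) {
      System.out.println("read " + partLength(fileLength, offset)
          + " bytes at offset " + offset);
    }
  }
}
```

Each printed line corresponds to one random read; timing the loop with real reads in place of the print would give a figure comparable to the "random read used" value.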


> Improvements for Hadoop read from AliyunOSS
> ---
>
> Key: HADOOP-15027
> URL: https://issues.apache.org/jira/browse/HADOOP-15027
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0
>Reporter: wujinhu
>Assignee: wujinhu
> Attachments: HADOOP-15027.001.patch, HADOOP-15027.002.patch
>
>
> Currently, read performance is poor when Hadoop reads from AliyunOSS. It 
> takes about 1 minute to read 1GB from OSS.
> The AliyunOSSInputStream class uses a single thread to read data from 
> AliyunOSS, so we can refactor it to use multi-threaded pre-read to improve 
> read performance.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org

[jira] [Commented] (HADOOP-15027) Improvements for Hadoop read from AliyunOSS

2017-11-12 Thread wujinhu (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16249072#comment-16249072
 ] 

wujinhu commented on HADOOP-15027:
--

Updates:
1. I have moved the thread pool from InputStream to FileSystem.
2. I have disabled pre-fetch for random IO.

So far, I have tested sequential read and aggressive random read performance:
oss.RandomSeek: file length 109002
oss.RandomSeek: sequential read used 14.964
oss.RandomSeek: random read used 61.353

For the aggressive random read test, I divided the file into parts of 1MB 
each and read the parts in shuffled order.

I plan to keep improving random reads.


[jira] [Comment Edited] (HADOOP-15027) Improvements for Hadoop read from AliyunOSS

2017-11-12 Thread wujinhu (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16249160#comment-16249160
 ] 

wujinhu edited comment on HADOOP-15027 at 11/13/17 6:51 AM:


[~uncleGen]
Thanks for your comments.
1. I tested with the ossutil tool, and the read speed is about 10MB+/s (I 
will continue to verify this).
2 & 3. I think it is OK to keep the thread pool in FileSystem. The thread 
pool is only used to pre-fetch data from OSS, as the following code shows:

{code:java}
if (item.buffer.length == 0) {
  // EOF
  item.ready.set(true);
} else {
  this.readAheadExecutorService.execute(
      new AliyunOSSFileReaderTask(key, store, item));
}
cachedStreams.add(item);
{code}

Each item is added both to the thread pool (owned by FileSystem) and to 
cachedStreams (each stream has its own queue).
If one input stream is slow, it only affects its own cachedStreams and does 
not affect other streams.

4. I will change the code style of these lines.
5. Yes, we can do a simple refactor if some modules have the same requirements.

I will add another patch to fix this.



[jira] [Comment Edited] (HADOOP-15027) Improvements for Hadoop read from AliyunOSS

2017-11-12 Thread wujinhu (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16249160#comment-16249160
 ] 

wujinhu edited comment on HADOOP-15027 at 11/13/17 6:52 AM:


[~uncleGen]
Thanks for your comments.
1. I tested with the ossutil tool, and the read speed is about 10MB+/s (I 
will continue to verify this).
2 & 3. I think it is OK to keep the thread pool in FileSystem. The thread 
pool is only used to pre-fetch data from OSS, as the following code shows:

{code:java}
if (item.buffer.length == 0) {
  // EOF
  item.ready.set(true);
} else {
  this.readAheadExecutorService.execute(
      new AliyunOSSFileReaderTask(key, store, item));
}
cachedStreams.add(item);
{code}

Each item is added both to the thread pool (owned by FileSystem) and to 
cachedStreams (each stream has its own queue).
If one input stream is slow, it only affects its own cachedStreams and does 
not affect other streams.

4. I will change the code style of these lines.
5. Yes, we can do a simple refactor if some modules have the same requirements.

I will add another patch to fix this.
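
The relationship between the shared thread pool and the per-stream queue can be sketched as follows. This is a simplified, self-contained illustration rather than the patch code: Item, SketchStream, and the byte-fill task are hypothetical stand-ins for the read-ahead buffer, AliyunOSSInputStream, and AliyunOSSFileReaderTask, and a CountDownLatch replaces the ready flag.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Queue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class SharedPoolSketch {
  /** One pre-fetch buffer; ready is released once its fetch task has finished. */
  static final class Item {
    final byte[] buffer;
    final CountDownLatch ready = new CountDownLatch(1);
    Item(int size) { this.buffer = new byte[size]; }
  }

  /** Hypothetical stream: it shares the pool but owns its queue. */
  static final class SketchStream {
    private final ExecutorService readAheadExecutorService; // shared, owned by the file system
    private final Queue<Item> cachedStreams = new ArrayDeque<>(); // private to this stream
    private final byte fill;

    SketchStream(ExecutorService pool, byte fill) {
      this.readAheadExecutorService = pool;
      this.fill = fill;
    }

    void prefetch(int size) {
      Item item = new Item(size);
      if (item.buffer.length == 0) {
        item.ready.countDown(); // EOF: nothing to fetch
      } else {
        readAheadExecutorService.execute(() -> {
          Arrays.fill(item.buffer, fill); // stand-in for the range read from OSS
          item.ready.countDown();
        });
      }
      cachedStreams.add(item);
    }

    /** Blocks only on this stream's own oldest item. */
    byte[] next() throws InterruptedException {
      Item item = cachedStreams.remove();
      item.ready.await();
      return item.buffer;
    }
  }

  static int[] demo() {
    ExecutorService pool = Executors.newFixedThreadPool(2);
    try {
      SketchStream a = new SketchStream(pool, (byte) 1);
      SketchStream b = new SketchStream(pool, (byte) 2);
      a.prefetch(4);
      b.prefetch(4);
      a.prefetch(0);
      return new int[] {a.next()[0], b.next()[0], a.next().length};
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) {
    System.out.println(Arrays.toString(demo())); // prints [1, 2, 0]
  }
}
```

In this sketch a reader waits only on items from its own queue, which is the isolation property claimed in points 2 & 3. The pool size of 2 is arbitrary.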



[jira] [Commented] (HADOOP-15027) Improvements for Hadoop read from AliyunOSS

2017-11-12 Thread wujinhu (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16249160#comment-16249160
 ] 

wujinhu commented on HADOOP-15027:
--

[~uncleGen]
Thanks for your comments.
1. I tested with the ossutil tool, and the read speed is about 10MB+/s (I 
will continue to verify this).
2 & 3. I think it is OK to keep the thread pool in FileSystem. The thread 
pool is only used to pre-fetch data from OSS, as the following code shows:

{code:java}
if (item.buffer.length == 0) {
  // EOF
  item.ready.set(true);
} else {
  this.readAheadExecutorService.execute(
      new AliyunOSSFileReaderTask(key, store, item));
}
cachedStreams.add(item);
{code}

Each item is added both to the thread pool (owned by FileSystem) and to 
cachedStreams (each stream has its own queue).
If one input stream is slow, it only affects its own cachedStreams and does 
not affect other streams.

4. I will change the code style of these lines.
5. Yes, we can do a simple refactor if some modules have the same requirements.

I will add another patch to fix this.


[jira] [Comment Edited] (HADOOP-15027) Improvements for Hadoop read from AliyunOSS

2017-11-12 Thread wujinhu (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16249160#comment-16249160
 ] 

wujinhu edited comment on HADOOP-15027 at 11/13/17 6:51 AM:


[~uncleGen]
Thanks for your comments.
1. I tested with the ossutil tool, and the read speed is about 10MB+/s (I 
will continue to verify this).
2 & 3. I think it is OK to keep the thread pool in FileSystem. The thread 
pool is only used to pre-fetch data from OSS, as the following code shows:

{code:java}
if (item.buffer.length == 0) {
  // EOF
  item.ready.set(true);
} else {
  this.readAheadExecutorService.execute(
      new AliyunOSSFileReaderTask(key, store, item));
}
cachedStreams.add(item);
{code}

Each item is added both to the thread pool (owned by FileSystem) and to 
cachedStreams (each stream has its own queue).
If one input stream is slow, it only affects its own cachedStreams and does 
not affect other streams.

4. I will change the code style of these lines.
5. Yes, we can do a simple refactor if some modules have the same requirements.

I will add another patch to fix this.



[jira] [Commented] (HADOOP-15027) Improvements for Hadoop read from AliyunOSS

2017-11-13 Thread wujinhu (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16249250#comment-16249250
 ] 

wujinhu commented on HADOOP-15027:
--

Updated to HADOOP-15027.003.patch.

> Improvements for Hadoop read from AliyunOSS
> ---
>
> Key: HADOOP-15027
> URL: https://issues.apache.org/jira/browse/HADOOP-15027
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0
>Reporter: wujinhu
>Assignee: wujinhu
> Attachments: HADOOP-15027.001.patch, HADOOP-15027.002.patch, 
> HADOOP-15027.003.patch
>
>

[jira] [Updated] (HADOOP-15027) Improvements for Hadoop read from AliyunOSS

2017-11-13 Thread wujinhu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wujinhu updated HADOOP-15027:
-
Attachment: HADOOP-15027.003.patch


[jira] [Updated] (HADOOP-15027) Improvements for Hadoop read from AliyunOSS

2017-11-10 Thread wujinhu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wujinhu updated HADOOP-15027:
-
Attachment: HADOOP-15027.001.patch

> Improvements for Hadoop read from AliyunOSS
> ---
>
> Key: HADOOP-15027
> URL: https://issues.apache.org/jira/browse/HADOOP-15027
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/oss
>Reporter: wujinhu
> Attachments: HADOOP-15027.001.patch
>
>

[jira] [Updated] (HADOOP-15063) IOException will be thrown when read from Aliyun OSS

2017-11-21 Thread wujinhu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wujinhu updated HADOOP-15063:
-
Description: 
An IOException is thrown in some cases.
Logs:
java.io.IOException: Failed to read from stream. Remaining:101802
    at org.apache.hadoop.fs.aliyun.oss.AliyunOSSInputStream.read(AliyunOSSInputStream.java:182)
    at org.apache.hadoop.fs.FSInputStream.read(FSInputStream.java:75)
    at org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:92)

How to reproduce:
1. Create a 10MB file.
2. Run the following loop:
{code:java}
int seekTimes = 150;
for (int i = 0; i < seekTimes; i++) {
  long pos = size / (seekTimes - i) - 1;
  LOG.info("begin seeking for pos: " + pos);
  //instream.seek(pos);
  byte []buf = new byte[1024];
  instream.read(pos, buf, 0, 1024);
}
{code}
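
The offsets that this loop requests can be computed without a file system. The sketch below assumes that "10MB" means 10 * 1024 * 1024 bytes and that size is the file length; SeekPositionsSketch and positionAt are hypothetical names. Under those assumptions, iteration i = 47 requests position 101802, the same number that appears in the logged exception.

```java
// Hypothetical helper that reproduces only the offset arithmetic of the loop.
final class SeekPositionsSketch {
  /** Same expression as in the reproduction loop: size / (seekTimes - i) - 1. */
  static long positionAt(long size, int seekTimes, int i) {
    return size / (seekTimes - i) - 1;
  }

  public static void main(String[] args) {
    long size = 10L * 1024 * 1024; // assumption: 10MB = 10485760 bytes
    int seekTimes = 150;
    for (int i = 0; i < seekTimes; i++) {
      System.out.println("i=" + i + " pos=" + positionAt(size, seekTimes, i));
    }
  }
}
```

The positions grow from 69904 at i = 0 to 10485759 (the last byte of the file) at i = 149, so the loop sweeps forward through the whole file in increasingly large steps.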



> IOException will be thrown when read from Aliyun OSS
> 
>
> Key: HADOOP-15063
> URL: https://issues.apache.org/jira/browse/HADOOP-15063
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/oss
>Affects Versions: 3.0.0-alpha2
>Reporter: wujinhu
>Priority: Critical
>

[jira] [Updated] (HADOOP-15063) IOException will be thrown when read from Aliyun OSS

2017-11-21 Thread wujinhu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wujinhu updated HADOOP-15063:
-
Description: 
An IOException is thrown in the following case:
1. Set the part size to n (102400).
2. Assume the current position is 0, so partRemaining = 102400.
3. Call seek(pos = 101802). Because pos > position && pos < position + 
partRemaining, the stream skips pos - position bytes, but partRemaining is 
not updated.
4. If we then read more than n - pos bytes, an IOException is thrown.

Current code:
{code:java}
@Override
public synchronized void seek(long pos) throws IOException {
  checkNotClosed();
  if (position == pos) {
    return;
  } else if (pos > position && pos < position + partRemaining) {
    AliyunOSSUtils.skipFully(wrappedStream, pos - position);
    // we need to update partRemaining here
    position = pos;
  } else {
    reopen(pos);
  }
}
{code}
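
One way to picture the missing update is the sketch below. It is a simplified stand-in, not AliyunOSSInputStream and not the attached patch: PartStream, its updateOnSeek switch, and the in-memory part are assumptions made so that the example runs on its own. With the numbers from steps 1 to 3 (part size 102400, seek to 101802), leaving partRemaining unchanged reproduces the message from the logs, while subtracting the skipped length lets the read stop cleanly at the end of the part.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

final class SeekSketch {
  /** Simplified single-part stream; the real class would reopen the next part. */
  static final class PartStream {
    private final InputStream wrappedStream;
    private final boolean updateOnSeek; // false = behaviour described in step 3
    private long position = 0;
    private long partRemaining;

    PartStream(int partSize, boolean updateOnSeek) {
      this.wrappedStream = new ByteArrayInputStream(new byte[partSize]);
      this.partRemaining = partSize;
      this.updateOnSeek = updateOnSeek;
    }

    void seek(long pos) throws IOException {
      if (pos == position) {
        return;
      }
      if (pos > position && pos < position + partRemaining) {
        long len = pos - position;
        skipFully(wrappedStream, len);
        if (updateOnSeek) {
          partRemaining -= len; // the update that the description says is missing
        }
        position = pos;
      } else {
        throw new IOException("reopen is outside the scope of this sketch");
      }
    }

    int read() throws IOException {
      if (partRemaining <= 0) {
        return -1; // end of this part
      }
      int b = wrappedStream.read();
      if (b < 0) {
        throw new IOException("Failed to read from stream. Remaining:" + partRemaining);
      }
      position++;
      partRemaining--;
      return b;
    }

    private static void skipFully(InputStream in, long len) throws IOException {
      while (len > 0) {
        long skipped = in.skip(len);
        if (skipped <= 0) {
          throw new IOException("Failed to skip " + len + " bytes");
        }
        len -= skipped;
      }
    }
  }

  /** Seeks to 101802 in a 102400-byte part, then tries to read 1024 bytes. */
  static String readAfterSeek(boolean updateOnSeek) {
    PartStream in = new PartStream(102400, updateOnSeek);
    try {
      in.seek(101802);
      int n = 0;
      while (n < 1024 && in.read() >= 0) {
        n++;
      }
      return "read " + n + " bytes";
    } catch (IOException e) {
      return e.getMessage();
    }
  }

  public static void main(String[] args) {
    System.out.println(readAfterSeek(false)); // Failed to read from stream. Remaining:101802
    System.out.println(readAfterSeek(true));  // read 598 bytes
  }
}
```

Only 102400 - 101802 = 598 bytes are left in the part after the seek, which is why the corrected variant stops after 598 bytes instead of failing.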

Logs:
java.io.IOException: Failed to read from stream. Remaining:101802
    at org.apache.hadoop.fs.aliyun.oss.AliyunOSSInputStream.read(AliyunOSSInputStream.java:182)
    at org.apache.hadoop.fs.FSInputStream.read(FSInputStream.java:75)
    at org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:92)

How to reproduce:
1. Create a 10MB file.
2. Run the following loop:
{code:java}
int seekTimes = 150;
for (int i = 0; i < seekTimes; i++) {
  long pos = size / (seekTimes - i) - 1;
  LOG.info("begin seeking for pos: " + pos);
  byte []buf = new byte[1024];
  instream.read(pos, buf, 0, 1024);
}
{code}



[jira] [Updated] (HADOOP-15063) IOException will be thrown when read from Aliyun OSS

2017-11-21 Thread wujinhu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wujinhu updated HADOOP-15063:
-
Summary: IOException will be thrown when read from Aliyun OSS  (was: 
IOException is likely to be thrown when read from Aliyun OSS)


[jira] [Updated] (HADOOP-15063) IOException may be thrown when read from Aliyun OSS in some case

2017-11-22 Thread wujinhu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wujinhu updated HADOOP-15063:
-
Status: Patch Available  (was: In Progress)

> IOException may be thrown when read from Aliyun OSS in some case
> 
>
> Key: HADOOP-15063
> URL: https://issues.apache.org/jira/browse/HADOOP-15063
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/oss
>Affects Versions: 3.0.0-beta1, 3.0.0-alpha2
>Reporter: wujinhu
>Assignee: wujinhu
> Attachments: HADOOP-15063.001.patch
>
>

[jira] [Updated] (HADOOP-15063) IOException will be thrown when read from Aliyun OSS

2017-11-22 Thread wujinhu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wujinhu updated HADOOP-15063:
-
Affects Version/s: 3.0.0-beta1

> IOException will be thrown when read from Aliyun OSS
> 
>
> Key: HADOOP-15063
> URL: https://issues.apache.org/jira/browse/HADOOP-15063
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/oss
>Affects Versions: 3.0.0-alpha2, 3.0.0-beta1
>Reporter: wujinhu
>Assignee: wujinhu
>Priority: Critical
>

[jira] [Assigned] (HADOOP-15063) IOException will be thrown when read from Aliyun OSS

2017-11-22 Thread wujinhu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wujinhu reassigned HADOOP-15063:


Assignee: wujinhu

> IOException will be thrown when read from Aliyun OSS
> 
>
> Key: HADOOP-15063
> URL: https://issues.apache.org/jira/browse/HADOOP-15063
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/oss
>Affects Versions: 3.0.0-alpha2
>Reporter: wujinhu
>Assignee: wujinhu
>Priority: Critical
>
> An IOException is thrown in the following case:
> 1. Set the part size to n = 102400.
> 2. Assume the current position is 0, so partRemaining = 102400.
> 3. Call seek(pos = 101802). Because pos > position && pos < position + 
> partRemaining, seek() skips pos - position bytes, but partRemaining 
> is left unchanged.
> 4. If we then read more than n - pos bytes, an IOException is thrown.
> Current code:
> {code:java}
> @Override
> public synchronized void seek(long pos) throws IOException {
>   checkNotClosed();
>   if (position == pos) {
>     return;
>   } else if (pos > position && pos < position + partRemaining) {
>     AliyunOSSUtils.skipFully(wrappedStream, pos - position);
>     // we need to update partRemaining here
>     position = pos;
>   } else {
>     reopen(pos);
>   }
> }
> {code}
> Logs:
> java.io.IOException: Failed to read from stream. Remaining:101802
>   at org.apache.hadoop.fs.aliyun.oss.AliyunOSSInputStream.read(AliyunOSSInputStream.java:182)
>   at org.apache.hadoop.fs.FSInputStream.read(FSInputStream.java:75)
>   at org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:92)
> How to reproduce:
> 1. Create a 10 MB file.
> 2. Run the following loop against it:
> {code:java}
> int seekTimes = 150;
> for (int i = 0; i < seekTimes; i++) {
>   long pos = size / (seekTimes - i) - 1;
>   LOG.info("begin seeking for pos: " + pos);
>   byte[] buf = new byte[1024];
>   instream.read(pos, buf, 0, 1024);
> }
> {code}
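
The bookkeeping problem described in steps 1-4 above can be modelled without any OSS dependency. The sketch below is hypothetical: the class `PartAccounting` and its method names are invented for illustration and are not part of `AliyunOSSInputStream`, and the change it shows is only one plausible way to keep `partRemaining` in sync, not the contents of HADOOP-15063.001.patch (which is not shown in this thread). It models only the forward-seek-within-part branch of `seek()`.

```java
// Hypothetical model of the position/partRemaining bookkeeping in seek().
// Not the real AliyunOSSInputStream; no network or stream I/O is involved.
final class PartAccounting {
  private long position;       // current offset in the object
  private long partRemaining;  // bytes believed to be left in the current part

  PartAccounting(long position, long partRemaining) {
    this.position = position;
    this.partRemaining = partRemaining;
  }

  long position() { return position; }
  long partRemaining() { return partRemaining; }

  // Mirrors the quoted code: a forward seek inside the part moves position only.
  void seekWithoutUpdate(long pos) {
    if (pos > position && pos < position + partRemaining) {
      position = pos;
    }
  }

  // Assumed fix: also subtract the skipped bytes from partRemaining.
  void seekWithUpdate(long pos) {
    if (pos > position && pos < position + partRemaining) {
      partRemaining -= pos - position;
      position = pos;
    }
  }

  public static void main(String[] args) {
    PartAccounting stale = new PartAccounting(0, 102400);
    stale.seekWithoutUpdate(101802);
    PartAccounting synced = new PartAccounting(0, 102400);
    synced.seekWithUpdate(101802);
    System.out.println("without update: partRemaining=" + stale.partRemaining());
    System.out.println("with update:    partRemaining=" + synced.partRemaining());
  }
}
```

With the numbers from the report, the first variant still believes 102400 bytes remain in the part after the seek, while the second reports 598 (102400 - 101802).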




[jira] [Updated] (HADOOP-15063) IOException will be thrown when read from Aliyun OSS

2017-11-22 Thread wujinhu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wujinhu updated HADOOP-15063:
-
Attachment: HADOOP-15063.001.patch

> IOException will be thrown when read from Aliyun OSS
> 
>
> Key: HADOOP-15063
> URL: https://issues.apache.org/jira/browse/HADOOP-15063
> Project: Hadoop Common
> Issue Type: Bug
> Components: fs/oss
> Affects Versions: 3.0.0-alpha2, 3.0.0-beta1
> Reporter: wujinhu
> Assignee: wujinhu
> Priority: Critical
> Attachments: HADOOP-15063.001.patch
>
>




[jira] [Commented] (HADOOP-15063) IOException will be thrown when read from Aliyun OSS

2017-11-22 Thread wujinhu (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16262132#comment-16262132
 ] 

wujinhu commented on HADOOP-15063:
--

Uploaded the patch file.

> IOException will be thrown when read from Aliyun OSS
> 
>
> Key: HADOOP-15063
> URL: https://issues.apache.org/jira/browse/HADOOP-15063
> Project: Hadoop Common
> Issue Type: Bug
> Components: fs/oss
> Affects Versions: 3.0.0-alpha2, 3.0.0-beta1
> Reporter: wujinhu
> Assignee: wujinhu
> Priority: Critical
> Attachments: HADOOP-15063.001.patch
>
>




[jira] [Updated] (HADOOP-15063) IOException will be thrown when read from Aliyun OSS

2017-11-21 Thread wujinhu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wujinhu updated HADOOP-15063:
-
Description: 
IOException will be thrown in this case
1. set part size = n(102400)
2. assume current position = 0, then partRemaining = 102400
3. we call seek(pos = 101802), with pos > position && pos < position + 
partRemaining, so it will skip pos - position bytes, but partRemaining remains 
the same
4. if we read bytes more than n - pos, it will throw IOException.

Current code:
{code:java}
@Override
  public synchronized void seek(long pos) throws IOException {
checkNotClosed();
if (position == pos) {
  return;
} else if (pos > position && pos < position + partRemaining) {
  AliyunOSSUtils.skipFully(wrappedStream, pos - position);
  // we need update partRemaining here
  position = pos;
} else {
  reopen(pos);
}
  }
{code}

Logs:
java.io.IOException: Failed to read from stream. Remaining:101802

at 
org.apache.hadoop.fs.aliyun.oss.AliyunOSSInputStream.read(AliyunOSSInputStream.java:182)
at org.apache.hadoop.fs.FSInputStream.read(FSInputStream.java:75)
at 
org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:92)

How to re-produce:
1. create a file with 10MB size
2. 
{code:java}
int seekTimes = 150;
for (int i = 0; i < seekTimes; i++) {
  long pos = size / (seekTimes - i) - 1;
  LOG.info("begin seeking for pos: " + pos);
  byte []buf = new byte[1024];
  instream.read(pos, buf, 0, 1024);
}
{code}


  was:
IOException will be thrown in this case
1. set part size = n(102400)
2. assume current position = 0, then partRemaining = 102400
3. we call seek(pos = 101802), with pos > position && pos < position + 
partRemaining, so it will skip pos - position bytes, but partRemaining remains 
the same
4. if we read bytes more than n - pos, it will throw IOException.

Current code:
{code:java}
@Override
  public synchronized void seek(long pos) throws IOException {
checkNotClosed();
if (position == pos) {
  return;
} else if (pos > position && pos < position + partRemaining) {
  AliyunOSSUtils.skipFully(wrappedStream, pos - position);
*  // we need update partRemaining here
*  position = pos;
} else {
  reopen(pos);
}
  }
{code}

Logs:
java.io.IOException: Failed to read from stream. Remaining:101802

at 
org.apache.hadoop.fs.aliyun.oss.AliyunOSSInputStream.read(AliyunOSSInputStream.java:182)
at org.apache.hadoop.fs.FSInputStream.read(FSInputStream.java:75)
at 
org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:92)

How to re-produce:
1. create a file with 10MB size
2. 
{code:java}
int seekTimes = 150;
for (int i = 0; i < seekTimes; i++) {
  long pos = size / (seekTimes - i) - 1;
  LOG.info("begin seeking for pos: " + pos);
  byte []buf = new byte[1024];
  instream.read(pos, buf, 0, 1024);
}
{code}



> IOException will be thrown when read from Aliyun OSS
> 
>
> Key: HADOOP-15063
> URL: https://issues.apache.org/jira/browse/HADOOP-15063
> Project: Hadoop Common
> Issue Type: Bug
> Components: fs/oss
> Affects Versions: 3.0.0-alpha2
> Reporter: wujinhu
> Priority: Critical
>




[jira] [Updated] (HADOOP-15063) IOException will be thrown when read from Aliyun OSS

2017-11-22 Thread wujinhu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wujinhu updated HADOOP-15063:
-
Attachment: HADOOP-15063.001.patch

> IOException will be thrown when read from Aliyun OSS
> 
>
> Key: HADOOP-15063
> URL: https://issues.apache.org/jira/browse/HADOOP-15063
> Project: Hadoop Common
> Issue Type: Bug
> Components: fs/oss
> Affects Versions: 3.0.0-alpha2, 3.0.0-beta1
> Reporter: wujinhu
> Assignee: wujinhu
> Priority: Critical
> Attachments: HADOOP-15063.001.patch
>
>




[jira] [Updated] (HADOOP-15063) IOException may be thrown when read from Aliyun OSS in some case

2017-11-22 Thread wujinhu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wujinhu updated HADOOP-15063:
-
Summary: IOException may be thrown when read from Aliyun OSS in some case  
(was: IOException will be thrown when read from Aliyun OSS)

> IOException may be thrown when read from Aliyun OSS in some case
> 
>
> Key: HADOOP-15063
> URL: https://issues.apache.org/jira/browse/HADOOP-15063
> Project: Hadoop Common
> Issue Type: Bug
> Components: fs/oss
> Affects Versions: 3.0.0-alpha2, 3.0.0-beta1
> Reporter: wujinhu
> Assignee: wujinhu
> Priority: Critical
> Attachments: HADOOP-15063.001.patch
>
>




[jira] [Updated] (HADOOP-15063) IOException may be thrown when read from Aliyun OSS in some case

2017-11-22 Thread wujinhu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wujinhu updated HADOOP-15063:
-
Priority: Major  (was: Critical)

> IOException may be thrown when read from Aliyun OSS in some case
> 
>
> Key: HADOOP-15063
> URL: https://issues.apache.org/jira/browse/HADOOP-15063
> Project: Hadoop Common
> Issue Type: Bug
> Components: fs/oss
> Affects Versions: 3.0.0-alpha2, 3.0.0-beta1
> Reporter: wujinhu
> Assignee: wujinhu
> Attachments: HADOOP-15063.001.patch
>
>




[jira] [Work started] (HADOOP-15063) IOException may be thrown when read from Aliyun OSS in some case

2017-11-22 Thread wujinhu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HADOOP-15063 started by wujinhu.

> IOException may be thrown when read from Aliyun OSS in some case
> 
>
> Key: HADOOP-15063
> URL: https://issues.apache.org/jira/browse/HADOOP-15063
> Project: Hadoop Common
> Issue Type: Bug
> Components: fs/oss
> Affects Versions: 3.0.0-alpha2, 3.0.0-beta1
> Reporter: wujinhu
> Assignee: wujinhu
> Attachments: HADOOP-15063.001.patch
>
>




[jira] [Updated] (HADOOP-15063) IOException will be thrown when read from Aliyun OSS

2017-11-22 Thread wujinhu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wujinhu updated HADOOP-15063:
-
Attachment: (was: HADOOP-15063.001.patch)

> IOException will be thrown when read from Aliyun OSS
> 
>
> Key: HADOOP-15063
> URL: https://issues.apache.org/jira/browse/HADOOP-15063
> Project: Hadoop Common
> Issue Type: Bug
> Components: fs/oss
> Affects Versions: 3.0.0-alpha2, 3.0.0-beta1
> Reporter: wujinhu
> Assignee: wujinhu
> Priority: Critical
>




[jira] [Updated] (HADOOP-15063) IOException is likely to be thrown when read from Aliyun OSS

2017-11-21 Thread wujinhu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wujinhu updated HADOOP-15063:
-
Summary: IOException is likely to be thrown when read from Aliyun OSS  
(was: IOException will be thrown when read from Aliyun OSS)

> IOException is likely to be thrown when read from Aliyun OSS
> 
>
> Key: HADOOP-15063
> URL: https://issues.apache.org/jira/browse/HADOOP-15063
> Project: Hadoop Common
> Issue Type: Bug
> Components: fs/oss
> Affects Versions: 3.0.0-alpha2
> Reporter: wujinhu
> Priority: Critical
>
> An IOException is thrown in some cases.
> Logs:
> java.io.IOException: Failed to read from stream. Remaining:101802
>   at org.apache.hadoop.fs.aliyun.oss.AliyunOSSInputStream.read(AliyunOSSInputStream.java:182)
>   at org.apache.hadoop.fs.FSInputStream.read(FSInputStream.java:75)
>   at org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:92)
> How to reproduce:
> 1. Create a 10 MB file.
> 2. Run the following loop against it:
> {code:java}
> int seekTimes = 150;
> for (int i = 0; i < seekTimes; i++) {
>   long pos = size / (seekTimes - i) - 1;
>   LOG.info("begin seeking for pos: " + pos);
>   //instream.seek(pos);
>   byte[] buf = new byte[1024];
>   instream.read(pos, buf, 0, 1024);
> }
> {code}
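
To see which offsets the reproduction loop above actually requests, the arithmetic can be run on its own. This is a sketch under assumptions: `size` is taken to be 10 * 1024 * 1024 bytes (the report only says "10MB"), and the class `ReproOffsets` is a hypothetical helper, not part of the test code in the issue.

```java
// Hypothetical helper: computes the offsets requested by the reproduction loop.
final class ReproOffsets {
  // Same expression as in the report: size / (seekTimes - i) - 1, integer division.
  static long[] offsets(long size, int seekTimes) {
    long[] result = new long[seekTimes];
    for (int i = 0; i < seekTimes; i++) {
      result[i] = size / (seekTimes - i) - 1;
    }
    return result;
  }

  public static void main(String[] args) {
    long size = 10L * 1024 * 1024; // assumption: "10MB" means 10 MiB
    long[] pos = offsets(size, 150);
    System.out.println("first offset: " + pos[0]);
    System.out.println("last offset:  " + pos[pos.length - 1]);
  }
}
```

Under that assumption the offsets never decrease from one iteration to the next, and the last iteration starts at the final byte of the file (size - 1) while still asking for 1024 bytes.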




[jira] [Updated] (HADOOP-15063) IOException will be thrown when read from Aliyun OSS

2017-11-21 Thread wujinhu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wujinhu updated HADOOP-15063:
-
Description: 
IOException will be thrown in this case
1. set part size = n(102400)
2. assume current position = 0, then partRemaining = 102400
3. we call seek(pos = 101802), with pos > position && pos < position + 
partRemaining, so it will skip pos - position bytes, but partRemaining remains 
the same
4. if we read bytes more than n - pos, it will throw IOException.

Current code:
{code:java}
@Override
  public synchronized void seek(long pos) throws IOException {
checkNotClosed();
if (position == pos) {
  return;
} else if (pos > position && pos < position + partRemaining) {
  AliyunOSSUtils.skipFully(wrappedStream, pos - position);
*  // we need update partRemaining here
*  position = pos;
} else {
  reopen(pos);
}
  }
{code}

Logs:
java.io.IOException: Failed to read from stream. Remaining:101802

at 
org.apache.hadoop.fs.aliyun.oss.AliyunOSSInputStream.read(AliyunOSSInputStream.java:182)
at org.apache.hadoop.fs.FSInputStream.read(FSInputStream.java:75)
at 
org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:92)

How to re-produce:
1. create a file with 10MB size
2. 
{code:java}
int seekTimes = 150;
for (int i = 0; i < seekTimes; i++) {
  long pos = size / (seekTimes - i) - 1;
  LOG.info("begin seeking for pos: " + pos);
  byte []buf = new byte[1024];
  instream.read(pos, buf, 0, 1024);
}
{code}


  was:
IOException will be thrown in this case
1. set part size = n(102400)
2. assume current position = 0, then partRemaining = 102400
3. we call seek(pos = 101802), with pos > position && pos < position + 
partRemaining, so it will skip pos - position bytes, but partRemaining remains 
the same
4. if we read bytes more than n - pos, it will throw IOException.

Current code:
{code:java}
@Override
  public synchronized void seek(long pos) throws IOException {
checkNotClosed();
if (position == pos) {
  return;
} else if (pos > position && pos < position + partRemaining) {
  AliyunOSSUtils.skipFully(wrappedStream, pos - position);
*{color:#d04437}  // we need update partRemaining here
{color}*  position = pos;
} else {
  reopen(pos);
}
  }
{code}

Logs:
java.io.IOException: Failed to read from stream. Remaining:101802

at 
org.apache.hadoop.fs.aliyun.oss.AliyunOSSInputStream.read(AliyunOSSInputStream.java:182)
at org.apache.hadoop.fs.FSInputStream.read(FSInputStream.java:75)
at 
org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:92)

How to re-produce:
1. create a file with 10MB size
2. 
{code:java}
int seekTimes = 150;
for (int i = 0; i < seekTimes; i++) {
  long pos = size / (seekTimes - i) - 1;
  LOG.info("begin seeking for pos: " + pos);
  byte []buf = new byte[1024];
  instream.read(pos, buf, 0, 1024);
}
{code}



> IOException will be thrown when read from Aliyun OSS
> 
>
> Key: HADOOP-15063
> URL: https://issues.apache.org/jira/browse/HADOOP-15063
> Project: Hadoop Common
> Issue Type: Bug
> Components: fs/oss
> Affects Versions: 3.0.0-alpha2
> Reporter: wujinhu
> Priority: Critical
>




[jira] [Commented] (HADOOP-15063) IOException may be thrown when read from Aliyun OSS in some case

2017-11-22 Thread wujinhu (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16262396#comment-16262396
 ] 

wujinhu commented on HADOOP-15063:
--

Thanks for the review.
I found that this is the same issue as https://issues.apache.org/jira/browse/HADOOP-14072,
so I will close this one.

> IOException may be thrown when read from Aliyun OSS in some case
> 
>
> Key: HADOOP-15063
> URL: https://issues.apache.org/jira/browse/HADOOP-15063
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/oss
> Affects Versions: 3.0.0-alpha2, 3.0.0-beta1
> Reporter: wujinhu
> Assignee: wujinhu
> Attachments: HADOOP-15063.001.patch
>
>




[jira] [Updated] (HADOOP-15063) IOException may be thrown when read from Aliyun OSS in some case

2017-11-22 Thread wujinhu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wujinhu updated HADOOP-15063:
-
Resolution: Duplicate
Status: Resolved  (was: Patch Available)

> IOException may be thrown when read from Aliyun OSS in some case
> 
>
> Key: HADOOP-15063
> URL: https://issues.apache.org/jira/browse/HADOOP-15063
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/oss
> Affects Versions: 3.0.0-alpha2, 3.0.0-beta1
> Reporter: wujinhu
> Assignee: wujinhu
> Attachments: HADOOP-15063.001.patch
>
>
> IOException will be thrown in this case
> 1. set part size = n(102400)
> 2. assume current position = 0, then partRemaining = 102400
> 3. we call seek(pos = 101802), with pos > position && pos < position + 
> partRemaining, so it will skip pos - position bytes, but partRemaining 
> remains the same
> 4. if we read bytes more than n - pos, it will throw IOException.
> Current code:
> {code:java}
> @Override
>   public synchronized void seek(long pos) throws IOException {
> checkNotClosed();
> if (position == pos) {
>   return;
> } else if (pos > position && pos < position + partRemaining) {
>   AliyunOSSUtils.skipFully(wrappedStream, pos - position);
>   // we need update partRemaining here
>   position = pos;
> } else {
>   reopen(pos);
> }
>   }
> {code}
> Logs:
> java.io.IOException: Failed to read from stream. Remaining:101802
>   at org.apache.hadoop.fs.aliyun.oss.AliyunOSSInputStream.read(AliyunOSSInputStream.java:182)
>   at org.apache.hadoop.fs.FSInputStream.read(FSInputStream.java:75)
>   at org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:92)
> How to reproduce:
> 1. Create a 10MB file.
> 2. Run the following loop against it:
> {code:java}
> int seekTimes = 150;
> for (int i = 0; i < seekTimes; i++) {
>   long pos = size / (seekTimes - i) - 1;
>   LOG.info("begin seeking for pos: " + pos);
>   byte []buf = new byte[1024];
>   instream.read(pos, buf, 0, 1024);
> }
> {code}
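
The failure described in the issue can be reproduced with a small stand-alone model. This is only a sketch, not the real AliyunOSSInputStream: the class SeekModel, its fields, and the "fixed" flag are hypothetical names, and it assumes the object is large enough that a new part can always be opened. With the flag off, seek() skips bytes without touching partRemaining, as in the current code; with the flag on, partRemaining is reduced by the skipped byte count.

```java
import java.io.IOException;

/** Toy model of a part-based stream; hypothetical, not AliyunOSSInputStream. */
final class SeekModel {
  private final long partSize;
  private final boolean fixed;   // true: seek() also updates partRemaining
  private long position;
  private long partRemaining;    // bookkeeping: bytes believed left in the part
  private long streamLeft;       // bytes really left in the wrapped stream

  SeekModel(long partSize, boolean fixed) {
    this.partSize = partSize;
    this.fixed = fixed;
    reopen(0);
  }

  /** Models opening a fresh part that starts at pos. */
  private void reopen(long pos) {
    position = pos;
    partRemaining = partSize;
    streamLeft = partSize;
  }

  void seek(long pos) {
    if (pos == position) {
      return;
    } else if (pos > position && pos < position + partRemaining) {
      long skipped = pos - position;
      streamLeft -= skipped;        // skipping consumes the wrapped stream
      if (fixed) {
        partRemaining -= skipped;   // the update the current code lacks
      }
      position = pos;
    } else {
      reopen(pos);
    }
  }

  int read(int len) throws IOException {
    int total = 0;
    while (total < len) {
      if (partRemaining <= 0) {
        reopen(position);
      }
      long got = Math.min(Math.min(len - total, partRemaining), streamLeft);
      if (got <= 0) {
        // The wrapped stream is exhausted but the bookkeeping disagrees.
        throw new IOException(
            "Failed to read from stream. Remaining:" + partRemaining);
      }
      streamLeft -= got;
      partRemaining -= got;
      position += got;
      total += (int) got;
    }
    return total;
  }

  public static void main(String[] args) throws IOException {
    SeekModel fixedModel = new SeekModel(102400, true);
    fixedModel.seek(101802);
    System.out.println("fixed read: " + fixedModel.read(1024));

    SeekModel buggy = new SeekModel(102400, false);
    buggy.seek(101802);
    try {
      buggy.read(1024);
    } catch (IOException e) {
      System.out.println("buggy read: " + e.getMessage());
    }
  }
}
```

In this model the unfixed variant reads the 598 bytes left in the part and then fails with partRemaining still at 101802, while the fixed variant rolls over to the next part.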




[jira] [Updated] (HADOOP-15027) AliyunOSS: Improvements for Hadoop read from AliyunOSS

2017-11-29 Thread wujinhu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wujinhu updated HADOOP-15027:
-
Summary: AliyunOSS: Improvements for Hadoop read from AliyunOSS  (was: 
Improvements for Hadoop read from AliyunOSS)

> AliyunOSS: Improvements for Hadoop read from AliyunOSS
> --
>
> Key: HADOOP-15027
> URL: https://issues.apache.org/jira/browse/HADOOP-15027
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0
>Reporter: wujinhu
>Assignee: wujinhu
> Attachments: HADOOP-15027.001.patch, HADOOP-15027.002.patch, 
> HADOOP-15027.003.patch
>
>
> Currently, read performance is poor when Hadoop reads from AliyunOSS. It 
> needs about 1min to read 1GB from OSS.
> Class AliyunOSSInputStream uses single thread to read data from AliyunOSS,  
> so we can refactor this by using multi-thread pre read to improve this.




[jira] [Updated] (HADOOP-15027) AliyunOSS: Improvements for Hadoop read from AliyunOSS

2017-12-14 Thread wujinhu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wujinhu updated HADOOP-15027:
-
Attachment: HADOOP-15027.004.patch

> AliyunOSS: Improvements for Hadoop read from AliyunOSS
> --
>
> Key: HADOOP-15027
> URL: https://issues.apache.org/jira/browse/HADOOP-15027
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0
>Reporter: wujinhu
>Assignee: wujinhu
> Attachments: HADOOP-15027.001.patch, HADOOP-15027.002.patch, 
> HADOOP-15027.003.patch, HADOOP-15027.004.patch
>
>
> Currently, read performance is poor when Hadoop reads from AliyunOSS. It 
> needs about 1min to read 1GB from OSS.
> Class AliyunOSSInputStream uses single thread to read data from AliyunOSS,  
> so we can refactor this by using multi-thread pre read to improve this.




[jira] [Commented] (HADOOP-15027) AliyunOSS: Improvements for Hadoop read from AliyunOSS

2017-12-14 Thread wujinhu (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16290933#comment-16290933
 ] 

wujinhu commented on HADOOP-15027:
--

Attached a patch based on HADOOP-15039.

> AliyunOSS: Improvements for Hadoop read from AliyunOSS
> --
>
> Key: HADOOP-15027
> URL: https://issues.apache.org/jira/browse/HADOOP-15027
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0
>Reporter: wujinhu
>Assignee: wujinhu
> Attachments: HADOOP-15027.001.patch, HADOOP-15027.002.patch, 
> HADOOP-15027.003.patch, HADOOP-15027.004.patch
>
>
> Currently, read performance is poor when Hadoop reads from AliyunOSS. It 
> needs about 1min to read 1GB from OSS.
> Class AliyunOSSInputStream uses single thread to read data from AliyunOSS,  
> so we can refactor this by using multi-thread pre read to improve this.




[jira] [Commented] (HADOOP-15027) AliyunOSS: Improvements for Hadoop read from AliyunOSS

2017-12-14 Thread wujinhu (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16291908#comment-16291908
 ] 

wujinhu commented on HADOOP-15027:
--

[~ste...@apache.org] [~drankye] [~uncleGen] [~Sammi]
Please review. This patch is based on HADOOP-15039, which refactors 
*_SemaphoredDelegatingExecutor_*.

> AliyunOSS: Improvements for Hadoop read from AliyunOSS
> --
>
> Key: HADOOP-15027
> URL: https://issues.apache.org/jira/browse/HADOOP-15027
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0
>Reporter: wujinhu
>Assignee: wujinhu
> Attachments: HADOOP-15027.001.patch, HADOOP-15027.002.patch, 
> HADOOP-15027.003.patch, HADOOP-15027.004.patch
>
>
> Currently, read performance is poor when Hadoop reads from AliyunOSS. It 
> needs about 1min to read 1GB from OSS.
> Class AliyunOSSInputStream uses single thread to read data from AliyunOSS,  
> so we can refactor this by using multi-thread pre read to improve this.




[jira] [Updated] (HADOOP-15027) Improvements for Hadoop read from AliyunOSS

2017-11-12 Thread wujinhu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wujinhu updated HADOOP-15027:
-
Attachment: HADOOP-15027.002.patch

> Improvements for Hadoop read from AliyunOSS
> ---
>
> Key: HADOOP-15027
> URL: https://issues.apache.org/jira/browse/HADOOP-15027
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0
>Reporter: wujinhu
>Assignee: wujinhu
> Attachments: HADOOP-15027.001.patch, HADOOP-15027.002.patch
>
>
> Currently, read performance is poor when Hadoop reads from AliyunOSS. It 
> needs about 1min to read 1GB from OSS.
> Class AliyunOSSInputStream uses single thread to read data from AliyunOSS,  
> so we can refactor this by using multi-thread pre read to improve this.




[jira] [Updated] (HADOOP-15027) Improvements for Hadoop read from AliyunOSS

2017-11-11 Thread wujinhu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wujinhu updated HADOOP-15027:
-
Attachment: HADOOP-15027.002.patch

> Improvements for Hadoop read from AliyunOSS
> ---
>
> Key: HADOOP-15027
> URL: https://issues.apache.org/jira/browse/HADOOP-15027
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0
>Reporter: wujinhu
>Assignee: wujinhu
> Attachments: HADOOP-15027.001.patch, HADOOP-15027.002.patch
>
>
> Currently, read performance is poor when Hadoop reads from AliyunOSS. It 
> needs about 1min to read 1GB from OSS.
> Class AliyunOSSInputStream uses single thread to read data from AliyunOSS,  
> so we can refactor this by using multi-thread pre read to improve this.




[jira] [Updated] (HADOOP-15027) Improvements for Hadoop read from AliyunOSS

2017-11-12 Thread wujinhu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wujinhu updated HADOOP-15027:
-
Attachment: (was: HADOOP-15027.002.patch)

> Improvements for Hadoop read from AliyunOSS
> ---
>
> Key: HADOOP-15027
> URL: https://issues.apache.org/jira/browse/HADOOP-15027
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0
>Reporter: wujinhu
>Assignee: wujinhu
> Attachments: HADOOP-15027.001.patch
>
>
> Currently, read performance is poor when Hadoop reads from AliyunOSS. It 
> needs about 1min to read 1GB from OSS.
> Class AliyunOSSInputStream uses single thread to read data from AliyunOSS,  
> so we can refactor this by using multi-thread pre read to improve this.




[jira] [Commented] (HADOOP-15027) Improvements for Hadoop read from AliyunOSS

2017-11-13 Thread wujinhu (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16250921#comment-16250921
 ] 

wujinhu commented on HADOOP-15027:
--

ste...@apache.org


> Improvements for Hadoop read from AliyunOSS
> ---
>
> Key: HADOOP-15027
> URL: https://issues.apache.org/jira/browse/HADOOP-15027
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0
>Reporter: wujinhu
>Assignee: wujinhu
> Attachments: HADOOP-15027.001.patch, HADOOP-15027.002.patch, 
> HADOOP-15027.003.patch
>
>
> Currently, read performance is poor when Hadoop reads from AliyunOSS. It 
> needs about 1min to read 1GB from OSS.
> Class AliyunOSSInputStream uses single thread to read data from AliyunOSS,  
> so we can refactor this by using multi-thread pre read to improve this.




[jira] [Issue Comment Deleted] (HADOOP-15027) Improvements for Hadoop read from AliyunOSS

2017-11-13 Thread wujinhu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wujinhu updated HADOOP-15027:
-
Comment: was deleted

(was: ste...@apache.org
)

> Improvements for Hadoop read from AliyunOSS
> ---
>
> Key: HADOOP-15027
> URL: https://issues.apache.org/jira/browse/HADOOP-15027
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0
>Reporter: wujinhu
>Assignee: wujinhu
> Attachments: HADOOP-15027.001.patch, HADOOP-15027.002.patch, 
> HADOOP-15027.003.patch
>
>
> Currently, read performance is poor when Hadoop reads from AliyunOSS. It 
> needs about 1min to read 1GB from OSS.
> Class AliyunOSSInputStream uses single thread to read data from AliyunOSS,  
> so we can refactor this by using multi-thread pre read to improve this.




[jira] [Commented] (HADOOP-15027) Improvements for Hadoop read from AliyunOSS

2017-11-14 Thread wujinhu (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16251109#comment-16251109
 ] 

wujinhu commented on HADOOP-15027:
--

[~uncleGen] 
I have set up a Hadoop environment on a Tencent Cloud server and read data from 
Aliyun OSS via the hadoop command.
File size: 104MB.
To test aggressive random reads, I divided the file into parts (each part is 
1MB) and read all the parts in shuffled order.

{code:java}
public static final String MULTIPART_DOWNLOAD_SIZE_KEY =
    "fs.oss.multipart.download.size";

public static final long MULTIPART_DOWNLOAD_SIZE_DEFAULT = 1 * 1024 * 1024;
{code}

Current version
ubuntu@VM-0-11-ubuntu:~/hadoop-3.0.0-beta1$ date; hadoop fs -cat 
oss://hadoop-on-oss/hadoop/wordcount/wordcount_bigdata_100M.txt > /dev/null; 
date
Tue Nov 14 15:47:27 CST 2017
Tue Nov 14 15:48:47 CST 2017
80s

After optimization with multi-threaded prefetch
---
fs.oss.multipart.download.ahead.part.max.number = 1
fs.oss.multipart.download.threads = 4
sequential IO: 77s
random IO: 74.51s

---
fs.oss.multipart.download.ahead.part.max.number = 8
fs.oss.multipart.download.threads = 4
sequential IO: 23s
random IO: 59.128s

---
fs.oss.multipart.download.ahead.part.max.number = 16
fs.oss.multipart.download.threads = 4
sequential IO: 19s
random IO: 83.92s

---
fs.oss.multipart.download.ahead.part.max.number = 16
fs.oss.multipart.download.threads = 8
sequential IO: 13s
random IO: 50.561s

Summary:
* if the read-ahead part number is 1, sequential IO and random IO perform about 
the same, since there is no pre-fetch
* sequential IO improves as the read-ahead part number and the thread number 
increase
* the worst case for random IO is a pattern like [sequential ... sequential, 
random, sequential ... sequential, random], in which some pre-fetched data 
is wasted
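
The random-read setup described in this comment (1MB parts read in shuffled order) can be sketched as follows. This is only an illustration of how such an access order could be generated; the class and method names are hypothetical, the seed is arbitrary, and the actual test code may differ.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical helper that builds a shuffled list of part start offsets. */
final class ShuffledPartsSketch {

  static List<Long> shuffledPartOffsets(long fileLength, long partSize, long seed) {
    List<Long> offsets = new ArrayList<>();
    // One entry per part; the last part may be shorter than partSize.
    for (long off = 0; off < fileLength; off += partSize) {
      offsets.add(off);
    }
    // A fixed seed keeps the "random" order repeatable between runs.
    Collections.shuffle(offsets, new Random(seed));
    return offsets;
  }

  public static void main(String[] args) {
    long partSize = 1L * 1024 * 1024;     // 1MB, as in the test above
    long fileLength = 104L * 1024 * 1024; // 104MB file
    List<Long> offsets = shuffledPartOffsets(fileLength, partSize, 42L);
    System.out.println(offsets.size() + " parts to read in shuffled order");
  }
}
```

Each offset would then be used as the start position of one positioned read of up to one part.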

> Improvements for Hadoop read from AliyunOSS
> ---
>
> Key: HADOOP-15027
> URL: https://issues.apache.org/jira/browse/HADOOP-15027
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0
>Reporter: wujinhu
>Assignee: wujinhu
> Attachments: HADOOP-15027.001.patch, HADOOP-15027.002.patch, 
> HADOOP-15027.003.patch
>
>
> Currently, read performance is poor when Hadoop reads from AliyunOSS. It 
> needs about 1min to read 1GB from OSS.
> Class AliyunOSSInputStream uses single thread to read data from AliyunOSS,  
> so we can refactor this by using multi-thread pre read to improve this.




[jira] [Commented] (HADOOP-15027) Improvements for Hadoop read from AliyunOSS

2017-11-13 Thread wujinhu (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16250933#comment-16250933
 ] 

wujinhu commented on HADOOP-15027:
--

Yes, I agree with [~uncleGen]. We can optimize random IO step by step.
As we all know, Hadoop 2.7.5 will be released soon. We hope this patch can be 
released too, so that we can fix the sequential IO issue (single-threaded reads) 
in the current implementation. Random IO stays the same for now, and we can 
address it later.

I have read the _*SemaphoredDelegatingExecutor*_ class and it is good enough. 
[~uncleGen], I think you can do this job, thanks.

Besides, I will provide more detailed test results later.

 

> Improvements for Hadoop read from AliyunOSS
> ---
>
> Key: HADOOP-15027
> URL: https://issues.apache.org/jira/browse/HADOOP-15027
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0
>Reporter: wujinhu
>Assignee: wujinhu
> Attachments: HADOOP-15027.001.patch, HADOOP-15027.002.patch, 
> HADOOP-15027.003.patch
>
>
> Currently, read performance is poor when Hadoop reads from AliyunOSS. It 
> needs about 1min to read 1GB from OSS.
> Class AliyunOSSInputStream uses single thread to read data from AliyunOSS,  
> so we can refactor this by using multi-thread pre read to improve this.




[jira] [Commented] (HADOOP-15027) Improvements for Hadoop read from AliyunOSS

2017-11-13 Thread wujinhu (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16250965#comment-16250965
 ] 

wujinhu commented on HADOOP-15027:
--

[~stevel]

> Improvements for Hadoop read from AliyunOSS
> ---
>
> Key: HADOOP-15027
> URL: https://issues.apache.org/jira/browse/HADOOP-15027
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0
>Reporter: wujinhu
>Assignee: wujinhu
> Attachments: HADOOP-15027.001.patch, HADOOP-15027.002.patch, 
> HADOOP-15027.003.patch
>
>
> Currently, read performance is poor when Hadoop reads from AliyunOSS. It 
> needs about 1min to read 1GB from OSS.
> Class AliyunOSSInputStream uses single thread to read data from AliyunOSS,  
> so we can refactor this by using multi-thread pre read to improve this.




[jira] [Issue Comment Deleted] (HADOOP-15027) Improvements for Hadoop read from AliyunOSS

2017-11-13 Thread wujinhu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wujinhu updated HADOOP-15027:
-
Comment: was deleted

(was: [~stevel])

> Improvements for Hadoop read from AliyunOSS
> ---
>
> Key: HADOOP-15027
> URL: https://issues.apache.org/jira/browse/HADOOP-15027
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0
>Reporter: wujinhu
>Assignee: wujinhu
> Attachments: HADOOP-15027.001.patch, HADOOP-15027.002.patch, 
> HADOOP-15027.003.patch
>
>
> Currently, read performance is poor when Hadoop reads from AliyunOSS. It 
> needs about 1min to read 1GB from OSS.
> Class AliyunOSSInputStream uses single thread to read data from AliyunOSS,  
> so we can refactor this by using multi-thread pre read to improve this.




[jira] [Comment Edited] (HADOOP-15027) Improvements for Hadoop read from AliyunOSS

2017-11-14 Thread wujinhu (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16251109#comment-16251109
 ] 

wujinhu edited comment on HADOOP-15027 at 11/14/17 9:06 AM:


[~uncleGen] [~ste...@apache.org]
I have set up a Hadoop environment on a Tencent Cloud server and read data from 
Aliyun OSS via the hadoop command.
File size: 104MB.
To test aggressive random reads, I divided the file into parts (each part is 
1MB) and read all the parts in shuffled order.

{code:java}
public static final String MULTIPART_DOWNLOAD_SIZE_KEY =
    "fs.oss.multipart.download.size";

public static final long MULTIPART_DOWNLOAD_SIZE_DEFAULT = 1 * 1024 * 1024;
{code}

Current version
ubuntu@VM-0-11-ubuntu:~/hadoop-3.0.0-beta1$ date; hadoop fs -cat 
oss://hadoop-on-oss/hadoop/wordcount/wordcount_bigdata_100M.txt > /dev/null; 
date
Tue Nov 14 15:47:27 CST 2017
Tue Nov 14 15:48:47 CST 2017
80s

After optimization with multi-threaded prefetch
---
fs.oss.multipart.download.ahead.part.max.number = 1
fs.oss.multipart.download.threads = 4
sequential IO: 77s
random IO: 74.51s

---
fs.oss.multipart.download.ahead.part.max.number = 8
fs.oss.multipart.download.threads = 4
sequential IO: 23s
random IO: 59.128s

---
fs.oss.multipart.download.ahead.part.max.number = 16
fs.oss.multipart.download.threads = 4
sequential IO: 19s
random IO: 83.92s

---
fs.oss.multipart.download.ahead.part.max.number = 16
fs.oss.multipart.download.threads = 8
sequential IO: 13s
random IO: 50.561s

Summary:
* if the read-ahead part number is 1, sequential IO and random IO perform about 
the same, since there is no pre-fetch
* sequential IO improves as the read-ahead part number and the thread number 
increase
* the worst case for random IO is a pattern like [sequential ... sequential, 
random, sequential ... sequential, random], in which some pre-fetched data 
is wasted



> Improvements for Hadoop read from AliyunOSS
> ---
>
> Key: HADOOP-15027
> URL: https://issues.apache.org/jira/browse/HADOOP-15027
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0
>Reporter: wujinhu
>Assignee: wujinhu
> Attachments: HADOOP-15027.001.patch, HADOOP-15027.002.patch, 
> HADOOP-15027.003.patch
>
>
> Currently, read performance is poor when Hadoop reads from AliyunOSS. It 
> needs about 1min to read 1GB from OSS.
> Class AliyunOSSInputStream uses single thread to read data from AliyunOSS,  
> so we can refactor this by using multi-thread pre read to improve this.




[jira] [Comment Edited] (HADOOP-15027) Improvements for Hadoop read from AliyunOSS

2017-11-14 Thread wujinhu (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16251109#comment-16251109
 ] 

wujinhu edited comment on HADOOP-15027 at 11/14/17 4:27 PM:


[~uncleGen] [~ste...@apache.org]
I have set up a Hadoop environment on a Tencent Cloud server and read data from 
Aliyun OSS via the hadoop command.
File size: 104MB.
To test aggressive random reads, I divided the file into parts (each part is 
1MB) and read all the parts in shuffled order.

{code:java}
public static final String MULTIPART_DOWNLOAD_SIZE_KEY =
    "fs.oss.multipart.download.size";

public static final long MULTIPART_DOWNLOAD_SIZE_DEFAULT = 1 * 1024 * 1024;
{code}

Current version
ubuntu@VM-0-11-ubuntu:~/hadoop-3.0.0-beta1$ date; hadoop fs -cat 
oss://hadoop-on-oss/hadoop/wordcount/wordcount_bigdata_100M.txt > /dev/null; 
date
Tue Nov 14 15:47:27 CST 2017
Tue Nov 14 15:48:47 CST 2017
80s

After optimization with multi-threaded prefetch
---
fs.oss.multipart.download.ahead.part.max.number = 1
fs.oss.multipart.download.threads = 4
sequential IO: 77s
random IO: 74.51s

---
fs.oss.multipart.download.ahead.part.max.number = 8
fs.oss.multipart.download.threads = 4
sequential IO: 23s
random IO: 59.128s

---
fs.oss.multipart.download.ahead.part.max.number = 16
fs.oss.multipart.download.threads = 4
sequential IO: 19s
random IO: 83.92s

---
fs.oss.multipart.download.ahead.part.max.number = 16
fs.oss.multipart.download.threads = 8
sequential IO: 13s (I will continue to test this; it has already reached the 
upper limit of the network)
random IO: 50.561s

Summary:
* if the read-ahead part number is 1, sequential IO and random IO perform about 
the same, since there is no pre-fetch
* sequential IO improves as the read-ahead part number and the thread number 
increase
* the worst case for random IO is a pattern like [sequential ... sequential, 
random, sequential ... sequential, random], in which some pre-fetched data 
is wasted



> Improvements for Hadoop read from AliyunOSS
> ---
>
> Key: HADOOP-15027
> URL: https://issues.apache.org/jira/browse/HADOOP-15027
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0
>Reporter: wujinhu
>Assignee: wujinhu
> Attachments: HADOOP-15027.001.patch, HADOOP-15027.002.patch, 
> HADOOP-15027.003.patch
>
>
> Currently, read performance is poor when Hadoop reads from AliyunOSS. It 
> needs about 1min to read 1GB from OSS.
> Class AliyunOSSInputStream uses single thread to read data from AliyunOSS,  
> so we can refactor this by using multi-thread pre read to improve this.




[jira] [Created] (HADOOP-15063) IOException will be thrown when read from Aliyun OSS

2017-11-21 Thread wujinhu (JIRA)

wujinhu created HADOOP-15063:


 Summary: IOException will be thrown when read from Aliyun OSS
 Key: HADOOP-15063
 URL: https://issues.apache.org/jira/browse/HADOOP-15063
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs/oss
Affects Versions: 3.0.0-alpha2
Reporter: wujinhu
Priority: Critical


Logs:
java.io.IOException: Failed to read from stream. Remaining:101802
  at org.apache.hadoop.fs.aliyun.oss.AliyunOSSInputStream.read(AliyunOSSInputStream.java:182)
  at org.apache.hadoop.fs.FSInputStream.read(FSInputStream.java:75)
  at org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:92)

How to reproduce:
1. Create a 10MB file.
2. Run the following loop against it:
int seekTimes = 150;
for (int i = 0; i < seekTimes; i++) {
  long pos = size / (seekTimes - i) - 1;
  LOG.info("begin seeking for pos: " + pos);
  //instream.seek(pos);
  byte []buf = new byte[1024];
  instream.read(pos, buf, 0, 1024);
}




[jira] [Commented] (HADOOP-15027) Improvements for Hadoop read from AliyunOSS

2017-11-10 Thread wujinhu (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16248362#comment-16248362
 ] 

wujinhu commented on HADOOP-15027:
--

Hi Steve Loughran,
Thanks for the comments; your suggestions are very helpful. I will follow 
your suggestions about the thread pool and retry logic.
For random IO, it is true that my implementation will not work well.
It seems HADOOP-14535 is similar to what the OS does.
The operating system starts sequential read-ahead when one of the following 
conditions is satisfied:
* it is the first read from a file and the seek position is 0
* the current read and the previous read are contiguous in the file
Otherwise, it treats the access as random IO.
I will take a look at these two issues and continue to improve this.
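
The two conditions listed in this comment can be written down as a small classifier. This is a minimal sketch of the heuristic as stated here, not code from any operating system or from HADOOP-14535; the class ReadPatternSketch and its method are hypothetical names.

```java
/** Hypothetical detector for the sequential-vs-random heuristic above. */
final class ReadPatternSketch {
  private long lastEnd = -1; // end offset of the previous read; -1 means none yet

  /** Returns true if this read should be treated as sequential. */
  boolean isSequential(long pos, int len) {
    // Condition 1: first read from the file, starting at position 0.
    boolean firstAtZero = lastEnd < 0 && pos == 0;
    // Condition 2: this read starts exactly where the previous one ended.
    boolean contiguous = lastEnd >= 0 && pos == lastEnd;
    lastEnd = pos + len;
    return firstAtZero || contiguous;
  }

  public static void main(String[] args) {
    ReadPatternSketch p = new ReadPatternSketch();
    System.out.println(p.isSequential(0, 1024));     // first read at 0
    System.out.println(p.isSequential(1024, 1024));  // contiguous
    System.out.println(p.isSequential(8192, 1024));  // jump
  }
}
```

A stream could use such a check to enable pre-fetch only while reads stay sequential and to skip it after a jump.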

> Improvements for Hadoop read from AliyunOSS
> ---
>
> Key: HADOOP-15027
> URL: https://issues.apache.org/jira/browse/HADOOP-15027
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0
>Reporter: wujinhu
>Assignee: wujinhu
> Attachments: HADOOP-15027.001.patch
>
>
> Currently, read performance is poor when Hadoop reads from AliyunOSS. It 
> needs about 1min to read 1GB from OSS.
> Class AliyunOSSInputStream uses single thread to read data from AliyunOSS,  
> so we can refactor this by using multi-thread pre read to improve this.




[jira] [Comment Edited] (HADOOP-15027) Improvements for Hadoop read from AliyunOSS

2017-11-10 Thread wujinhu (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16248362#comment-16248362
 ] 

wujinhu edited comment on HADOOP-15027 at 11/11/17 6:40 AM:


Hi Steve Loughran,
Thanks for the comments; your suggestions are very helpful. I will follow 
your suggestions about the thread pool and retry logic.
For random IO, it is true that my implementation will not work well.
It seems HADOOP-14535 is similar to what the OS does.
The operating system starts sequential read-ahead when one of the following 
conditions is satisfied:
* it is the first read from a file and the seek position is 0
* the current read and the previous read are contiguous in the file

Otherwise, it treats the access as random IO.
I will take a look at these two issues and continue to improve this.



> Improvements for Hadoop read from AliyunOSS
> ---
>
> Key: HADOOP-15027
> URL: https://issues.apache.org/jira/browse/HADOOP-15027
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0
>Reporter: wujinhu
>Assignee: wujinhu
> Attachments: HADOOP-15027.001.patch
>
>
> Currently, read performance is poor when Hadoop reads from AliyunOSS. It 
> takes about 1 minute to read 1 GB from OSS.
> The AliyunOSSInputStream class uses a single thread to read data from AliyunOSS, 
> so we can refactor it to use multi-threaded pre-read to improve performance.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org

[jira] [Created] (HADOOP-15027) Improvements for Hadoop read from AliyunOSS

2017-11-09 Thread wujinhu (JIRA)

wujinhu created HADOOP-15027:


 Summary: Improvements for Hadoop read from AliyunOSS
 Key: HADOOP-15027
 URL: https://issues.apache.org/jira/browse/HADOOP-15027
 Project: Hadoop Common
  Issue Type: Improvement
  Components: fs/oss
Reporter: wujinhu


Currently, read performance is poor when Hadoop reads from AliyunOSS. It takes 
about 1 minute to read 1 GB from OSS.
The AliyunOSSInputStream class uses a single thread to read data from AliyunOSS, so 
we can refactor it to use multi-threaded pre-read to improve performance.
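
As a rough illustration of the multi-threaded pre-read idea described above, the sketch below splits a file into fixed-size parts, fetches them concurrently, and joins them in order. It is not the code from any attached patch: the class PreReadSketch, the RangeReader interface, and the parameters partSize and threads are all hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical sketch of multi-threaded pre-read; not the implementation in the patch. */
class PreReadSketch {
    /** Assumed stand-in for a ranged read against the object store. */
    interface RangeReader {
        byte[] read(long start, int length) throws Exception;
    }

    /** Reads [0, fileLength) by fetching fixed-size parts concurrently and joining them in order. */
    static byte[] readAll(RangeReader reader, int fileLength, int partSize, int threads)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Submit one task per part; the futures keep the original part order.
            List<Future<byte[]>> parts = new ArrayList<>();
            for (long start = 0; start < fileLength; start += partSize) {
                final long s = start;
                final int len = (int) Math.min(partSize, fileLength - start);
                parts.add(pool.submit(() -> reader.read(s, len)));
            }
            byte[] result = new byte[fileLength];
            int offset = 0;
            for (Future<byte[]> part : parts) {
                byte[] data = part.get(); // blocks until this part has arrived
                System.arraycopy(data, 0, result, offset, data.length);
                offset += data.length;
            }
            return result;
        } finally {
            pool.shutdown();
        }
    }
}
```

A real input stream would bound how far ahead it fetches and how much it buffers rather than reading the whole file at once; the sketch only shows the concurrent range fetch.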




[jira] [Updated] (HADOOP-15104) AliyunOSS: change default max error retry

2017-12-08 Thread wujinhu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wujinhu updated HADOOP-15104:
-
Component/s: fs/oss

> AliyunOSS: change default max error retry
> -
>
> Key: HADOOP-15104
> URL: https://issues.apache.org/jira/browse/HADOOP-15104
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/oss
>Affects Versions: 3.0.0-beta1
>Reporter: wujinhu
>Assignee: wujinhu
> Fix For: 3.0.0, 2.9.1
>
>
> Currently, the default maximum number of error retries is 20. However, the 
> OSS SDK computes the retry delay as
> {code:java}
> // Delay before the next attempt; the 3.64-day figure below implies seconds.
> double delay = Math.pow(2, retries) * 0.3;
> {code}
> each time an error occurs. So, if we retry 20 times, the sleep time will be 
> about 3.64 days, which is unacceptable. We should change the default behavior.




[jira] [Updated] (HADOOP-15104) AliyunOSS: change default max error retry

2017-12-08 Thread wujinhu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wujinhu updated HADOOP-15104:
-
Fix Version/s: 3.0.0

> AliyunOSS: change default max error retry
> -
>
> Key: HADOOP-15104
> URL: https://issues.apache.org/jira/browse/HADOOP-15104
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/oss
>Affects Versions: 3.0.0-beta1
>Reporter: wujinhu
>Assignee: wujinhu
> Fix For: 3.0.0, 2.9.1
>
>
> Currently, the default maximum number of error retries is 20. However, the 
> OSS SDK computes the retry delay as
> {code:java}
> // Delay before the next attempt; the 3.64-day figure below implies seconds.
> double delay = Math.pow(2, retries) * 0.3;
> {code}
> each time an error occurs. So, if we retry 20 times, the sleep time will be 
> about 3.64 days, which is unacceptable. We should change the default behavior.




[jira] [Updated] (HADOOP-15104) AliyunOSS: change default max error retry

2017-12-08 Thread wujinhu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wujinhu updated HADOOP-15104:
-
Affects Version/s: 3.0.0-beta1

> AliyunOSS: change default max error retry
> -
>
> Key: HADOOP-15104
> URL: https://issues.apache.org/jira/browse/HADOOP-15104
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/oss
>Affects Versions: 3.0.0-beta1
>Reporter: wujinhu
>Assignee: wujinhu
> Fix For: 3.0.0, 2.9.1
>
>
> Currently, the default maximum number of error retries is 20. However, the 
> OSS SDK computes the retry delay as
> {code:java}
> // Delay before the next attempt; the 3.64-day figure below implies seconds.
> double delay = Math.pow(2, retries) * 0.3;
> {code}
> each time an error occurs. So, if we retry 20 times, the sleep time will be 
> about 3.64 days, which is unacceptable. We should change the default behavior.




[jira] [Updated] (HADOOP-15104) AliyunOSS: change default max error retry

2017-12-08 Thread wujinhu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wujinhu updated HADOOP-15104:
-
Fix Version/s: (was: 3.1.0)

> AliyunOSS: change default max error retry
> -
>
> Key: HADOOP-15104
> URL: https://issues.apache.org/jira/browse/HADOOP-15104
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/oss
>Affects Versions: 3.0.0-beta1
>Reporter: wujinhu
>Assignee: wujinhu
> Fix For: 3.0.0, 2.9.1
>
>
> Currently, the default maximum number of error retries is 20. However, the 
> OSS SDK computes the retry delay as
> {code:java}
> // Delay before the next attempt; the 3.64-day figure below implies seconds.
> double delay = Math.pow(2, retries) * 0.3;
> {code}
> each time an error occurs. So, if we retry 20 times, the sleep time will be 
> about 3.64 days, which is unacceptable. We should change the default behavior.




[jira] [Updated] (HADOOP-15104) AliyunOSS: change default max error retry

2017-12-08 Thread wujinhu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wujinhu updated HADOOP-15104:
-
Fix Version/s: 2.9.1
   3.1.0

> AliyunOSS: change default max error retry
> -
>
> Key: HADOOP-15104
> URL: https://issues.apache.org/jira/browse/HADOOP-15104
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/oss
>Affects Versions: 3.0.0-beta1
>Reporter: wujinhu
>Assignee: wujinhu
> Fix For: 3.0.0, 2.9.1
>
>
> Currently, the default maximum number of error retries is 20. However, the 
> OSS SDK computes the retry delay as
> {code:java}
> // Delay before the next attempt; the 3.64-day figure below implies seconds.
> double delay = Math.pow(2, retries) * 0.3;
> {code}
> each time an error occurs. So, if we retry 20 times, the sleep time will be 
> about 3.64 days, which is unacceptable. We should change the default behavior.




[jira] [Updated] (HADOOP-15104) AliyunOSS: change default max error retry

2017-12-08 Thread wujinhu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wujinhu updated HADOOP-15104:
-
Description: 
Currently, the default maximum number of error retries is 20. However, the OSS 
SDK computes the retry delay as

{code:java}
// Delay before the next attempt; the 3.64-day figure below implies seconds.
double delay = Math.pow(2, retries) * 0.3;
{code}
each time an error occurs. So, if we retry 20 times, the sleep time will be about 
3.64 days, which is unacceptable. We should change the default behavior.
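
The 3.64-day estimate can be checked with a short calculation. The sketch below sums the delay formula from the description over 20 retries. It rests on two assumptions that the description does not state explicitly: the delay is in seconds, and the retry counter runs from 0 to 19. The class and method names are hypothetical.

```java
/** Worked arithmetic for the quoted delay formula; the unit (seconds) is an assumption. */
class RetryDelayEstimate {
    /** Delay for a given retry count, using the formula from the description. */
    static double delaySeconds(int retries) {
        return Math.pow(2, retries) * 0.3;
    }

    /** Sum of the delays for retry counts 0 .. maxRetries - 1. */
    static double totalSeconds(int maxRetries) {
        double total = 0;
        for (int i = 0; i < maxRetries; i++) {
            total += delaySeconds(i);
        }
        return total;
    }

    public static void main(String[] args) {
        double days = totalSeconds(20) / 86400.0; // 86400 seconds per day
        System.out.printf("total sleep for 20 retries: %.2f days%n", days);
    }
}
```

Under these assumptions the sum is 0.3 * (2^20 - 1) seconds, about 3.64 days. Because the delay doubles on each retry, lowering the maximum retry count by one roughly halves the worst-case total.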






> AliyunOSS: change default max error retry
> -
>
> Key: HADOOP-15104
> URL: https://issues.apache.org/jira/browse/HADOOP-15104
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: wujinhu
>Assignee: wujinhu
>
> Currently, the default maximum number of error retries is 20. However, the 
> OSS SDK computes the retry delay as
> {code:java}
> // Delay before the next attempt; the 3.64-day figure below implies seconds.
> double delay = Math.pow(2, retries) * 0.3;
> {code}
> each time an error occurs. So, if we retry 20 times, the sleep time will be 
> about 3.64 days, which is unacceptable. We should change the default behavior.




[jira] [Updated] (HADOOP-15104) AliyunOSS: change default max error retry

2017-12-08 Thread wujinhu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wujinhu updated HADOOP-15104:
-
Attachment: HADOOP-15104.001.patch

> AliyunOSS: change default max error retry
> -
>
> Key: HADOOP-15104
> URL: https://issues.apache.org/jira/browse/HADOOP-15104
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/oss
>Affects Versions: 3.0.0-beta1
>Reporter: wujinhu
>Assignee: wujinhu
> Fix For: 3.0.0, 2.9.1
>
> Attachments: HADOOP-15104.001.patch
>
>
> Currently, the default maximum number of error retries is 20. However, the 
> OSS SDK computes the retry delay as
> {code:java}
> // Delay before the next attempt; the 3.64-day figure below implies seconds.
> double delay = Math.pow(2, retries) * 0.3;
> {code}
> each time an error occurs. So, if we retry 20 times, the sleep time will be 
> about 3.64 days, which is unacceptable. We should change the default behavior.




[jira] [Updated] (HADOOP-15104) AliyunOSS: change default max error retry

2017-12-08 Thread wujinhu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wujinhu updated HADOOP-15104:
-
Attachment: (was: HADOOP-15104.001.patch)

> AliyunOSS: change default max error retry
> -
>
> Key: HADOOP-15104
> URL: https://issues.apache.org/jira/browse/HADOOP-15104
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/oss
>Affects Versions: 3.0.0-beta1
>Reporter: wujinhu
>Assignee: wujinhu
> Fix For: 3.0.0, 2.9.1
>
> Attachments: HADOOP-15104.001.patch
>
>
> Currently, the default maximum number of error retries is 20. However, the 
> OSS SDK computes the retry delay as
> {code:java}
> // Delay before the next attempt; the 3.64-day figure below implies seconds.
> double delay = Math.pow(2, retries) * 0.3;
> {code}
> each time an error occurs. So, if we retry 20 times, the sleep time will be 
> about 3.64 days, which is unacceptable. We should change the default behavior.




[jira] [Commented] (HADOOP-15104) AliyunOSS: change default max error retry

2017-12-08 Thread wujinhu (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16283659#comment-16283659
 ] 

wujinhu commented on HADOOP-15104:
--

Attached a patch for trunk.

> AliyunOSS: change default max error retry
> -
>
> Key: HADOOP-15104
> URL: https://issues.apache.org/jira/browse/HADOOP-15104
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/oss
>Affects Versions: 3.0.0-beta1
>Reporter: wujinhu
>Assignee: wujinhu
> Fix For: 3.0.0, 2.9.1
>
> Attachments: HADOOP-15104.001.patch
>
>
> Currently, the default maximum number of error retries is 20. However, the 
> OSS SDK computes the retry delay as
> {code:java}
> // Delay before the next attempt; the 3.64-day figure below implies seconds.
> double delay = Math.pow(2, retries) * 0.3;
> {code}
> each time an error occurs. So, if we retry 20 times, the sleep time will be 
> about 3.64 days, which is unacceptable. We should change the default behavior.




[jira] [Created] (HADOOP-15104) AliyunOSS: change default max error retry

2017-12-08 Thread wujinhu (JIRA)

wujinhu created HADOOP-15104:


 Summary: AliyunOSS: change default max error retry
 Key: HADOOP-15104
 URL: https://issues.apache.org/jira/browse/HADOOP-15104
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: wujinhu
Assignee: wujinhu







[jira] [Updated] (HADOOP-15104) AliyunOSS: change default max error retry

2017-12-08 Thread wujinhu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wujinhu updated HADOOP-15104:
-
Attachment: HADOOP-15104.001.patch

> AliyunOSS: change default max error retry
> -
>
> Key: HADOOP-15104
> URL: https://issues.apache.org/jira/browse/HADOOP-15104
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/oss
>Affects Versions: 3.0.0-beta1
>Reporter: wujinhu
>Assignee: wujinhu
> Fix For: 3.0.0, 2.9.1
>
> Attachments: HADOOP-15104.001.patch
>
>
> Currently, the default maximum number of error retries is 20. However, the 
> OSS SDK computes the retry delay as
> {code:java}
> // Delay before the next attempt; the 3.64-day figure below implies seconds.
> double delay = Math.pow(2, retries) * 0.3;
> {code}
> each time an error occurs. So, if we retry 20 times, the sleep time will be 
> about 3.64 days, which is unacceptable. We should change the default behavior.




[jira] [Work started] (HADOOP-15104) AliyunOSS: change default max error retry

2017-12-08 Thread wujinhu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HADOOP-15104 started by wujinhu.

> AliyunOSS: change default max error retry
> -
>
> Key: HADOOP-15104
> URL: https://issues.apache.org/jira/browse/HADOOP-15104
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/oss
>Affects Versions: 3.0.0-beta1
>Reporter: wujinhu
>Assignee: wujinhu
> Fix For: 3.0.0, 2.9.1
>
> Attachments: HADOOP-15104.001.patch
>
>
> Currently, the default maximum number of error retries is 20. However, the 
> OSS SDK computes the retry delay as
> {code:java}
> // Delay before the next attempt; the 3.64-day figure below implies seconds.
> double delay = Math.pow(2, retries) * 0.3;
> {code}
> each time an error occurs. So, if we retry 20 times, the sleep time will be 
> about 3.64 days, which is unacceptable. We should change the default behavior.




[jira] [Updated] (HADOOP-15027) AliyunOSS: Improvements for Hadoop read from AliyunOSS

2017-12-29 Thread wujinhu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wujinhu updated HADOOP-15027:
-
Attachment: HADOOP-15027.006.patch

> AliyunOSS: Improvements for Hadoop read from AliyunOSS
> --
>
> Key: HADOOP-15027
> URL: https://issues.apache.org/jira/browse/HADOOP-15027
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0
>Reporter: wujinhu
>Assignee: wujinhu
> Attachments: HADOOP-15027.001.patch, HADOOP-15027.002.patch, 
> HADOOP-15027.003.patch, HADOOP-15027.004.patch, HADOOP-15027.005.patch, 
> HADOOP-15027.006.patch
>
>
> Currently, AliyunOSSInputStream uses a single thread to read data from 
> AliyunOSS, so we can refactor it to use multi-threaded pre-read to 
> improve read performance.




[jira] [Created] (HADOOP-15158) AliyunOSS: AliyunCredentialsProvider supports role based credential

2018-01-04 Thread wujinhu (JIRA)

wujinhu created HADOOP-15158:


 Summary: AliyunOSS: AliyunCredentialsProvider supports role based 
credential 
 Key: HADOOP-15158
 URL: https://issues.apache.org/jira/browse/HADOOP-15158
 Project: Hadoop Common
  Issue Type: Improvement
  Components: fs/oss
Affects Versions: 3.0.0
Reporter: wujinhu
 Fix For: 2.9.1, 3.0.1


Currently, AliyunCredentialsProvider supports credentials supplied through 
configuration (core-site.xml). Sometimes an admin wants to create different 
temporary credentials (key/secret/token) for different roles so that one role 
cannot read data that belongs to another role.
So, our code should support passing in the URI when it creates an 
XXXCredentialsProvider.
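
One way to picture the request is a provider that receives the file system URI at construction time and selects a credential from it. The sketch below is purely illustrative: the class RoleCredentialsSketch, the nested Credential holder, and the idea of carrying the role in the user-info part of the URI (for example oss://analyst@bucket/path) are assumptions made for this example, not the interface of AliyunCredentialsProvider or of the OSS SDK.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: selects a per-role credential using the user-info part of a URI. */
class RoleCredentialsSketch {
    /** Simple holder for a temporary credential (key/secret/token). */
    static final class Credential {
        final String key;
        final String secret;
        final String token;

        Credential(String key, String secret, String token) {
            this.key = key;
            this.secret = secret;
            this.token = token;
        }
    }

    private final Map<String, Credential> byRole = new HashMap<>();
    private final String role;

    /** The URI is passed in at creation time so the provider can tell which role is asking. */
    RoleCredentialsSketch(URI uri, Map<String, Credential> credentialsByRole) {
        this.role = uri.getUserInfo(); // null when the URI carries no user info
        this.byRole.putAll(credentialsByRole);
    }

    /** Returns the credential for the role named in the URI, or fails if none is configured. */
    Credential getCredentials() {
        Credential c = byRole.get(role);
        if (c == null) {
            throw new IllegalStateException("no credential configured for role: " + role);
        }
        return c;
    }
}
```

In practice the map would be replaced by whatever service issues the temporary credentials; the point of the sketch is only that the URI must reach the provider's constructor.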




[jira] [Updated] (HADOOP-15027) AliyunOSS: Support multi-thread pre-read to improve read from Hadoop to Aliyun OSS performance

2018-01-08 Thread wujinhu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wujinhu updated HADOOP-15027:
-
Attachment: HADOOP-15027.011.patch

> AliyunOSS: Support multi-thread pre-read to improve read from Hadoop to 
> Aliyun OSS performance
> --
>
> Key: HADOOP-15027
> URL: https://issues.apache.org/jira/browse/HADOOP-15027
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0
>Reporter: wujinhu
>Assignee: wujinhu
> Attachments: HADOOP-15027.001.patch, HADOOP-15027.002.patch, 
> HADOOP-15027.003.patch, HADOOP-15027.004.patch, HADOOP-15027.005.patch, 
> HADOOP-15027.006.patch, HADOOP-15027.007.patch, HADOOP-15027.008.patch, 
> HADOOP-15027.009.patch, HADOOP-15027.010.patch, HADOOP-15027.011.patch
>
>
> Currently, AliyunOSSInputStream uses a single thread to read data from 
> AliyunOSS, so we can refactor it to use multi-threaded pre-read to 
> improve read performance.




[jira] [Commented] (HADOOP-15027) AliyunOSS: Support multi-thread pre-read to improve read from Hadoop to Aliyun OSS performance

2018-01-08 Thread wujinhu (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16317880#comment-16317880
 ] 

wujinhu commented on HADOOP-15027:
--

Attached a patch to fix the code style and FindBugs warning.

> AliyunOSS: Support multi-thread pre-read to improve read from Hadoop to 
> Aliyun OSS performance
> --
>
> Key: HADOOP-15027
> URL: https://issues.apache.org/jira/browse/HADOOP-15027
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0
>Reporter: wujinhu
>Assignee: wujinhu
> Attachments: HADOOP-15027.001.patch, HADOOP-15027.002.patch, 
> HADOOP-15027.003.patch, HADOOP-15027.004.patch, HADOOP-15027.005.patch, 
> HADOOP-15027.006.patch, HADOOP-15027.007.patch, HADOOP-15027.008.patch, 
> HADOOP-15027.009.patch, HADOOP-15027.010.patch, HADOOP-15027.011.patch
>
>
> Currently, AliyunOSSInputStream uses a single thread to read data from 
> AliyunOSS, so we can refactor it to use multi-threaded pre-read to 
> improve read performance.




[jira] [Work started] (HADOOP-15158) AliyunOSS: Supports role based credential

2018-01-08 Thread wujinhu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HADOOP-15158 started by wujinhu.

> AliyunOSS: Supports role based credential
> -
>
> Key: HADOOP-15158
> URL: https://issues.apache.org/jira/browse/HADOOP-15158
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/oss
>Affects Versions: 3.0.0
>Reporter: wujinhu
>Assignee: wujinhu
> Attachments: HADOOP-15158.001.patch
>
>
> Currently, AliyunCredentialsProvider supports credentials supplied through 
> configuration (core-site.xml). Sometimes an admin wants to create different 
> temporary credentials (key/secret/token) for different roles so that one role 
> cannot read data that belongs to another role.
> So, our code should support passing in the URI when it creates an 
> XXXCredentialsProvider, so that we can get the user info (role) from the URI.




[jira] [Updated] (HADOOP-15027) AliyunOSS: Support multi-thread pre-read to improve read from Hadoop to Aliyun OSS performance

2018-01-08 Thread wujinhu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wujinhu updated HADOOP-15027:
-
Attachment: (was: HADOOP-15027.011.patch)

> AliyunOSS: Support multi-thread pre-read to improve read from Hadoop to 
> Aliyun OSS performance
> --
>
> Key: HADOOP-15027
> URL: https://issues.apache.org/jira/browse/HADOOP-15027
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0
>Reporter: wujinhu
>Assignee: wujinhu
> Attachments: HADOOP-15027.001.patch, HADOOP-15027.002.patch, 
> HADOOP-15027.003.patch, HADOOP-15027.004.patch, HADOOP-15027.005.patch, 
> HADOOP-15027.006.patch, HADOOP-15027.007.patch, HADOOP-15027.008.patch, 
> HADOOP-15027.009.patch, HADOOP-15027.010.patch, HADOOP-15027.011.patch
>
>
> Currently, AliyunOSSInputStream uses a single thread to read data from 
> AliyunOSS, so we can refactor it to use multi-threaded pre-read to 
> improve read performance.




[jira] [Updated] (HADOOP-15027) AliyunOSS: Support multi-thread pre-read to improve read from Hadoop to Aliyun OSS performance

2018-01-08 Thread wujinhu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wujinhu updated HADOOP-15027:
-
Attachment: HADOOP-15027.011.patch

> AliyunOSS: Support multi-thread pre-read to improve read from Hadoop to 
> Aliyun OSS performance
> --
>
> Key: HADOOP-15027
> URL: https://issues.apache.org/jira/browse/HADOOP-15027
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0
>Reporter: wujinhu
>Assignee: wujinhu
> Attachments: HADOOP-15027.001.patch, HADOOP-15027.002.patch, 
> HADOOP-15027.003.patch, HADOOP-15027.004.patch, HADOOP-15027.005.patch, 
> HADOOP-15027.006.patch, HADOOP-15027.007.patch, HADOOP-15027.008.patch, 
> HADOOP-15027.009.patch, HADOOP-15027.010.patch, HADOOP-15027.011.patch
>
>
> Currently, AliyunOSSInputStream uses a single thread to read data from 
> AliyunOSS, so we can refactor it to use multi-threaded pre-read to 
> improve read performance.




[jira] [Updated] (HADOOP-15027) AliyunOSS: Support multi-thread pre-read to improve read from Hadoop to Aliyun OSS performance

2018-01-08 Thread wujinhu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wujinhu updated HADOOP-15027:
-
Attachment: (was: HADOOP-15027.011.patch)

> AliyunOSS: Support multi-thread pre-read to improve read from Hadoop to 
> Aliyun OSS performance
> --
>
> Key: HADOOP-15027
> URL: https://issues.apache.org/jira/browse/HADOOP-15027
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0
>Reporter: wujinhu
>Assignee: wujinhu
> Attachments: HADOOP-15027.001.patch, HADOOP-15027.002.patch, 
> HADOOP-15027.003.patch, HADOOP-15027.004.patch, HADOOP-15027.005.patch, 
> HADOOP-15027.006.patch, HADOOP-15027.007.patch, HADOOP-15027.008.patch, 
> HADOOP-15027.009.patch, HADOOP-15027.010.patch
>
>
> Currently, AliyunOSSInputStream uses a single thread to read data from 
> AliyunOSS, so we can refactor it to use multi-threaded pre-read to 
> improve read performance.




[jira] [Updated] (HADOOP-15027) AliyunOSS: Support multi-thread pre-read to improve read from Hadoop to Aliyun OSS performance

2018-01-08 Thread wujinhu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wujinhu updated HADOOP-15027:
-
Attachment: HADOOP-15027.010.patch

> AliyunOSS: Support multi-thread pre-read to improve read from Hadoop to 
> Aliyun OSS performance
> --
>
> Key: HADOOP-15027
> URL: https://issues.apache.org/jira/browse/HADOOP-15027
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0
>Reporter: wujinhu
>Assignee: wujinhu
> Attachments: HADOOP-15027.001.patch, HADOOP-15027.002.patch, 
> HADOOP-15027.003.patch, HADOOP-15027.004.patch, HADOOP-15027.005.patch, 
> HADOOP-15027.006.patch, HADOOP-15027.007.patch, HADOOP-15027.008.patch, 
> HADOOP-15027.009.patch, HADOOP-15027.010.patch
>
>
> Currently, AliyunOSSInputStream uses a single thread to read data from 
> AliyunOSS, so we can refactor it to use multi-threaded pre-read to 
> improve read performance.




[jira] [Comment Edited] (HADOOP-15027) AliyunOSS: Support multi-thread pre-read to improve read from Hadoop to Aliyun OSS performance

2018-01-11 Thread wujinhu (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16321869#comment-16321869
 ] 

wujinhu edited comment on HADOOP-15027 at 1/11/18 8:44 AM:
---

Hi [~Sammi], here is some performance data. I used this 
tool (https://github.com/hortonworks/hive-testbench) to compare run times between 
this patch and the current version.
{code:java}
query          patch      current
query13.sql    241.591    440.524
query28.sql    1259.307   1943.949
query51.sql    469.618    722.904
query73.sql    216.596    414.75
query96.sql    268.869    476.473
{code}



was (Author: wujinhu):
Hi [~Sammi], here is some performance data. I used this 
tool (https://github.com/hortonworks/hive-testbench) to compare run times between 
this patch and the current version.
{code:java}
query          after      before
query13.sql    241.591    440.524
query28.sql    1259.307   1943.949
query51.sql    469.618    722.904
query73.sql    216.596    414.75
query96.sql    268.869    476.473
{code}


> AliyunOSS: Support multi-thread pre-read to improve read from Hadoop to 
> Aliyun OSS performance
> --
>
> Key: HADOOP-15027
> URL: https://issues.apache.org/jira/browse/HADOOP-15027
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0
>Reporter: wujinhu
>Assignee: wujinhu
> Attachments: HADOOP-15027.001.patch, HADOOP-15027.002.patch, 
> HADOOP-15027.003.patch, HADOOP-15027.004.patch, HADOOP-15027.005.patch, 
> HADOOP-15027.006.patch, HADOOP-15027.007.patch, HADOOP-15027.008.patch, 
> HADOOP-15027.009.patch, HADOOP-15027.010.patch, HADOOP-15027.011.patch, 
> HADOOP-15027.012.patch
>
>
> Currently, AliyunOSSInputStream uses a single thread to read data from 
> AliyunOSS, so we can refactor it to use multi-threaded pre-read to 
> improve read performance.




[jira] [Comment Edited] (HADOOP-15027) AliyunOSS: Support multi-thread pre-read to improve read from Hadoop to Aliyun OSS performance

2018-01-11 Thread wujinhu (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16321869#comment-16321869
 ] 

wujinhu edited comment on HADOOP-15027 at 1/11/18 8:43 AM:
---

Hi [~Sammi], here is some performance data. I used this 
tool (https://github.com/hortonworks/hive-testbench) to compare run times between 
this patch and the current version.
{code:java}
query          after      before
query13.sql    241.591    440.524
query28.sql    1259.307   1943.949
query51.sql    469.618    722.904
query73.sql    216.596    414.75
query96.sql    268.869    476.473
{code}



was (Author: wujinhu):
Hi [~Sammi], here is some performance data. I used this 
tool (https://github.com/hortonworks/hive-testbench) to compare run times between 
this patch and the current version.
{code:java}
query          after      before
query13.sql    241.591    440.524
query28.sql    1259.307   1943.949
query51.sql    469.618    722.904
query73.sql    216.596    414.75
query96.sql    268.869    476.473
{code}


> AliyunOSS: Support multi-thread pre-read to improve read from Hadoop to 
> Aliyun OSS performance
> --
>
> Key: HADOOP-15027
> URL: https://issues.apache.org/jira/browse/HADOOP-15027
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0
>Reporter: wujinhu
>Assignee: wujinhu
> Attachments: HADOOP-15027.001.patch, HADOOP-15027.002.patch, 
> HADOOP-15027.003.patch, HADOOP-15027.004.patch, HADOOP-15027.005.patch, 
> HADOOP-15027.006.patch, HADOOP-15027.007.patch, HADOOP-15027.008.patch, 
> HADOOP-15027.009.patch, HADOOP-15027.010.patch, HADOOP-15027.011.patch, 
> HADOOP-15027.012.patch
>
>
> Currently, AliyunOSSInputStream uses a single thread to read data from 
> AliyunOSS, so we can refactor it to use multi-threaded pre-read to 
> improve read performance.




[jira] [Commented] (HADOOP-15027) AliyunOSS: Support multi-thread pre-read to improve read from Hadoop to Aliyun OSS performance

2018-01-11 Thread wujinhu (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16321869#comment-16321869
 ] 

wujinhu commented on HADOOP-15027:
--

Hi [~Sammi], here is some performance data. I used this 
tool (https://github.com/hortonworks/hive-testbench) to compare run times between 
this patch and the current version.
{code:java}
query          after      before
query13.sql    241.591    440.524
query28.sql    1259.307   1943.949
query51.sql    469.618    722.904
query73.sql    216.596    414.75
query96.sql    268.869    476.473
{code}
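
To read the table as relative improvement, the run times can be turned into before/after ratios. The sketch below does only that arithmetic on the figures quoted in the comment. The class name SpeedupSketch is hypothetical, and since the comment does not state the unit of the run times, the ratios are unit-free.

```java
/** Computes before/after run-time ratios for the figures quoted in the comment. */
class SpeedupSketch {
    /** Ratio of the run time before the patch to the run time with the patch. */
    static double speedup(double before, double after) {
        return before / after;
    }

    public static void main(String[] args) {
        String[] queries = {"query13.sql", "query28.sql", "query51.sql", "query73.sql", "query96.sql"};
        double[] after = {241.591, 1259.307, 469.618, 216.596, 268.869};
        double[] before = {440.524, 1943.949, 722.904, 414.75, 476.473};
        for (int i = 0; i < queries.length; i++) {
            System.out.printf("%s %.2fx%n", queries[i], speedup(before[i], after[i]));
        }
    }
}
```

A ratio above 1 means the query ran faster with the patch; every row in the table has a smaller "after" value than "before".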


> AliyunOSS: Support multi-thread pre-read to improve read from Hadoop to 
> Aliyun OSS performance
> --
>
> Key: HADOOP-15027
> URL: https://issues.apache.org/jira/browse/HADOOP-15027
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0
>Reporter: wujinhu
>Assignee: wujinhu
> Attachments: HADOOP-15027.001.patch, HADOOP-15027.002.patch, 
> HADOOP-15027.003.patch, HADOOP-15027.004.patch, HADOOP-15027.005.patch, 
> HADOOP-15027.006.patch, HADOOP-15027.007.patch, HADOOP-15027.008.patch, 
> HADOOP-15027.009.patch, HADOOP-15027.010.patch, HADOOP-15027.011.patch, 
> HADOOP-15027.012.patch
>
>
> Currently, AliyunOSSInputStream uses a single thread to read data from 
> AliyunOSS, so we can refactor it to use multi-threaded pre-read to 
> improve read performance.




[jira] [Comment Edited] (HADOOP-15027) AliyunOSS: Support multi-thread pre-read to improve read from Hadoop to Aliyun OSS performance

2018-01-11 Thread wujinhu (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16321869#comment-16321869
 ] 

wujinhu edited comment on HADOOP-15027 at 1/11/18 8:43 AM:
---

Hi [~Sammi], here is some performance data. I used this 
tool (https://github.com/hortonworks/hive-testbench) to compare run times between 
this patch and the current version.
{code:java}
query        after     before
query13.sql  241.591   440.524
query28.sql  1259.307  1943.949
query51.sql  469.618   722.904
query73.sql  216.596   414.75
query96.sql  268.869   476.473
{code}



was (Author: wujinhu):
Hi [~Sammi], here is some performance data. I used this 
tool (https://github.com/hortonworks/hive-testbench) to compare run times between 
this patch and the current version.
{code:java}
query        after     before
query13.sql  241.591   440.524
query28.sql  1259.307  1943.949
query51.sql  469.618   722.904
query73.sql  216.596   414.75
query96.sql  268.869   476.473
{code}


> AliyunOSS: Support multi-thread pre-read to improve read from Hadoop to 
> Aliyun OSS performance
> --
>
> Key: HADOOP-15027
> URL: https://issues.apache.org/jira/browse/HADOOP-15027
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0
>Reporter: wujinhu
>Assignee: wujinhu
> Attachments: HADOOP-15027.001.patch, HADOOP-15027.002.patch, 
> HADOOP-15027.003.patch, HADOOP-15027.004.patch, HADOOP-15027.005.patch, 
> HADOOP-15027.006.patch, HADOOP-15027.007.patch, HADOOP-15027.008.patch, 
> HADOOP-15027.009.patch, HADOOP-15027.010.patch, HADOOP-15027.011.patch, 
> HADOOP-15027.012.patch
>
>
> Currently, AliyunOSSInputStream uses single thread to read data from 
> AliyunOSS,  so we can do some refactoring by using multi-thread pre-read to 
> improve read performance.




[jira] [Comment Edited] (HADOOP-15027) AliyunOSS: Support multi-thread pre-read to improve read from Hadoop to Aliyun OSS performance

2018-01-11 Thread wujinhu (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16321869#comment-16321869
 ] 

wujinhu edited comment on HADOOP-15027 at 1/11/18 8:43 AM:
---

Hi [~Sammi], here is some performance data. I used this 
tool (https://github.com/hortonworks/hive-testbench) to compare run times between 
this patch and the current version.
{code:java}
query        after     before
query13.sql  241.591   440.524
query28.sql  1259.307  1943.949
query51.sql  469.618   722.904
query73.sql  216.596   414.75
query96.sql  268.869   476.473
{code}



was (Author: wujinhu):
Hi [~Sammi], here is some performance data. I used this 
tool (https://github.com/hortonworks/hive-testbench) to compare run times between 
this patch and the current version.
{code:java}
query        after     before
query13.sql  241.591   440.524
query28.sql  1259.307  1943.949
query51.sql  469.618   722.904
query73.sql  216.596   414.75
query96.sql  268.869   476.473
{code}


> AliyunOSS: Support multi-thread pre-read to improve read from Hadoop to 
> Aliyun OSS performance
> --
>
> Key: HADOOP-15027
> URL: https://issues.apache.org/jira/browse/HADOOP-15027
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0
>Reporter: wujinhu
>Assignee: wujinhu
> Attachments: HADOOP-15027.001.patch, HADOOP-15027.002.patch, 
> HADOOP-15027.003.patch, HADOOP-15027.004.patch, HADOOP-15027.005.patch, 
> HADOOP-15027.006.patch, HADOOP-15027.007.patch, HADOOP-15027.008.patch, 
> HADOOP-15027.009.patch, HADOOP-15027.010.patch, HADOOP-15027.011.patch, 
> HADOOP-15027.012.patch
>
>
> Currently, AliyunOSSInputStream uses single thread to read data from 
> AliyunOSS,  so we can do some refactoring by using multi-thread pre-read to 
> improve read performance.




[jira] [Comment Edited] (HADOOP-15027) AliyunOSS: Support multi-thread pre-read to improve read from Hadoop to Aliyun OSS performance

2018-01-11 Thread wujinhu (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16321869#comment-16321869
 ] 

wujinhu edited comment on HADOOP-15027 at 1/11/18 3:17 PM:
---

Hi [~Sammi], here is some performance data. I used this 
tool (https://github.com/hortonworks/hive-testbench) to compare run times between 
this patch and the current version (text file).
{code:java}
query        patch     current
query13.sql  241.591   440.524
query28.sql  1259.307  1943.949
query51.sql  469.618   722.904
query73.sql  216.596   414.75
query96.sql  268.869   476.473
{code}



was (Author: wujinhu):
Hi [~Sammi], here is some performance data. I used this 
tool (https://github.com/hortonworks/hive-testbench) to compare run times between 
this patch and the current version.
{code:java}
query        patch     current
query13.sql  241.591   440.524
query28.sql  1259.307  1943.949
query51.sql  469.618   722.904
query73.sql  216.596   414.75
query96.sql  268.869   476.473
{code}


> AliyunOSS: Support multi-thread pre-read to improve read from Hadoop to 
> Aliyun OSS performance
> --
>
> Key: HADOOP-15027
> URL: https://issues.apache.org/jira/browse/HADOOP-15027
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0
>Reporter: wujinhu
>Assignee: wujinhu
> Attachments: HADOOP-15027.001.patch, HADOOP-15027.002.patch, 
> HADOOP-15027.003.patch, HADOOP-15027.004.patch, HADOOP-15027.005.patch, 
> HADOOP-15027.006.patch, HADOOP-15027.007.patch, HADOOP-15027.008.patch, 
> HADOOP-15027.009.patch, HADOOP-15027.010.patch, HADOOP-15027.011.patch, 
> HADOOP-15027.012.patch
>
>
> Currently, AliyunOSSInputStream uses single thread to read data from 
> AliyunOSS,  so we can do some refactoring by using multi-thread pre-read to 
> improve read performance.




[jira] [Updated] (HADOOP-15027) AliyunOSS: Improvements for Hadoop read from AliyunOSS

2018-01-03 Thread wujinhu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wujinhu updated HADOOP-15027:
-
Attachment: (was: HADOOP-15027.007.patch)

> AliyunOSS: Improvements for Hadoop read from AliyunOSS
> --
>
> Key: HADOOP-15027
> URL: https://issues.apache.org/jira/browse/HADOOP-15027
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0
>Reporter: wujinhu
>Assignee: wujinhu
> Attachments: HADOOP-15027.001.patch, HADOOP-15027.002.patch, 
> HADOOP-15027.003.patch, HADOOP-15027.004.patch, HADOOP-15027.005.patch, 
> HADOOP-15027.006.patch
>
>
> Currently, AliyunOSSInputStream uses single thread to read data from 
> AliyunOSS,  so we can do some refactoring by using multi-thread pre-read to 
> improve read performance.




[jira] [Issue Comment Deleted] (HADOOP-15027) AliyunOSS: Improvements for Hadoop read from AliyunOSS

2018-01-03 Thread wujinhu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wujinhu updated HADOOP-15027:
-
Comment: was deleted

(was: Update the default configuration.

{code:java}
--- a/hadoop-tools/hadoop-aliyun/src/main/java/org/apache/hadoop/fs/aliyun/oss/Constants.java
+++ b/hadoop-tools/hadoop-aliyun/src/main/java/org/apache/hadoop/fs/aliyun/oss/Constants.java
@@ -101,14 +101,14 @@ private Constants() {

   public static final String MULTIPART_DOWNLOAD_THREAD_NUMBER_KEY =
   "fs.oss.multipart.download.threads";
-  public static final int MULTIPART_DOWNLOAD_THREAD_NUMBER_DEFAULT = 10;
+  public static final int MULTIPART_DOWNLOAD_THREAD_NUMBER_DEFAULT = 16;

   public static final String MAX_TOTAL_TASKS = "fs.oss.max.total.tasks";
-  public static final int DEFAULT_MAX_TOTAL_TASKS = 128;
+  public static final int DEFAULT_MAX_TOTAL_TASKS = 1024;

   public static final String MULTIPART_DOWNLOAD_AHEAD_PART_MAX_NUM_KEY =
   "fs.oss.multipart.download.ahead.part.max.number";
-  public static final int MULTIPART_DOWNLOAD_AHEAD_PART_MAX_NUM_DEFAULT = 8;
+  public static final int MULTIPART_DOWNLOAD_AHEAD_PART_MAX_NUM_DEFAULT = 4;
{code}
)

> AliyunOSS: Improvements for Hadoop read from AliyunOSS
> --
>
> Key: HADOOP-15027
> URL: https://issues.apache.org/jira/browse/HADOOP-15027
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0
>Reporter: wujinhu
>Assignee: wujinhu
> Attachments: HADOOP-15027.001.patch, HADOOP-15027.002.patch, 
> HADOOP-15027.003.patch, HADOOP-15027.004.patch, HADOOP-15027.005.patch, 
> HADOOP-15027.006.patch
>
>
> Currently, AliyunOSSInputStream uses single thread to read data from 
> AliyunOSS,  so we can do some refactoring by using multi-thread pre-read to 
> improve read performance.




[jira] [Commented] (HADOOP-15027) AliyunOSS: Improvements for Hadoop read from AliyunOSS

2018-01-03 Thread wujinhu (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16309539#comment-16309539
 ] 

wujinhu commented on HADOOP-15027:
--

Update the default configuration.

{code:java}
--- a/hadoop-tools/hadoop-aliyun/src/main/java/org/apache/hadoop/fs/aliyun/oss/Constants.java
+++ b/hadoop-tools/hadoop-aliyun/src/main/java/org/apache/hadoop/fs/aliyun/oss/Constants.java
@@ -101,14 +101,14 @@ private Constants() {

   public static final String MULTIPART_DOWNLOAD_THREAD_NUMBER_KEY =
   "fs.oss.multipart.download.threads";
-  public static final int MULTIPART_DOWNLOAD_THREAD_NUMBER_DEFAULT = 10;
+  public static final int MULTIPART_DOWNLOAD_THREAD_NUMBER_DEFAULT = 16;

   public static final String MAX_TOTAL_TASKS = "fs.oss.max.total.tasks";
-  public static final int DEFAULT_MAX_TOTAL_TASKS = 128;
+  public static final int DEFAULT_MAX_TOTAL_TASKS = 1024;

   public static final String MULTIPART_DOWNLOAD_AHEAD_PART_MAX_NUM_KEY =
   "fs.oss.multipart.download.ahead.part.max.number";
-  public static final int MULTIPART_DOWNLOAD_AHEAD_PART_MAX_NUM_DEFAULT = 8;
+  public static final int MULTIPART_DOWNLOAD_AHEAD_PART_MAX_NUM_DEFAULT = 4;
{code}


> AliyunOSS: Improvements for Hadoop read from AliyunOSS
> --
>
> Key: HADOOP-15027
> URL: https://issues.apache.org/jira/browse/HADOOP-15027
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0
>Reporter: wujinhu
>Assignee: wujinhu
> Attachments: HADOOP-15027.001.patch, HADOOP-15027.002.patch, 
> HADOOP-15027.003.patch, HADOOP-15027.004.patch, HADOOP-15027.005.patch, 
> HADOOP-15027.006.patch, HADOOP-15027.007.patch
>
>
> Currently, AliyunOSSInputStream uses single thread to read data from 
> AliyunOSS,  so we can do some refactoring by using multi-thread pre-read to 
> improve read performance.




[jira] [Updated] (HADOOP-15027) AliyunOSS: Improvements for Hadoop read from AliyunOSS

2018-01-03 Thread wujinhu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wujinhu updated HADOOP-15027:
-
Attachment: HADOOP-15027.007.patch

> AliyunOSS: Improvements for Hadoop read from AliyunOSS
> --
>
> Key: HADOOP-15027
> URL: https://issues.apache.org/jira/browse/HADOOP-15027
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0
>Reporter: wujinhu
>Assignee: wujinhu
> Attachments: HADOOP-15027.001.patch, HADOOP-15027.002.patch, 
> HADOOP-15027.003.patch, HADOOP-15027.004.patch, HADOOP-15027.005.patch, 
> HADOOP-15027.006.patch, HADOOP-15027.007.patch
>
>
> Currently, AliyunOSSInputStream uses single thread to read data from 
> AliyunOSS,  so we can do some refactoring by using multi-thread pre-read to 
> improve read performance.




[jira] [Updated] (HADOOP-15158) AliyunOSS: AliyunCredentialsProvider supports role based credential

2018-01-04 Thread wujinhu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wujinhu updated HADOOP-15158:
-
Description: 
Currently, AliyunCredentialsProvider supports credentials supplied through 
configuration (core-site.xml). Sometimes an admin wants to create different 
temporary credentials (key/secret/token) for different roles, so that one role 
cannot read data that belongs to another role.
So, our code should support passing in the URI when it creates an 
XXXCredentialsProvider, so that we can get the user info (role) from the URI.

  was:
Currently, AliyunCredentialsProvider supports credentials supplied through 
configuration (core-site.xml). Sometimes an admin wants to create different 
temporary credentials (key/secret/token) for different roles, so that one role 
cannot read data that belongs to another role.
So, our code should support passing in the URI when it creates an 
XXXCredentialsProvider, so that we can get the user info from the URI.


> AliyunOSS: AliyunCredentialsProvider supports role based credential 
> 
>
> Key: HADOOP-15158
> URL: https://issues.apache.org/jira/browse/HADOOP-15158
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/oss
>Affects Versions: 3.0.0
>Reporter: wujinhu
> Fix For: 2.9.1, 3.0.1
>
>
> Currently, AliyunCredentialsProvider supports credential by 
> configuration(core-site.xml). Sometimes, admin wants to create different 
> temporary credential(key/secret/token) for different roles so that one role 
> cannot read data that belongs to another role.
> So, our code should support pass in the URI when creates an 
> XXXCredentialsProvider so that we can get user info(role) from the URI




[jira] [Commented] (HADOOP-15158) AliyunOSS: AliyunCredentialsProvider supports role based credential

2018-01-04 Thread wujinhu (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16312621#comment-16312621
 ] 

wujinhu commented on HADOOP-15158:
--

Attached the patch.
However, should we also add an implementation for this?
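
As a rough idea of what such an implementation might start from, the sketch below reads the role from the file system URI. It assumes the role is carried in the URI's user-info component (for example oss://role@bucket/path); that layout, the class name RoleFromUri, and the example bucket are assumptions for illustration, not something the patch is known to define.

```java
import java.net.URI;
import java.util.Optional;

// Hypothetical helper; the class and method names are not from the patch.
final class RoleFromUri {

  // Returns the user-info part of the URI (treated here as the role), if present.
  static Optional<String> role(URI fsUri) {
    return Optional.ofNullable(fsUri.getUserInfo());
  }

  public static void main(String[] args) {
    URI withRole = URI.create("oss://analyst@example-bucket/warehouse/t1");
    URI withoutRole = URI.create("oss://example-bucket/warehouse/t1");
    // A credentials provider could look up temporary credentials per role
    // and fall back to the configured ones when no role is given.
    System.out.println(role(withRole).orElse("<default credentials>"));
    System.out.println(role(withoutRole).orElse("<default credentials>"));
  }
}
```

A provider built this way would still need a source for the per-role key/secret/token, which is outside the scope of this sketch.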

> AliyunOSS: AliyunCredentialsProvider supports role based credential 
> 
>
> Key: HADOOP-15158
> URL: https://issues.apache.org/jira/browse/HADOOP-15158
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/oss
>Affects Versions: 3.0.0
>Reporter: wujinhu
>Assignee: wujinhu
> Fix For: 2.9.1, 3.0.1
>
> Attachments: HADOOP-15158.001.patch
>
>
> Currently, AliyunCredentialsProvider supports credential by 
> configuration(core-site.xml). Sometimes, admin wants to create different 
> temporary credential(key/secret/token) for different roles so that one role 
> cannot read data that belongs to another role.
> So, our code should support pass in the URI when creates an 
> XXXCredentialsProvider so that we can get user info(role) from the URI




[jira] [Updated] (HADOOP-15158) AliyunOSS: Supports role based credential

2018-01-04 Thread wujinhu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wujinhu updated HADOOP-15158:
-
Summary: AliyunOSS: Supports role based credential  (was: AliyunOSS: 
AliyunCredentialsProvider supports role based credential )

> AliyunOSS: Supports role based credential
> -
>
> Key: HADOOP-15158
> URL: https://issues.apache.org/jira/browse/HADOOP-15158
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/oss
>Affects Versions: 3.0.0
>Reporter: wujinhu
>Assignee: wujinhu
> Fix For: 2.9.1, 3.0.1
>
> Attachments: HADOOP-15158.001.patch
>
>
> Currently, AliyunCredentialsProvider supports credential by 
> configuration(core-site.xml). Sometimes, admin wants to create different 
> temporary credential(key/secret/token) for different roles so that one role 
> cannot read data that belongs to another role.
> So, our code should support pass in the URI when creates an 
> XXXCredentialsProvider so that we can get user info(role) from the URI




[jira] [Updated] (HADOOP-15158) AliyunOSS: AliyunCredentialsProvider supports role based credential

2018-01-04 Thread wujinhu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wujinhu updated HADOOP-15158:
-
Description: 
Currently, AliyunCredentialsProvider supports credentials supplied through 
configuration (core-site.xml). Sometimes an admin wants to create different 
temporary credentials (key/secret/token) for different roles, so that one role 
cannot read data that belongs to another role.
So, our code should support passing in the URI when it creates an 
XXXCredentialsProvider, so that we can get the user info from the URI.

  was:
Currently, AliyunCredentialsProvider supports credentials supplied through 
configuration (core-site.xml). Sometimes an admin wants to create different 
temporary credentials (key/secret/token) for different roles, so that one role 
cannot read data that belongs to another role.
So, our code should support passing in the URI when it creates an 
XXXCredentialsProvider.


> AliyunOSS: AliyunCredentialsProvider supports role based credential 
> 
>
> Key: HADOOP-15158
> URL: https://issues.apache.org/jira/browse/HADOOP-15158
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/oss
>Affects Versions: 3.0.0
>Reporter: wujinhu
> Fix For: 2.9.1, 3.0.1
>
>
> Currently, AliyunCredentialsProvider supports credential by 
> configuration(core-site.xml). Sometimes, admin wants to create different 
> temporary credential(key/secret/token) for different roles so that one role 
> cannot read data that belongs to another role.
> So, our code should support pass in the URI when creates an 
> XXXCredentialsProvider so that we can get user info from the URI




[jira] [Assigned] (HADOOP-15158) AliyunOSS: AliyunCredentialsProvider supports role based credential

2018-01-04 Thread wujinhu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wujinhu reassigned HADOOP-15158:


Assignee: wujinhu

> AliyunOSS: AliyunCredentialsProvider supports role based credential 
> 
>
> Key: HADOOP-15158
> URL: https://issues.apache.org/jira/browse/HADOOP-15158
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/oss
>Affects Versions: 3.0.0
>Reporter: wujinhu
>Assignee: wujinhu
> Fix For: 2.9.1, 3.0.1
>
> Attachments: HADOOP-15158.001.patch
>
>
> Currently, AliyunCredentialsProvider supports credential by 
> configuration(core-site.xml). Sometimes, admin wants to create different 
> temporary credential(key/secret/token) for different roles so that one role 
> cannot read data that belongs to another role.
> So, our code should support pass in the URI when creates an 
> XXXCredentialsProvider so that we can get user info(role) from the URI




[jira] [Updated] (HADOOP-15158) AliyunOSS: AliyunCredentialsProvider supports role based credential

2018-01-04 Thread wujinhu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wujinhu updated HADOOP-15158:
-
Attachment: HADOOP-15158.001.patch

> AliyunOSS: AliyunCredentialsProvider supports role based credential 
> 
>
> Key: HADOOP-15158
> URL: https://issues.apache.org/jira/browse/HADOOP-15158
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/oss
>Affects Versions: 3.0.0
>Reporter: wujinhu
> Fix For: 2.9.1, 3.0.1
>
> Attachments: HADOOP-15158.001.patch
>
>
> Currently, AliyunCredentialsProvider supports credential by 
> configuration(core-site.xml). Sometimes, admin wants to create different 
> temporary credential(key/secret/token) for different roles so that one role 
> cannot read data that belongs to another role.
> So, our code should support pass in the URI when creates an 
> XXXCredentialsProvider so that we can get user info(role) from the URI




[jira] [Commented] (HADOOP-15027) AliyunOSS: Support multi-thread pre-read to improve read from Hadoop to Aliyun OSS performance

2018-01-04 Thread wujinhu (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16311600#comment-16311600
 ] 

wujinhu commented on HADOOP-15027:
--

Updated the patch! Thanks [~Sammi] for the review. Here are my updates for your 
comments:
1 & 2. Fixed.
3. store.close() will not throw any exception, because OSSClient.shutdown() 
catches Exception.
4. Yes, fs.oss.max.total.tasks is the maximum queue length used for read-ahead.
5. fsDataInputStream.seek() is not missing; you can see it in 007.patch.
6. I have added another test case for AliyunOSSFileReaderTask and will continue 
to add more test cases.

Please help review it again. Thanks.
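
Point 4 describes a queue with a fixed maximum length feeding the download threads. A minimal sketch of that idea with plain JDK classes follows. It is not the code from the patch: the class BoundedReadAheadPool and its methods are hypothetical, and it relies on the JDK's default behaviour of rejecting a task when the queue is full, whereas the patch may block or throttle instead.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not taken from the patch.
final class BoundedReadAheadPool {

  // Fixed number of worker threads plus a queue holding at most maxQueuedTasks.
  static ThreadPoolExecutor create(int threads, int maxQueuedTasks) {
    return new ThreadPoolExecutor(
        threads, threads, 60L, TimeUnit.SECONDS,
        new ArrayBlockingQueue<Runnable>(maxQueuedTasks));
  }

  // Submits tasks that wait on the gate and returns how many were accepted
  // before the pool rejected one (or attempts, if none was rejected).
  static int submitUntilRejected(ThreadPoolExecutor pool, int attempts, CountDownLatch gate) {
    int accepted = 0;
    for (int i = 0; i < attempts; i++) {
      try {
        pool.execute(() -> {
          try {
            gate.await(); // stands in for a slow part download
          } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
          }
        });
        accepted++;
      } catch (RejectedExecutionException e) {
        break; // queue is full and every worker is busy
      }
    }
    return accepted;
  }

  public static void main(String[] args) {
    ThreadPoolExecutor pool = create(2, 4);
    CountDownLatch gate = new CountDownLatch(1);
    int accepted = submitUntilRejected(pool, 10, gate);
    System.out.println("accepted " + accepted + " of 10 tasks");
    gate.countDown(); // let the waiting tasks finish
    pool.shutdown();
  }
}
```

With two threads and a queue of four, two tasks run and four wait in the queue, so further submissions are refused until a slot frees up. In this picture, fs.oss.multipart.download.threads would play the role of threads and fs.oss.max.total.tasks the role of maxQueuedTasks.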

> AliyunOSS: Support multi-thread pre-read to improve read from Hadoop to 
> Aliyun OSS performance
> --
>
> Key: HADOOP-15027
> URL: https://issues.apache.org/jira/browse/HADOOP-15027
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0
>Reporter: wujinhu
>Assignee: wujinhu
> Attachments: HADOOP-15027.001.patch, HADOOP-15027.002.patch, 
> HADOOP-15027.003.patch, HADOOP-15027.004.patch, HADOOP-15027.005.patch, 
> HADOOP-15027.006.patch, HADOOP-15027.007.patch, HADOOP-15027.008.patch, 
> HADOOP-15027.009.patch
>
>
> Currently, AliyunOSSInputStream uses single thread to read data from 
> AliyunOSS,  so we can do some refactoring by using multi-thread pre-read to 
> improve read performance.




[jira] [Updated] (HADOOP-15027) AliyunOSS: Support multi-thread pre-read to improve read from Hadoop to Aliyun OSS performance

2018-01-04 Thread wujinhu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wujinhu updated HADOOP-15027:
-
Attachment: HADOOP-15027.009.patch

> AliyunOSS: Support multi-thread pre-read to improve read from Hadoop to 
> Aliyun OSS performance
> --
>
> Key: HADOOP-15027
> URL: https://issues.apache.org/jira/browse/HADOOP-15027
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0
>Reporter: wujinhu
>Assignee: wujinhu
> Attachments: HADOOP-15027.001.patch, HADOOP-15027.002.patch, 
> HADOOP-15027.003.patch, HADOOP-15027.004.patch, HADOOP-15027.005.patch, 
> HADOOP-15027.006.patch, HADOOP-15027.007.patch, HADOOP-15027.008.patch, 
> HADOOP-15027.009.patch
>
>
> Currently, AliyunOSSInputStream uses single thread to read data from 
> AliyunOSS,  so we can do some refactoring by using multi-thread pre-read to 
> improve read performance.




[jira] [Commented] (HADOOP-15027) AliyunOSS: Support multi-thread pre-read to improve read from Hadoop to Aliyun OSS performance

2018-01-05 Thread wujinhu (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16313216#comment-16313216
 ] 

wujinhu commented on HADOOP-15027:
--

Hi [~Sammi],
Thanks for your review. I have posted some performance data; you can see it in 
the comments above.

> AliyunOSS: Support multi-thread pre-read to improve read from Hadoop to 
> Aliyun OSS performance
> --
>
> Key: HADOOP-15027
> URL: https://issues.apache.org/jira/browse/HADOOP-15027
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0
>Reporter: wujinhu
>Assignee: wujinhu
> Attachments: HADOOP-15027.001.patch, HADOOP-15027.002.patch, 
> HADOOP-15027.003.patch, HADOOP-15027.004.patch, HADOOP-15027.005.patch, 
> HADOOP-15027.006.patch, HADOOP-15027.007.patch, HADOOP-15027.008.patch, 
> HADOOP-15027.009.patch
>
>
> Currently, AliyunOSSInputStream uses single thread to read data from 
> AliyunOSS,  so we can do some refactoring by using multi-thread pre-read to 
> improve read performance.




[jira] [Comment Edited] (HADOOP-15027) AliyunOSS: Support multi-thread pre-read to improve read from Hadoop to Aliyun OSS performance

2018-01-10 Thread wujinhu (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16320103#comment-16320103
 ] 

wujinhu edited comment on HADOOP-15027 at 1/10/18 11:28 AM:


Change some default configuration.

{code:java}
-  public static final long MULTIPART_DOWNLOAD_SIZE_DEFAULT = 1024 * 1024;
+  public static final long MULTIPART_DOWNLOAD_SIZE_DEFAULT = 512 * 1024;

   public static final String MULTIPART_DOWNLOAD_THREAD_NUMBER_KEY =
   "fs.oss.multipart.download.threads";
-  public static final int MULTIPART_DOWNLOAD_THREAD_NUMBER_DEFAULT = 16;
+  public static final int MULTIPART_DOWNLOAD_THREAD_NUMBER_DEFAULT = 10;

   public static final String MAX_TOTAL_TASKS_KEY = "fs.oss.max.total.tasks";
   public static final int MAX_TOTAL_TASKS_DEFAULT = 128;

   public static final String MULTIPART_DOWNLOAD_AHEAD_PART_MAX_NUM_KEY =
   "fs.oss.multipart.download.ahead.part.max.number";
-  public static final int MULTIPART_DOWNLOAD_AHEAD_PART_MAX_NUM_DEFAULT = 8;
+  public static final int MULTIPART_DOWNLOAD_AHEAD_PART_MAX_NUM_DEFAULT = 4;
{code}
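
One way to read the effect of these defaults is as a per-stream read-ahead budget. The sketch below assumes that a stream buffers at most MULTIPART_DOWNLOAD_AHEAD_PART_MAX_NUM parts of MULTIPART_DOWNLOAD_SIZE bytes each; that interpretation and the class name ReadAheadBudget are assumptions, not statements from the patch.

```java
// Illustrative arithmetic on the defaults shown in the diff above.
final class ReadAheadBudget {

  // Upper bound on bytes buffered ahead of the reader for a single stream,
  // assuming each read-ahead part is one full download part.
  static long readAheadBytes(long partSizeBytes, int aheadParts) {
    return partSizeBytes * aheadParts;
  }

  public static void main(String[] args) {
    long before = readAheadBytes(1024L * 1024, 8); // old defaults
    long after = readAheadBytes(512L * 1024, 4);   // new defaults
    System.out.println("before: " + before / 1024 + " KiB per stream");
    System.out.println("after:  " + after / 1024 + " KiB per stream");
  }
}
```

Under that assumption the change lowers the budget from 8 MiB to 2 MiB per open stream, which would reduce memory use when many streams are open at once.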



was (Author: wujinhu):
Change some default configuration.

{code:java}
474c474
< index dd71842fb87..dedc038f3f7 100644
---
> index dd71842fb87..a1070277d33 100644
482c482
< +  public static final long MULTIPART_DOWNLOAD_SIZE_DEFAULT = 512 * 1024;
---
> +  public static final long MULTIPART_DOWNLOAD_SIZE_DEFAULT = 1024 * 1024;
486c486
< +  public static final int MULTIPART_DOWNLOAD_THREAD_NUMBER_DEFAULT = 10;
---
> +  public static final int MULTIPART_DOWNLOAD_THREAD_NUMBER_DEFAULT = 16;
493c493
< +  public static final int MULTIPART_DOWNLOAD_AHEAD_PART_MAX_NUM_DEFAULT = 4;
---
> +  public static final int MULTIPART_DOWNLOAD_AHEAD_PART_MAX_NUM_DEFAULT = 8;
{code}


> AliyunOSS: Support multi-thread pre-read to improve read from Hadoop to 
> Aliyun OSS performance
> --
>
> Key: HADOOP-15027
> URL: https://issues.apache.org/jira/browse/HADOOP-15027
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0
>Reporter: wujinhu
>Assignee: wujinhu
> Attachments: HADOOP-15027.001.patch, HADOOP-15027.002.patch, 
> HADOOP-15027.003.patch, HADOOP-15027.004.patch, HADOOP-15027.005.patch, 
> HADOOP-15027.006.patch, HADOOP-15027.007.patch, HADOOP-15027.008.patch, 
> HADOOP-15027.009.patch, HADOOP-15027.010.patch, HADOOP-15027.011.patch, 
> HADOOP-15027.012.patch
>
>
> Currently, AliyunOSSInputStream uses single thread to read data from 
> AliyunOSS,  so we can do some refactoring by using multi-thread pre-read to 
> improve read performance.




[jira] [Comment Edited] (HADOOP-15027) AliyunOSS: Support multi-thread pre-read to improve read from Hadoop to Aliyun OSS performance

2018-01-10 Thread wujinhu (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16320103#comment-16320103
 ] 

wujinhu edited comment on HADOOP-15027 at 1/10/18 11:32 AM:


Change some default configurations.

{code:java}
-  public static final long MULTIPART_DOWNLOAD_SIZE_DEFAULT = 1024 * 1024;
+  public static final long MULTIPART_DOWNLOAD_SIZE_DEFAULT = 512 * 1024;

   public static final String MULTIPART_DOWNLOAD_THREAD_NUMBER_KEY =
   "fs.oss.multipart.download.threads";
-  public static final int MULTIPART_DOWNLOAD_THREAD_NUMBER_DEFAULT = 16;
+  public static final int MULTIPART_DOWNLOAD_THREAD_NUMBER_DEFAULT = 10;

   public static final String MAX_TOTAL_TASKS_KEY = "fs.oss.max.total.tasks";
   public static final int MAX_TOTAL_TASKS_DEFAULT = 128;

   public static final String MULTIPART_DOWNLOAD_AHEAD_PART_MAX_NUM_KEY =
   "fs.oss.multipart.download.ahead.part.max.number";
-  public static final int MULTIPART_DOWNLOAD_AHEAD_PART_MAX_NUM_DEFAULT = 8;
+  public static final int MULTIPART_DOWNLOAD_AHEAD_PART_MAX_NUM_DEFAULT = 4;
{code}



was (Author: wujinhu):
Change some default configuration.

{code:java}
-  public static final long MULTIPART_DOWNLOAD_SIZE_DEFAULT = 1024 * 1024;
+  public static final long MULTIPART_DOWNLOAD_SIZE_DEFAULT = 512 * 1024;

   public static final String MULTIPART_DOWNLOAD_THREAD_NUMBER_KEY =
   "fs.oss.multipart.download.threads";
-  public static final int MULTIPART_DOWNLOAD_THREAD_NUMBER_DEFAULT = 16;
+  public static final int MULTIPART_DOWNLOAD_THREAD_NUMBER_DEFAULT = 10;

   public static final String MAX_TOTAL_TASKS_KEY = "fs.oss.max.total.tasks";
   public static final int MAX_TOTAL_TASKS_DEFAULT = 128;

   public static final String MULTIPART_DOWNLOAD_AHEAD_PART_MAX_NUM_KEY =
   "fs.oss.multipart.download.ahead.part.max.number";
-  public static final int MULTIPART_DOWNLOAD_AHEAD_PART_MAX_NUM_DEFAULT = 8;
+  public static final int MULTIPART_DOWNLOAD_AHEAD_PART_MAX_NUM_DEFAULT = 4;
{code}


> AliyunOSS: Support multi-thread pre-read to improve read from Hadoop to 
> Aliyun OSS performance
> --
>
> Key: HADOOP-15027
> URL: https://issues.apache.org/jira/browse/HADOOP-15027
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/oss
> Affects Versions: 3.0.0
> Reporter: wujinhu
> Assignee: wujinhu
> Attachments: HADOOP-15027.001.patch, HADOOP-15027.002.patch, 
> HADOOP-15027.003.patch, HADOOP-15027.004.patch, HADOOP-15027.005.patch, 
> HADOOP-15027.006.patch, HADOOP-15027.007.patch, HADOOP-15027.008.patch, 
> HADOOP-15027.009.patch, HADOOP-15027.010.patch, HADOOP-15027.011.patch, 
> HADOOP-15027.012.patch
>
>
> Currently, AliyunOSSInputStream uses a single thread to read data from 
> AliyunOSS, so we can refactor it to use multi-threaded pre-read to 
> improve read performance.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org

[jira] [Updated] (HADOOP-15027) AliyunOSS: Support multi-thread pre-read to improve read from Hadoop to Aliyun OSS performance

2018-01-10 Thread wujinhu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wujinhu updated HADOOP-15027:
-
Attachment: HADOOP-15027.012.patch

> AliyunOSS: Support multi-thread pre-read to improve read from Hadoop to 
> Aliyun OSS performance
> --
>
> Key: HADOOP-15027
> URL: https://issues.apache.org/jira/browse/HADOOP-15027
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/oss
> Affects Versions: 3.0.0
> Reporter: wujinhu
> Assignee: wujinhu
> Attachments: HADOOP-15027.001.patch, HADOOP-15027.002.patch, 
> HADOOP-15027.003.patch, HADOOP-15027.004.patch, HADOOP-15027.005.patch, 
> HADOOP-15027.006.patch, HADOOP-15027.007.patch, HADOOP-15027.008.patch, 
> HADOOP-15027.009.patch, HADOOP-15027.010.patch, HADOOP-15027.011.patch, 
> HADOOP-15027.012.patch
>
>
> Currently, AliyunOSSInputStream uses a single thread to read data from 
> AliyunOSS, so we can refactor it to use multi-threaded pre-read to 
> improve read performance.




[jira] [Commented] (HADOOP-15027) AliyunOSS: Support multi-thread pre-read to improve read from Hadoop to Aliyun OSS performance

2018-01-10 Thread wujinhu (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16320103#comment-16320103
 ] 

wujinhu commented on HADOOP-15027:
--

Changed some default configuration values:

{code:java}
474c474
< index dd71842fb87..dedc038f3f7 100644
---
> index dd71842fb87..a1070277d33 100644
482c482
< +  public static final long MULTIPART_DOWNLOAD_SIZE_DEFAULT = 512 * 1024;
---
> +  public static final long MULTIPART_DOWNLOAD_SIZE_DEFAULT = 1024 * 1024;
486c486
< +  public static final int MULTIPART_DOWNLOAD_THREAD_NUMBER_DEFAULT = 10;
---
> +  public static final int MULTIPART_DOWNLOAD_THREAD_NUMBER_DEFAULT = 16;
493c493
< +  public static final int MULTIPART_DOWNLOAD_AHEAD_PART_MAX_NUM_DEFAULT = 4;
---
> +  public static final int MULTIPART_DOWNLOAD_AHEAD_PART_MAX_NUM_DEFAULT = 8;
{code}


> AliyunOSS: Support multi-thread pre-read to improve read from Hadoop to 
> Aliyun OSS performance
> --
>
> Key: HADOOP-15027
> URL: https://issues.apache.org/jira/browse/HADOOP-15027
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/oss
> Affects Versions: 3.0.0
> Reporter: wujinhu
> Assignee: wujinhu
> Attachments: HADOOP-15027.001.patch, HADOOP-15027.002.patch, 
> HADOOP-15027.003.patch, HADOOP-15027.004.patch, HADOOP-15027.005.patch, 
> HADOOP-15027.006.patch, HADOOP-15027.007.patch, HADOOP-15027.008.patch, 
> HADOOP-15027.009.patch, HADOOP-15027.010.patch, HADOOP-15027.011.patch, 
> HADOOP-15027.012.patch
>
>
> Currently, AliyunOSSInputStream uses a single thread to read data from 
> AliyunOSS, so we can refactor it to use multi-threaded pre-read to 
> improve read performance.




[jira] [Updated] (HADOOP-15027) AliyunOSS: Support multi-thread pre-read to improve read from Hadoop to Aliyun OSS performance

2018-01-08 Thread wujinhu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wujinhu updated HADOOP-15027:
-
Attachment: HADOOP-15027.011.patch

> AliyunOSS: Support multi-thread pre-read to improve read from Hadoop to 
> Aliyun OSS performance
> --
>
> Key: HADOOP-15027
> URL: https://issues.apache.org/jira/browse/HADOOP-15027
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/oss
> Affects Versions: 3.0.0
> Reporter: wujinhu
> Assignee: wujinhu
> Attachments: HADOOP-15027.001.patch, HADOOP-15027.002.patch, 
> HADOOP-15027.003.patch, HADOOP-15027.004.patch, HADOOP-15027.005.patch, 
> HADOOP-15027.006.patch, HADOOP-15027.007.patch, HADOOP-15027.008.patch, 
> HADOOP-15027.009.patch, HADOOP-15027.010.patch, HADOOP-15027.011.patch
>
>
> Currently, AliyunOSSInputStream uses a single thread to read data from 
> AliyunOSS, so we can refactor it to use multi-threaded pre-read to 
> improve read performance.




[jira] [Updated] (HADOOP-15158) AliyunOSS: Supports role based credential

2018-01-17 Thread wujinhu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wujinhu updated HADOOP-15158:
-
Attachment: HADOOP-15158.004.patch

> AliyunOSS: Supports role based credential
> -
>
> Key: HADOOP-15158
> URL: https://issues.apache.org/jira/browse/HADOOP-15158
> Project: Hadoop Common
> Issue Type: Improvement
> Components: fs/oss
> Affects Versions: 3.0.0
> Reporter: wujinhu
> Assignee: wujinhu
> Priority: Major
> Fix For: 2.9.1, 3.0.1
>
> Attachments: HADOOP-15158.001.patch, HADOOP-15158.002.patch, 
> HADOOP-15158.003.patch, HADOOP-15158.004.patch
>
>
> Currently, AliyunCredentialsProvider supports credentials supplied through 
> configuration (core-site.xml). Sometimes an admin wants to create different 
> temporary credentials (key/secret/token) for different roles so that one role 
> cannot read data that belongs to another role.
> So our code should support passing in the URI when creating an 
> XXXCredentialsProvider, so that we can get the user info (role) from the URI.
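
As a rough illustration of "getting the user info (role) from the URI", the sketch below uses the standard java.net.URI API. It assumes the role is carried in the user-info component of the filesystem URI (for example oss://role@bucket/path); the issue text does not say how the role is encoded, so treat that layout, the class name, and the sample bucket as hypothetical.

```java
import java.net.URI;

// Hypothetical helper, not the patch's code: reads the user-info part of a URI.
class RoleFromUri {
    /** Returns the user-info component of the URI, or null when the URI has none. */
    public static String roleOf(URI uri) {
        return uri.getUserInfo();
    }

    public static void main(String[] args) {
        // Assumed layout: the role name appears before '@' in the authority.
        System.out.println(roleOf(URI.create("oss://analyst@example-bucket/data/part-0")));
        // No user info present, so null is returned.
        System.out.println(roleOf(URI.create("oss://example-bucket/data/part-0")));
    }
}
```

A credentials provider constructed with the URI could use such a value to choose which temporary key/secret/token to hand out, falling back to the core-site.xml credentials when the result is null.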




[jira] [Commented] (HADOOP-15027) AliyunOSS: Support multi-thread pre-read to improve read from Hadoop to Aliyun OSS performance

2018-01-15 Thread wujinhu (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16326830#comment-16326830
 ] 

wujinhu commented on HADOOP-15027:
--

Thanks [~Sammi] for the review. I have updated the patch. :)

> AliyunOSS: Support multi-thread pre-read to improve read from Hadoop to 
> Aliyun OSS performance
> --
>
> Key: HADOOP-15027
> URL: https://issues.apache.org/jira/browse/HADOOP-15027
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/oss
> Affects Versions: 3.0.0
> Reporter: wujinhu
> Assignee: wujinhu
> Priority: Major
> Attachments: HADOOP-15027.001.patch, HADOOP-15027.002.patch, 
> HADOOP-15027.003.patch, HADOOP-15027.004.patch, HADOOP-15027.005.patch, 
> HADOOP-15027.006.patch, HADOOP-15027.007.patch, HADOOP-15027.008.patch, 
> HADOOP-15027.009.patch, HADOOP-15027.010.patch, HADOOP-15027.011.patch, 
> HADOOP-15027.012.patch, HADOOP-15027.013.patch
>
>
> Currently, AliyunOSSInputStream uses a single thread to read data from 
> AliyunOSS, so we can refactor it to use multi-threaded pre-read to 
> improve read performance.




[jira] [Updated] (HADOOP-15027) AliyunOSS: Support multi-thread pre-read to improve read from Hadoop to Aliyun OSS performance

2018-01-15 Thread wujinhu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wujinhu updated HADOOP-15027:
-
Attachment: HADOOP-15027.013.patch

> AliyunOSS: Support multi-thread pre-read to improve read from Hadoop to 
> Aliyun OSS performance
> --
>
> Key: HADOOP-15027
> URL: https://issues.apache.org/jira/browse/HADOOP-15027
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/oss
> Affects Versions: 3.0.0
> Reporter: wujinhu
> Assignee: wujinhu
> Priority: Major
> Attachments: HADOOP-15027.001.patch, HADOOP-15027.002.patch, 
> HADOOP-15027.003.patch, HADOOP-15027.004.patch, HADOOP-15027.005.patch, 
> HADOOP-15027.006.patch, HADOOP-15027.007.patch, HADOOP-15027.008.patch, 
> HADOOP-15027.009.patch, HADOOP-15027.010.patch, HADOOP-15027.011.patch, 
> HADOOP-15027.012.patch, HADOOP-15027.013.patch
>
>
> Currently, AliyunOSSInputStream uses a single thread to read data from 
> AliyunOSS, so we can refactor it to use multi-threaded pre-read to 
> improve read performance.




[jira] [Updated] (HADOOP-15027) AliyunOSS: Support multi-thread pre-read to improve read from Hadoop to Aliyun OSS performance

2018-01-16 Thread wujinhu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wujinhu updated HADOOP-15027:
-
Attachment: HADOOP-15027.014.patch

> AliyunOSS: Support multi-thread pre-read to improve read from Hadoop to 
> Aliyun OSS performance
> --
>
> Key: HADOOP-15027
> URL: https://issues.apache.org/jira/browse/HADOOP-15027
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/oss
> Affects Versions: 3.0.0
> Reporter: wujinhu
> Assignee: wujinhu
> Priority: Major
> Attachments: HADOOP-15027.001.patch, HADOOP-15027.002.patch, 
> HADOOP-15027.003.patch, HADOOP-15027.004.patch, HADOOP-15027.005.patch, 
> HADOOP-15027.006.patch, HADOOP-15027.007.patch, HADOOP-15027.008.patch, 
> HADOOP-15027.009.patch, HADOOP-15027.010.patch, HADOOP-15027.011.patch, 
> HADOOP-15027.012.patch, HADOOP-15027.013.patch, HADOOP-15027.014.patch
>
>
> Currently, AliyunOSSInputStream uses a single thread to read data from 
> AliyunOSS, so we can refactor it to use multi-threaded pre-read to 
> improve read performance.




[jira] [Commented] (HADOOP-15158) AliyunOSS: Supports role based credential

2018-01-09 Thread wujinhu (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16319790#comment-16319790
 ] 

wujinhu commented on HADOOP-15158:
--

Patch attached! I added a test to cover the changed lines.

> AliyunOSS: Supports role based credential
> -
>
> Key: HADOOP-15158
> URL: https://issues.apache.org/jira/browse/HADOOP-15158
> Project: Hadoop Common
> Issue Type: Improvement
> Components: fs/oss
> Affects Versions: 3.0.0
> Reporter: wujinhu
> Assignee: wujinhu
> Fix For: 3.0.1
>
> Attachments: HADOOP-15158.001.patch, HADOOP-15158.002.patch, 
> HADOOP-15158.003.patch
>
>
> Currently, AliyunCredentialsProvider supports credentials supplied through 
> configuration (core-site.xml). Sometimes an admin wants to create different 
> temporary credentials (key/secret/token) for different roles so that one role 
> cannot read data that belongs to another role.
> So our code should support passing in the URI when creating an 
> XXXCredentialsProvider, so that we can get the user info (role) from the URI.




[jira] [Updated] (HADOOP-15158) AliyunOSS: Supports role based credential

2018-01-09 Thread wujinhu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wujinhu updated HADOOP-15158:
-
Fix Version/s: 3.0.1
   Status: Patch Available  (was: In Progress)

> AliyunOSS: Supports role based credential
> -
>
> Key: HADOOP-15158
> URL: https://issues.apache.org/jira/browse/HADOOP-15158
> Project: Hadoop Common
> Issue Type: Improvement
> Components: fs/oss
> Affects Versions: 3.0.0
> Reporter: wujinhu
> Assignee: wujinhu
> Fix For: 3.0.1
>
> Attachments: HADOOP-15158.001.patch, HADOOP-15158.002.patch
>
>
> Currently, AliyunCredentialsProvider supports credentials supplied through 
> configuration (core-site.xml). Sometimes an admin wants to create different 
> temporary credentials (key/secret/token) for different roles so that one role 
> cannot read data that belongs to another role.
> So our code should support passing in the URI when creating an 
> XXXCredentialsProvider, so that we can get the user info (role) from the URI.




[jira] [Updated] (HADOOP-15158) AliyunOSS: Supports role based credential

2018-01-09 Thread wujinhu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wujinhu updated HADOOP-15158:
-
Attachment: HADOOP-15158.003.patch

> AliyunOSS: Supports role based credential
> -
>
> Key: HADOOP-15158
> URL: https://issues.apache.org/jira/browse/HADOOP-15158
> Project: Hadoop Common
> Issue Type: Improvement
> Components: fs/oss
> Affects Versions: 3.0.0
> Reporter: wujinhu
> Assignee: wujinhu
> Fix For: 3.0.1
>
> Attachments: HADOOP-15158.001.patch, HADOOP-15158.002.patch, 
> HADOOP-15158.003.patch
>
>
> Currently, AliyunCredentialsProvider supports credentials supplied through 
> configuration (core-site.xml). Sometimes an admin wants to create different 
> temporary credentials (key/secret/token) for different roles so that one role 
> cannot read data that belongs to another role.
> So our code should support passing in the URI when creating an 
> XXXCredentialsProvider, so that we can get the user info (role) from the URI.




[jira] [Updated] (HADOOP-15158) AliyunOSS: Supports role based credential

2018-01-09 Thread wujinhu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wujinhu updated HADOOP-15158:
-
Fix Version/s: 2.9.1

> AliyunOSS: Supports role based credential
> -
>
> Key: HADOOP-15158
> URL: https://issues.apache.org/jira/browse/HADOOP-15158
> Project: Hadoop Common
> Issue Type: Improvement
> Components: fs/oss
> Affects Versions: 3.0.0
> Reporter: wujinhu
> Assignee: wujinhu
> Fix For: 2.9.1, 3.0.1
>
> Attachments: HADOOP-15158.001.patch, HADOOP-15158.002.patch, 
> HADOOP-15158.003.patch
>
>
> Currently, AliyunCredentialsProvider supports credentials supplied through 
> configuration (core-site.xml). Sometimes an admin wants to create different 
> temporary credentials (key/secret/token) for different roles so that one role 
> cannot read data that belongs to another role.
> So our code should support passing in the URI when creating an 
> XXXCredentialsProvider, so that we can get the user info (role) from the URI.




[jira] [Updated] (HADOOP-15158) AliyunOSS: Supports role based credential

2018-01-09 Thread wujinhu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wujinhu updated HADOOP-15158:
-
Attachment: HADOOP-15158.002.patch

> AliyunOSS: Supports role based credential
> -
>
> Key: HADOOP-15158
> URL: https://issues.apache.org/jira/browse/HADOOP-15158
> Project: Hadoop Common
> Issue Type: Improvement
> Components: fs/oss
> Affects Versions: 3.0.0
> Reporter: wujinhu
> Assignee: wujinhu
> Fix For: 3.0.1
>
> Attachments: HADOOP-15158.001.patch, HADOOP-15158.002.patch
>
>
> Currently, AliyunCredentialsProvider supports credentials supplied through 
> configuration (core-site.xml). Sometimes an admin wants to create different 
> temporary credentials (key/secret/token) for different roles so that one role 
> cannot read data that belongs to another role.
> So our code should support passing in the URI when creating an 
> XXXCredentialsProvider, so that we can get the user info (role) from the URI.



