[jira] [Comment Edited] (HADOOP-15027) Improvements for Hadoop read from AliyunOSS
[ https://issues.apache.org/jira/browse/HADOOP-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16249072#comment-16249072 ]

wujinhu edited comment on HADOOP-15027 at 11/13/17 3:29 AM:

Updates (HADOOP-15027.002.patch):
1. I have moved the thread pool from InputStream to FileSystem.
2. I have disabled pre-fetch for random IO.

I have now tested sequential read and aggressive random read performance:

oss.RandomSeek: file length 109002
oss.RandomSeek: sequential read used 14.964
oss.RandomSeek: random read used 61.353

For the aggressive random read test, I divided the file into parts (each part is 1MB) and read the parts in shuffled order.
I plan to keep improving random reads.

was (Author: wujinhu):

Updates:
1. I have moved the thread pool from InputStream to FileSystem.
2. I have disabled pre-fetch for random IO.

I have now tested sequential read and aggressive random read performance:

oss.RandomSeek: file length 109002
oss.RandomSeek: sequential read used 14.964
oss.RandomSeek: random read used 61.353

For the aggressive random read test, I divided the file into parts (each part is 1MB) and read the parts in shuffled order.
I plan to keep improving random reads.

> Improvements for Hadoop read from AliyunOSS
> -------------------------------------------
>
>                 Key: HADOOP-15027
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15027
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/oss
>    Affects Versions: 3.0.0
>            Reporter: wujinhu
>            Assignee: wujinhu
>         Attachments: HADOOP-15027.001.patch, HADOOP-15027.002.patch
>
>
> Currently, read performance is poor when Hadoop reads from AliyunOSS. It
> needs about 1min to read 1GB from OSS.
> Class AliyunOSSInputStream uses single thread to read data from AliyunOSS,
> so we can refactor this by using multi-thread pre read to improve this.
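The aggressive random read test described in the comment (split the file into 1MB parts, then read the parts in shuffled order) could be planned roughly as follows. This is only a sketch of the idea, not the test code from the patch: the class name `ShuffledPartPlan`, the method `shuffledPartOffsets`, the fixed seed and the example file length are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class ShuffledPartPlan {

  /** Returns the start offset of every part of the file, in shuffled order. */
  public static List<Long> shuffledPartOffsets(long fileLength, long partSize, long seed) {
    if (partSize <= 0) {
      throw new IllegalArgumentException("partSize must be positive");
    }
    List<Long> offsets = new ArrayList<>();
    for (long off = 0; off < fileLength; off += partSize) {
      offsets.add(off);
    }
    // A fixed seed keeps the "random" order repeatable between runs.
    Collections.shuffle(offsets, new Random(seed));
    return offsets;
  }

  public static void main(String[] args) {
    long partSize = 1024L * 1024L;          // 1MB parts, as in the comment
    long fileLength = 10 * partSize + 123;  // hypothetical file length
    for (long off : shuffledPartOffsets(fileLength, partSize, 42L)) {
      long len = Math.min(partSize, fileLength - off);
      // With an open Hadoop stream, each part would be fetched with a
      // positioned read such as in.readFully(off, buf, 0, (int) len).
      System.out.println("read offset=" + off + " length=" + len);
    }
  }
}
```

Reading every part exactly once in a shuffled order forces a seek before almost every read, which is why it is a harsher test than the sequential pass.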
-- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15027) Improvements for Hadoop read from AliyunOSS
[ https://issues.apache.org/jira/browse/HADOOP-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16249072#comment-16249072 ]

wujinhu commented on HADOOP-15027:
----------------------------------

Updates:
1. I have moved the thread pool from InputStream to FileSystem.
2. I have disabled pre-fetch for random IO.

I have now tested sequential read and aggressive random read performance:

oss.RandomSeek: file length 109002
oss.RandomSeek: sequential read used 14.964
oss.RandomSeek: random read used 61.353

For the aggressive random read test, I divided the file into parts (each part is 1MB) and read the parts in shuffled order.
I plan to keep improving random reads.

> Improvements for Hadoop read from AliyunOSS
> -------------------------------------------
>
>                 Key: HADOOP-15027
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15027
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/oss
>    Affects Versions: 3.0.0
>            Reporter: wujinhu
>            Assignee: wujinhu
>         Attachments: HADOOP-15027.001.patch, HADOOP-15027.002.patch
>
>
> Currently, read performance is poor when Hadoop reads from AliyunOSS. It
> needs about 1min to read 1GB from OSS.
> Class AliyunOSSInputStream uses single thread to read data from AliyunOSS,
> so we can refactor this by using multi-thread pre read to improve this.
[jira] [Comment Edited] (HADOOP-15027) Improvements for Hadoop read from AliyunOSS
[ https://issues.apache.org/jira/browse/HADOOP-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16249160#comment-16249160 ]

wujinhu edited comment on HADOOP-15027 at 11/13/17 6:51 AM:

[~uncleGen] Thanks for your comments.
1. I have tested with the ossutil tool, and the read speed is about 10MB+/s (I will continue to verify this).
2 & 3. I think it is OK for the thread pool to live in FileSystem. The thread pool is only used to pre-fetch data from OSS. As the following code shows,
{{*{color:#d04437}if (item.buffer.length == 0) {
  //EOF
  item.ready.set(true);
} else {
  this.readAheadExecutorService.execute(new AliyunOSSFileReaderTask(key, store, item));
}
cachedStreams.add(item);{color}*}}
each item is enqueued in both the thread pool (owned by FileSystem) and cachedStreams (each stream has its own queue). If one input stream is slow, it only affects its own cachedStreams and does not affect the others.
4. I will change the code style of these lines.
5. Yes, we can do a simple refactor if some modules have the same requirements.

I will add another patch to fix this.

was (Author: wujinhu):

[~uncleGen] Thanks for your comments.
1. I have tested with the ossutil tool, and the read speed is about 10MB+/s (I will continue to verify this).
2 & 3. I think it is OK for the thread pool to live in FileSystem. The thread pool is only used to pre-fetch data from OSS. As the following code shows,
*{color:#d04437}if (item.buffer.length == 0) {
  //EOF
  item.ready.set(true);
} else {
  this.readAheadExecutorService.execute(new AliyunOSSFileReaderTask(key, store, item));
}
cachedStreams.add(item);{color}*
each item is enqueued in both the thread pool (owned by FileSystem) and cachedStreams (each stream has its own queue). If one input stream is slow, it only affects its own cachedStreams and does not affect the others.
4. I will change the code style of these lines.
5. Yes, we can do a simple refactor if some modules have the same requirements.

I will add another patch to fix this.
> Improvements for Hadoop read from AliyunOSS
> -------------------------------------------
>
>                 Key: HADOOP-15027
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15027
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/oss
>    Affects Versions: 3.0.0
>            Reporter: wujinhu
>            Assignee: wujinhu
>         Attachments: HADOOP-15027.001.patch, HADOOP-15027.002.patch
>
>
> Currently, read performance is poor when Hadoop reads from AliyunOSS. It
> needs about 1min to read 1GB from OSS.
> Class AliyunOSSInputStream uses single thread to read data from AliyunOSS,
> so we can refactor this by using multi-thread pre read to improve this.
[jira] [Comment Edited] (HADOOP-15027) Improvements for Hadoop read from AliyunOSS
[ https://issues.apache.org/jira/browse/HADOOP-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16249160#comment-16249160 ]

wujinhu edited comment on HADOOP-15027 at 11/13/17 6:52 AM:

[~uncleGen] Thanks for your comments.
1. I have tested with the ossutil tool, and the read speed is about 10MB+/s (I will continue to verify this).
2 & 3. I think it is OK for the thread pool to live in FileSystem. The thread pool is only used to pre-fetch data from OSS. As the following code shows,
{code:java}
if (item.buffer.length == 0) {
  //EOF
  item.ready.set(true);
} else {
  this.readAheadExecutorService.execute(new AliyunOSSFileReaderTask(key, store, item));
}
cachedStreams.add(item);
{code}
each item is enqueued in both the thread pool (owned by FileSystem) and cachedStreams (each stream has its own queue). If one input stream is slow, it only affects its own cachedStreams and does not affect the others.
4. I will change the code style of these lines.
5. Yes, we can do a simple refactor if some modules have the same requirements.

I will add another patch to fix this.

was (Author: wujinhu):

[~uncleGen] Thanks for your comments.
1. I have tested with the ossutil tool, and the read speed is about 10MB+/s (I will continue to verify this).
2 & 3. I think it is OK for the thread pool to live in FileSystem. The thread pool is only used to pre-fetch data from OSS. As the following code shows,
bq. if (item.buffer.length == 0) {
  //EOF
  item.ready.set(true);
} else {
  this.readAheadExecutorService.execute(new AliyunOSSFileReaderTask(key, store, item));
}
cachedStreams.add(item);
each item is enqueued in both the thread pool (owned by FileSystem) and cachedStreams (each stream has its own queue). If one input stream is slow, it only affects its own cachedStreams and does not affect the others.
4. I will change the code style of these lines.
5. Yes, we can do a simple refactor if some modules have the same requirements.

I will add another patch to fix this.
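The arrangement described in points 2 & 3 (one thread pool shared through the FileSystem, one queue per stream) can be sketched in isolation. The code below is an assumption-laden illustration, not the patch: `SharedPoolSketch`, `Item`, `Stream`, `prefetch`, `next` and `demo` are hypothetical names, a `CountDownLatch` stands in for the `ready` flag in the quoted snippet, and the "fetch" only fills a buffer instead of calling OSS.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Queue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class SharedPoolSketch {

  /** One read-ahead unit: a buffer plus a latch that opens once it is filled. */
  static final class Item {
    final byte[] buffer;
    final CountDownLatch ready = new CountDownLatch(1);

    Item(int size) {
      this.buffer = new byte[size];
    }
  }

  /** Stands in for one input stream: it owns its queue but borrows the pool. */
  static final class Stream {
    private final ExecutorService sharedPool;
    private final Queue<Item> cachedItems = new ArrayDeque<>();
    private final byte fill;

    Stream(ExecutorService sharedPool, byte fill) {
      this.sharedPool = sharedPool;
      this.fill = fill;
    }

    /** Mirrors the quoted snippet: submit to the shared pool, track locally. */
    void prefetch(int size) {
      Item item = new Item(size);
      if (item.buffer.length == 0) {
        item.ready.countDown(); // EOF: nothing to fetch
      } else {
        // A real task would read a byte range from OSS; here we only fill the buffer.
        sharedPool.execute(() -> {
          Arrays.fill(item.buffer, fill);
          item.ready.countDown();
        });
      }
      cachedItems.add(item);
    }

    /** Blocks only on this stream's own oldest item. */
    byte[] next() {
      Item item = cachedItems.remove();
      try {
        item.ready.await();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException("interrupted while waiting for data", e);
      }
      return item.buffer;
    }
  }

  /** Runs two streams against one pool and returns the first byte each one read. */
  public static int[] demo() {
    ExecutorService pool = Executors.newFixedThreadPool(2);
    try {
      Stream a = new Stream(pool, (byte) 1);
      Stream b = new Stream(pool, (byte) 2);
      a.prefetch(4);
      b.prefetch(4);
      return new int[] {a.next()[0], b.next()[0]};
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) {
    int[] firstBytes = demo();
    System.out.println(firstBytes[0] + " " + firstBytes[1]);
  }
}
```

In this sketch the isolation is at the queue level: a reader waits only on items in its own queue. Tasks from all streams still compete for the same worker threads, so the pool size bounds how much pre-fetching can run at once.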
[jira] [Commented] (HADOOP-15027) Improvements for Hadoop read from AliyunOSS
[ https://issues.apache.org/jira/browse/HADOOP-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16249160#comment-16249160 ]

wujinhu commented on HADOOP-15027:
----------------------------------

[~uncleGen] Thanks for your comments.
1. I have tested with the ossutil tool, and the read speed is about 10MB+/s (I will continue to verify this).
2 & 3. I think it is OK for the thread pool to live in FileSystem. The thread pool is only used to pre-fetch data from OSS. As the following code shows,
*{color:#d04437}if (item.buffer.length == 0) {
  //EOF
  item.ready.set(true);
} else {
  this.readAheadExecutorService.execute(new AliyunOSSFileReaderTask(key, store, item));
}
cachedStreams.add(item);{color}*
each item is enqueued in both the thread pool (owned by FileSystem) and cachedStreams (each stream has its own queue). If one input stream is slow, it only affects its own cachedStreams and does not affect the others.
4. I will change the code style of these lines.
5. Yes, we can do a simple refactor if some modules have the same requirements.

I will add another patch to fix this.

> Improvements for Hadoop read from AliyunOSS
> -------------------------------------------
>
>                 Key: HADOOP-15027
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15027
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/oss
>    Affects Versions: 3.0.0
>            Reporter: wujinhu
>            Assignee: wujinhu
>         Attachments: HADOOP-15027.001.patch, HADOOP-15027.002.patch
>
>
> Currently, read performance is poor when Hadoop reads from AliyunOSS. It
> needs about 1min to read 1GB from OSS.
> Class AliyunOSSInputStream uses single thread to read data from AliyunOSS,
> so we can refactor this by using multi-thread pre read to improve this.
[jira] [Comment Edited] (HADOOP-15027) Improvements for Hadoop read from AliyunOSS
[ https://issues.apache.org/jira/browse/HADOOP-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16249160#comment-16249160 ]

wujinhu edited comment on HADOOP-15027 at 11/13/17 6:51 AM:

[~uncleGen] Thanks for your comments.
1. I have tested with the ossutil tool, and the read speed is about 10MB+/s (I will continue to verify this).
2 & 3. I think it is OK for the thread pool to live in FileSystem. The thread pool is only used to pre-fetch data from OSS. As the following code shows,
bq. if (item.buffer.length == 0) {
  //EOF
  item.ready.set(true);
} else {
  this.readAheadExecutorService.execute(new AliyunOSSFileReaderTask(key, store, item));
}
cachedStreams.add(item);
each item is enqueued in both the thread pool (owned by FileSystem) and cachedStreams (each stream has its own queue). If one input stream is slow, it only affects its own cachedStreams and does not affect the others.
4. I will change the code style of these lines.
5. Yes, we can do a simple refactor if some modules have the same requirements.

I will add another patch to fix this.

was (Author: wujinhu):

[~uncleGen] Thanks for your comments.
1. I have tested with the ossutil tool, and the read speed is about 10MB+/s (I will continue to verify this).
2 & 3. I think it is OK for the thread pool to live in FileSystem. The thread pool is only used to pre-fetch data from OSS. As the following code shows,
{{*{color:#d04437}if (item.buffer.length == 0) {
  //EOF
  item.ready.set(true);
} else {
  this.readAheadExecutorService.execute(new AliyunOSSFileReaderTask(key, store, item));
}
cachedStreams.add(item);{color}*}}
each item is enqueued in both the thread pool (owned by FileSystem) and cachedStreams (each stream has its own queue). If one input stream is slow, it only affects its own cachedStreams and does not affect the others.
4. I will change the code style of these lines.
5. Yes, we can do a simple refactor if some modules have the same requirements.

I will add another patch to fix this.
[jira] [Commented] (HADOOP-15027) Improvements for Hadoop read from AliyunOSS
[ https://issues.apache.org/jira/browse/HADOOP-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16249250#comment-16249250 ] wujinhu commented on HADOOP-15027: -- Updated to HADOOP-15027.003.patch. > Improvements for Hadoop read from AliyunOSS > --- > > Key: HADOOP-15027 > URL: https://issues.apache.org/jira/browse/HADOOP-15027 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/oss >Affects Versions: 3.0.0 >Reporter: wujinhu >Assignee: wujinhu > Attachments: HADOOP-15027.001.patch, HADOOP-15027.002.patch, > HADOOP-15027.003.patch > > > Currently, read performance is poor when Hadoop reads from AliyunOSS. It > needs about 1min to read 1GB from OSS. > Class AliyunOSSInputStream uses single thread to read data from AliyunOSS, > so we can refactor this by using multi-thread pre read to improve this. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-15027) Improvements for Hadoop read from AliyunOSS
[ https://issues.apache.org/jira/browse/HADOOP-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wujinhu updated HADOOP-15027: - Attachment: HADOOP-15027.003.patch > Improvements for Hadoop read from AliyunOSS > --- > > Key: HADOOP-15027 > URL: https://issues.apache.org/jira/browse/HADOOP-15027 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/oss >Affects Versions: 3.0.0 >Reporter: wujinhu >Assignee: wujinhu > Attachments: HADOOP-15027.001.patch, HADOOP-15027.002.patch, > HADOOP-15027.003.patch > > > Currently, read performance is poor when Hadoop reads from AliyunOSS. It > needs about 1min to read 1GB from OSS. > Class AliyunOSSInputStream uses single thread to read data from AliyunOSS, > so we can refactor this by using multi-thread pre read to improve this. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-15027) Improvements for Hadoop read from AliyunOSS
[ https://issues.apache.org/jira/browse/HADOOP-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wujinhu updated HADOOP-15027: - Attachment: HADOOP-15027.001.patch > Improvements for Hadoop read from AliyunOSS > --- > > Key: HADOOP-15027 > URL: https://issues.apache.org/jira/browse/HADOOP-15027 > Project: Hadoop Common > Issue Type: Improvement > Components: fs/oss >Reporter: wujinhu > Attachments: HADOOP-15027.001.patch > > > Currently, read performance is poor when Hadoop reads from AliyunOSS. It > needs about 1min to read 1GB from OSS. > Class AliyunOSSInputStream uses single thread to read data from AliyunOSS, > so we can refactor this by using multi-thread pre read to improve this. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-15063) IOException will be thrown when read from Aliyun OSS
[ https://issues.apache.org/jira/browse/HADOOP-15063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

wujinhu updated HADOOP-15063:
-----------------------------
    Description:
IOException will be thrown in some cases.

Logs:
java.io.IOException: Failed to read from stream. Remaining:101802
    at org.apache.hadoop.fs.aliyun.oss.AliyunOSSInputStream.read(AliyunOSSInputStream.java:182)
    at org.apache.hadoop.fs.FSInputStream.read(FSInputStream.java:75)
    at org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:92)

How to reproduce:
1. Create a file of 10MB.
2. Run the following loop against it:
{code:java}
int seekTimes = 150;
for (int i = 0; i < seekTimes; i++) {
  long pos = size / (seekTimes - i) - 1;
  LOG.info("begin seeking for pos: " + pos);
  //instream.seek(pos);
  byte []buf = new byte[1024];
  instream.read(pos, buf, 0, 1024);
}
{code}

  was:
Logs:
java.io.IOException: Failed to read from stream. Remaining:101802
    at org.apache.hadoop.fs.aliyun.oss.AliyunOSSInputStream.read(AliyunOSSInputStream.java:182)
    at org.apache.hadoop.fs.FSInputStream.read(FSInputStream.java:75)
    at org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:92)

How to reproduce:
1. Create a file of 10MB.
2. Run the following loop against it:
int seekTimes = 150;
for (int i = 0; i < seekTimes; i++) {
  long pos = size / (seekTimes - i) - 1;
  LOG.info("begin seeking for pos: " + pos);
  //instream.seek(pos);
  byte []buf = new byte[1024];
  instream.read(pos, buf, 0, 1024);
}

> IOException will be thrown when read from Aliyun OSS
> ----------------------------------------------------
>
>                 Key: HADOOP-15063
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15063
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs/oss
>    Affects Versions: 3.0.0-alpha2
>            Reporter: wujinhu
>            Priority: Critical
>
> IOException will be thrown in some cases.
> Logs:
> java.io.IOException: Failed to read from stream. Remaining:101802
>     at org.apache.hadoop.fs.aliyun.oss.AliyunOSSInputStream.read(AliyunOSSInputStream.java:182)
>     at org.apache.hadoop.fs.FSInputStream.read(FSInputStream.java:75)
>     at org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:92)
> How to reproduce:
> 1. Create a file of 10MB.
> 2. Run the following loop against it:
> {code:java}
> int seekTimes = 150;
> for (int i = 0; i < seekTimes; i++) {
>   long pos = size / (seekTimes - i) - 1;
>   LOG.info("begin seeking for pos: " + pos);
>   //instream.seek(pos);
>   byte []buf = new byte[1024];
>   instream.read(pos, buf, 0, 1024);
> }
> {code}
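To see which offsets the reproduction loop actually touches, the position formula can be evaluated on its own. This is a small sketch under assumptions: the snippet does not define `size`, so it is taken here to be the file length, and 10MB is taken to mean 10 * 1024 * 1024 bytes. The class name `ReproPositions` and the method `positions` are hypothetical.

```java
public class ReproPositions {

  /** Offsets the reproduction loop reads from, for a file of the given size. */
  public static long[] positions(long size, int seekTimes) {
    long[] out = new long[seekTimes];
    for (int i = 0; i < seekTimes; i++) {
      // Same expression as in the reproduction loop.
      out[i] = size / (seekTimes - i) - 1;
    }
    return out;
  }

  public static void main(String[] args) {
    long size = 10L * 1024 * 1024; // assumed meaning of "10MB"
    long[] pos = positions(size, 150);
    System.out.println("first offset: " + pos[0]);
    System.out.println("last offset:  " + pos[pos.length - 1]);
  }
}
```

Because the divisor shrinks on every iteration, the offsets never decrease, and the gaps between them are small at first and large at the end.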
[jira] [Updated] (HADOOP-15063) IOException will be thrown when read from Aliyun OSS
[ https://issues.apache.org/jira/browse/HADOOP-15063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

wujinhu updated HADOOP-15063:
-----------------------------
    Description:
IOException will be thrown in this case:
1. Set part size = n (102400).
2. Assume the current position = 0, so partRemaining = 102400.
3. Call seek(pos = 101802). Since pos > position && pos < position + partRemaining, seek() skips pos - position bytes on the wrapped stream, but partRemaining stays at 102400.
4. If we then read more than n - pos bytes (here 102400 - 101802 = 598), an IOException is thrown.

Current code:
{code:java}
@Override
public synchronized void seek(long pos) throws IOException {
  checkNotClosed();
  if (position == pos) {
    return;
  } else if (pos > position && pos < position + partRemaining) {
    AliyunOSSUtils.skipFully(wrappedStream, pos - position);
    *{color:#d04437}// we need to update partRemaining here{color}*
    position = pos;
  } else {
    reopen(pos);
  }
}
{code}

Logs:
java.io.IOException: Failed to read from stream. Remaining:101802
    at org.apache.hadoop.fs.aliyun.oss.AliyunOSSInputStream.read(AliyunOSSInputStream.java:182)
    at org.apache.hadoop.fs.FSInputStream.read(FSInputStream.java:75)
    at org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:92)

How to reproduce:
1. Create a file of 10MB.
2. Run the following loop against it:
{code:java}
int seekTimes = 150;
for (int i = 0; i < seekTimes; i++) {
  long pos = size / (seekTimes - i) - 1;
  LOG.info("begin seeking for pos: " + pos);
  byte []buf = new byte[1024];
  instream.read(pos, buf, 0, 1024);
}
{code}

  was:
IOException will be thrown in some cases.

Logs:
java.io.IOException: Failed to read from stream. Remaining:101802
    at org.apache.hadoop.fs.aliyun.oss.AliyunOSSInputStream.read(AliyunOSSInputStream.java:182)
    at org.apache.hadoop.fs.FSInputStream.read(FSInputStream.java:75)
    at org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:92)

How to reproduce:
1. Create a file of 10MB.
2. Run the following loop against it:
{code:java}
int seekTimes = 150;
for (int i = 0; i < seekTimes; i++) {
  long pos = size / (seekTimes - i) - 1;
  LOG.info("begin seeking for pos: " + pos);
  //instream.seek(pos);
  byte []buf = new byte[1024];
  instream.read(pos, buf, 0, 1024);
}
{code}

> IOException will be thrown when read from Aliyun OSS
> ----------------------------------------------------
>
>                 Key: HADOOP-15063
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15063
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs/oss
>    Affects Versions: 3.0.0-alpha2
>            Reporter: wujinhu
>            Priority: Critical
>
> IOException will be thrown in this case:
> 1. Set part size = n (102400).
> 2. Assume the current position = 0, so partRemaining = 102400.
> 3. Call seek(pos = 101802). Since pos > position && pos < position +
> partRemaining, seek() skips pos - position bytes on the wrapped stream, but
> partRemaining stays at 102400.
> 4. If we then read more than n - pos bytes (here 102400 - 101802 = 598), an
> IOException is thrown.
> Current code:
> {code:java}
> @Override
> public synchronized void seek(long pos) throws IOException {
>   checkNotClosed();
>   if (position == pos) {
>     return;
>   } else if (pos > position && pos < position + partRemaining) {
>     AliyunOSSUtils.skipFully(wrappedStream, pos - position);
>     *{color:#d04437}// we need to update partRemaining here{color}*
>     position = pos;
>   } else {
>     reopen(pos);
>   }
> }
> {code}
> Logs:
> java.io.IOException: Failed to read from stream. Remaining:101802
>     at org.apache.hadoop.fs.aliyun.oss.AliyunOSSInputStream.read(AliyunOSSInputStream.java:182)
>     at org.apache.hadoop.fs.FSInputStream.read(FSInputStream.java:75)
>     at org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:92)
> How to reproduce:
> 1. Create a file of 10MB.
> 2. Run the following loop against it:
> {code:java}
> int seekTimes = 150;
> for (int i = 0; i < seekTimes; i++) {
>   long pos = size / (seekTimes - i) - 1;
>   LOG.info("begin seeking for pos: " + pos);
>   byte []buf = new byte[1024];
>   instream.read(pos, buf, 0, 1024);
> }
> {code}
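The bookkeeping problem in seek() can be reproduced without OSS at all. The sketch below is a self-contained model, not the contents of HADOOP-15063.001.patch: the class `SeekSketch` is hypothetical, a byte array stands in for the remote object, and a `ByteArrayInputStream` stands in for the wrapped part stream. It shows one plausible way to apply the update that the comment in the current code asks for, namely decreasing partRemaining by the number of bytes skipped.

```java
import java.io.ByteArrayInputStream;

public class SeekSketch {
  private final byte[] object;   // stands in for the remote object
  private final int partSize;
  private ByteArrayInputStream wrappedStream;
  private long position;
  private long partRemaining;

  public SeekSketch(byte[] object, int partSize) {
    this.object = object;
    this.partSize = partSize;
    reopen(0);
  }

  /** Opens a new part starting at pos, like a fresh ranged request. */
  private void reopen(long pos) {
    if (pos < 0 || pos > object.length) {
      throw new IllegalArgumentException("pos out of range: " + pos);
    }
    int len = (int) Math.min(partSize, object.length - pos);
    wrappedStream = new ByteArrayInputStream(object, (int) pos, len);
    position = pos;
    partRemaining = len;
  }

  public void seek(long pos) {
    if (pos == position) {
      return;
    } else if (pos > position && pos < position + partRemaining) {
      long toSkip = pos - position;
      long skipped = wrappedStream.skip(toSkip);
      if (skipped != toSkip) {
        throw new IllegalStateException("short skip: " + skipped);
      }
      // The update the issue says is missing: the skipped bytes
      // are no longer available in the current part.
      partRemaining -= toSkip;
      position = pos;
    } else {
      reopen(pos);
    }
  }

  /** Reads one byte, moving to the next part when the current one is used up. */
  public int read() {
    if (partRemaining == 0 && position < object.length) {
      reopen(position);
    }
    int b = wrappedStream.read();
    if (b >= 0) {
      position++;
      partRemaining--;
    }
    return b;
  }

  public long getPartRemaining() {
    return partRemaining;
  }

  public long getPosition() {
    return position;
  }

  public static void main(String[] args) {
    SeekSketch in = new SeekSketch(new byte[300000], 102400);
    in.seek(101802);
    System.out.println(in.getPartRemaining()); // prints 598
  }
}
```

With the subtraction in place, partRemaining reaches zero exactly when the wrapped stream is exhausted, so the next read opens the following part instead of asking the old stream for bytes it no longer has.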
[jira] [Updated] (HADOOP-15063) IOException will be thrown when read from Aliyun OSS
[ https://issues.apache.org/jira/browse/HADOOP-15063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wujinhu updated HADOOP-15063: - Summary: IOException will be thrown when read from Aliyun OSS (was: IOException is likely to be thrown when read from Aliyun OSS) > IOException will be thrown when read from Aliyun OSS > > > Key: HADOOP-15063 > URL: https://issues.apache.org/jira/browse/HADOOP-15063 > Project: Hadoop Common > Issue Type: Bug > Components: fs/oss >Affects Versions: 3.0.0-alpha2 >Reporter: wujinhu >Priority: Critical > > IOException will be thrown in some case > Logs: > java.io.IOException: Failed to read from stream. Remaining:101802 > at > org.apache.hadoop.fs.aliyun.oss.AliyunOSSInputStream.read(AliyunOSSInputStream.java:182) > at org.apache.hadoop.fs.FSInputStream.read(FSInputStream.java:75) > at > org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:92) > How to re-produce: > 1. create a file with 10MB size > 2. > {code:java} > int seekTimes = 150; > for (int i = 0; i < seekTimes; i++) { > long pos = size / (seekTimes - i) - 1; > LOG.info("begin seeking for pos: " + pos); > //instream.seek(pos); > byte []buf = new byte[1024]; > instream.read(pos, buf, 0, 1024); > } > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-15063) IOException may be thrown when read from Aliyun OSS in some case
[ https://issues.apache.org/jira/browse/HADOOP-15063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wujinhu updated HADOOP-15063: - Status: Patch Available (was: In Progress) > IOException may be thrown when read from Aliyun OSS in some case > > > Key: HADOOP-15063 > URL: https://issues.apache.org/jira/browse/HADOOP-15063 > Project: Hadoop Common > Issue Type: Bug > Components: fs/oss >Affects Versions: 3.0.0-beta1, 3.0.0-alpha2 >Reporter: wujinhu >Assignee: wujinhu > Attachments: HADOOP-15063.001.patch > > > IOException will be thrown in this case > 1. set part size = n(102400) > 2. assume current position = 0, then partRemaining = 102400 > 3. we call seek(pos = 101802), with pos > position && pos < position + > partRemaining, so it will skip pos - position bytes, but partRemaining > remains the same > 4. if we read bytes more than n - pos, it will throw IOException. > Current code: > {code:java} > @Override > public synchronized void seek(long pos) throws IOException { > checkNotClosed(); > if (position == pos) { > return; > } else if (pos > position && pos < position + partRemaining) { > AliyunOSSUtils.skipFully(wrappedStream, pos - position); > // we need update partRemaining here > position = pos; > } else { > reopen(pos); > } > } > {code} > Logs: > java.io.IOException: Failed to read from stream. Remaining:101802 > at > org.apache.hadoop.fs.aliyun.oss.AliyunOSSInputStream.read(AliyunOSSInputStream.java:182) > at org.apache.hadoop.fs.FSInputStream.read(FSInputStream.java:75) > at > org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:92) > How to re-produce: > 1. create a file with 10MB size > 2. 
> {code:java} > int seekTimes = 150; > for (int i = 0; i < seekTimes; i++) { > long pos = size / (seekTimes - i) - 1; > LOG.info("begin seeking for pos: " + pos); > byte []buf = new byte[1024]; > instream.read(pos, buf, 0, 1024); > } > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-15063) IOException will be thrown when read from Aliyun OSS
[ https://issues.apache.org/jira/browse/HADOOP-15063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wujinhu updated HADOOP-15063: - Affects Version/s: 3.0.0-beta1 > IOException will be thrown when read from Aliyun OSS > > > Key: HADOOP-15063 > URL: https://issues.apache.org/jira/browse/HADOOP-15063 > Project: Hadoop Common > Issue Type: Bug > Components: fs/oss >Affects Versions: 3.0.0-alpha2, 3.0.0-beta1 >Reporter: wujinhu >Assignee: wujinhu >Priority: Critical > > IOException will be thrown in this case > 1. set part size = n(102400) > 2. assume current position = 0, then partRemaining = 102400 > 3. we call seek(pos = 101802), with pos > position && pos < position + > partRemaining, so it will skip pos - position bytes, but partRemaining > remains the same > 4. if we read bytes more than n - pos, it will throw IOException. > Current code: > {code:java} > @Override > public synchronized void seek(long pos) throws IOException { > checkNotClosed(); > if (position == pos) { > return; > } else if (pos > position && pos < position + partRemaining) { > AliyunOSSUtils.skipFully(wrappedStream, pos - position); > // we need update partRemaining here > position = pos; > } else { > reopen(pos); > } > } > {code} > Logs: > java.io.IOException: Failed to read from stream. Remaining:101802 > at > org.apache.hadoop.fs.aliyun.oss.AliyunOSSInputStream.read(AliyunOSSInputStream.java:182) > at org.apache.hadoop.fs.FSInputStream.read(FSInputStream.java:75) > at > org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:92) > How to re-produce: > 1. create a file with 10MB size > 2. 
> {code:java} > int seekTimes = 150; > for (int i = 0; i < seekTimes; i++) { > long pos = size / (seekTimes - i) - 1; > LOG.info("begin seeking for pos: " + pos); > byte []buf = new byte[1024]; > instream.read(pos, buf, 0, 1024); > } > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Assigned] (HADOOP-15063) IOException will be thrown when read from Aliyun OSS
[ https://issues.apache.org/jira/browse/HADOOP-15063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wujinhu reassigned HADOOP-15063: Assignee: wujinhu > IOException will be thrown when read from Aliyun OSS > > > Key: HADOOP-15063 > URL: https://issues.apache.org/jira/browse/HADOOP-15063 > Project: Hadoop Common > Issue Type: Bug > Components: fs/oss >Affects Versions: 3.0.0-alpha2 >Reporter: wujinhu >Assignee: wujinhu >Priority: Critical > > IOException will be thrown in this case > 1. set part size = n(102400) > 2. assume current position = 0, then partRemaining = 102400 > 3. we call seek(pos = 101802), with pos > position && pos < position + > partRemaining, so it will skip pos - position bytes, but partRemaining > remains the same > 4. if we read bytes more than n - pos, it will throw IOException. > Current code: > {code:java} > @Override > public synchronized void seek(long pos) throws IOException { > checkNotClosed(); > if (position == pos) { > return; > } else if (pos > position && pos < position + partRemaining) { > AliyunOSSUtils.skipFully(wrappedStream, pos - position); > // we need update partRemaining here > position = pos; > } else { > reopen(pos); > } > } > {code} > Logs: > java.io.IOException: Failed to read from stream. Remaining:101802 > at > org.apache.hadoop.fs.aliyun.oss.AliyunOSSInputStream.read(AliyunOSSInputStream.java:182) > at org.apache.hadoop.fs.FSInputStream.read(FSInputStream.java:75) > at > org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:92) > How to re-produce: > 1. create a file with 10MB size > 2. 
> {code:java} > int seekTimes = 150; > for (int i = 0; i < seekTimes; i++) { > long pos = size / (seekTimes - i) - 1; > LOG.info("begin seeking for pos: " + pos); > byte []buf = new byte[1024]; > instream.read(pos, buf, 0, 1024); > } > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-15063) IOException will be thrown when read from Aliyun OSS
[ https://issues.apache.org/jira/browse/HADOOP-15063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wujinhu updated HADOOP-15063: - Attachment: HADOOP-15063.001.patch > IOException will be thrown when read from Aliyun OSS > > > Key: HADOOP-15063 > URL: https://issues.apache.org/jira/browse/HADOOP-15063 > Project: Hadoop Common > Issue Type: Bug > Components: fs/oss >Affects Versions: 3.0.0-alpha2, 3.0.0-beta1 >Reporter: wujinhu >Assignee: wujinhu >Priority: Critical > Attachments: HADOOP-15063.001.patch > > > IOException will be thrown in this case > 1. set part size = n(102400) > 2. assume current position = 0, then partRemaining = 102400 > 3. we call seek(pos = 101802), with pos > position && pos < position + > partRemaining, so it will skip pos - position bytes, but partRemaining > remains the same > 4. if we read bytes more than n - pos, it will throw IOException. > Current code: > {code:java} > @Override > public synchronized void seek(long pos) throws IOException { > checkNotClosed(); > if (position == pos) { > return; > } else if (pos > position && pos < position + partRemaining) { > AliyunOSSUtils.skipFully(wrappedStream, pos - position); > // we need update partRemaining here > position = pos; > } else { > reopen(pos); > } > } > {code} > Logs: > java.io.IOException: Failed to read from stream. Remaining:101802 > at > org.apache.hadoop.fs.aliyun.oss.AliyunOSSInputStream.read(AliyunOSSInputStream.java:182) > at org.apache.hadoop.fs.FSInputStream.read(FSInputStream.java:75) > at > org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:92) > How to re-produce: > 1. create a file with 10MB size > 2. 
> {code:java}
> int seekTimes = 150;
> for (int i = 0; i < seekTimes; i++) {
>   long pos = size / (seekTimes - i) - 1;
>   LOG.info("begin seeking for pos: " + pos);
>   byte[] buf = new byte[1024];
>   instream.read(pos, buf, 0, 1024);
> }
> {code}
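To see how the reproduction loop relates to the position quoted in the description, the requested offsets can be computed on their own. This is a minimal sketch that assumes `size` is exactly 10 * 1024 * 1024 bytes (the issue only says "10MB"); `ReproPositions` and `posAt` are hypothetical names used for illustration.

```java
// Computes the offsets the reproduction loop asks for (hypothetical helper).
class ReproPositions {
  // Same expression as in the loop: size / (seekTimes - i) - 1.
  static long posAt(long size, int seekTimes, int i) {
    return size / (seekTimes - i) - 1;
  }

  public static void main(String[] args) {
    long size = 10L * 1024 * 1024;  // assumed file size
    int seekTimes = 150;
    for (int i = 0; i < seekTimes; i++) {
      System.out.println("i=" + i + " pos=" + posAt(size, seekTimes, i));
    }
  }
}
```

Under that size assumption the offsets increase with each iteration, and iteration i = 47 yields 101802, the seek position used in step 3 of the description.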
[jira] [Commented] (HADOOP-15063) IOException will be thrown when read from Aliyun OSS
[ https://issues.apache.org/jira/browse/HADOOP-15063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16262132#comment-16262132 ] wujinhu commented on HADOOP-15063: -- Upload patch file. > IOException will be thrown when read from Aliyun OSS > > > Key: HADOOP-15063 > URL: https://issues.apache.org/jira/browse/HADOOP-15063 > Project: Hadoop Common > Issue Type: Bug > Components: fs/oss >Affects Versions: 3.0.0-alpha2, 3.0.0-beta1 >Reporter: wujinhu >Assignee: wujinhu >Priority: Critical > Attachments: HADOOP-15063.001.patch > > > IOException will be thrown in this case > 1. set part size = n(102400) > 2. assume current position = 0, then partRemaining = 102400 > 3. we call seek(pos = 101802), with pos > position && pos < position + > partRemaining, so it will skip pos - position bytes, but partRemaining > remains the same > 4. if we read bytes more than n - pos, it will throw IOException. > Current code: > {code:java} > @Override > public synchronized void seek(long pos) throws IOException { > checkNotClosed(); > if (position == pos) { > return; > } else if (pos > position && pos < position + partRemaining) { > AliyunOSSUtils.skipFully(wrappedStream, pos - position); > // we need update partRemaining here > position = pos; > } else { > reopen(pos); > } > } > {code} > Logs: > java.io.IOException: Failed to read from stream. Remaining:101802 > at > org.apache.hadoop.fs.aliyun.oss.AliyunOSSInputStream.read(AliyunOSSInputStream.java:182) > at org.apache.hadoop.fs.FSInputStream.read(FSInputStream.java:75) > at > org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:92) > How to re-produce: > 1. create a file with 10MB size > 2. 
> {code:java}
> int seekTimes = 150;
> for (int i = 0; i < seekTimes; i++) {
>   long pos = size / (seekTimes - i) - 1;
>   LOG.info("begin seeking for pos: " + pos);
>   byte[] buf = new byte[1024];
>   instream.read(pos, buf, 0, 1024);
> }
> {code}
[jira] [Updated] (HADOOP-15063) IOException will be thrown when read from Aliyun OSS
[ https://issues.apache.org/jira/browse/HADOOP-15063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wujinhu updated HADOOP-15063: - Description: IOException will be thrown in this case 1. set part size = n(102400) 2. assume current position = 0, then partRemaining = 102400 3. we call seek(pos = 101802), with pos > position && pos < position + partRemaining, so it will skip pos - position bytes, but partRemaining remains the same 4. if we read bytes more than n - pos, it will throw IOException. Current code: {code:java} @Override public synchronized void seek(long pos) throws IOException { checkNotClosed(); if (position == pos) { return; } else if (pos > position && pos < position + partRemaining) { AliyunOSSUtils.skipFully(wrappedStream, pos - position); // we need update partRemaining here position = pos; } else { reopen(pos); } } {code} Logs: java.io.IOException: Failed to read from stream. Remaining:101802 at org.apache.hadoop.fs.aliyun.oss.AliyunOSSInputStream.read(AliyunOSSInputStream.java:182) at org.apache.hadoop.fs.FSInputStream.read(FSInputStream.java:75) at org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:92) How to re-produce: 1. create a file with 10MB size 2. {code:java} int seekTimes = 150; for (int i = 0; i < seekTimes; i++) { long pos = size / (seekTimes - i) - 1; LOG.info("begin seeking for pos: " + pos); byte []buf = new byte[1024]; instream.read(pos, buf, 0, 1024); } {code} was: IOException will be thrown in this case 1. set part size = n(102400) 2. assume current position = 0, then partRemaining = 102400 3. we call seek(pos = 101802), with pos > position && pos < position + partRemaining, so it will skip pos - position bytes, but partRemaining remains the same 4. if we read bytes more than n - pos, it will throw IOException. 
Current code: {code:java} @Override public synchronized void seek(long pos) throws IOException { checkNotClosed(); if (position == pos) { return; } else if (pos > position && pos < position + partRemaining) { AliyunOSSUtils.skipFully(wrappedStream, pos - position); * // we need update partRemaining here * position = pos; } else { reopen(pos); } } {code} Logs: java.io.IOException: Failed to read from stream. Remaining:101802 at org.apache.hadoop.fs.aliyun.oss.AliyunOSSInputStream.read(AliyunOSSInputStream.java:182) at org.apache.hadoop.fs.FSInputStream.read(FSInputStream.java:75) at org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:92) How to re-produce: 1. create a file with 10MB size 2. {code:java} int seekTimes = 150; for (int i = 0; i < seekTimes; i++) { long pos = size / (seekTimes - i) - 1; LOG.info("begin seeking for pos: " + pos); byte []buf = new byte[1024]; instream.read(pos, buf, 0, 1024); } {code} > IOException will be thrown when read from Aliyun OSS > > > Key: HADOOP-15063 > URL: https://issues.apache.org/jira/browse/HADOOP-15063 > Project: Hadoop Common > Issue Type: Bug > Components: fs/oss >Affects Versions: 3.0.0-alpha2 >Reporter: wujinhu >Priority: Critical > > IOException will be thrown in this case > 1. set part size = n(102400) > 2. assume current position = 0, then partRemaining = 102400 > 3. we call seek(pos = 101802), with pos > position && pos < position + > partRemaining, so it will skip pos - position bytes, but partRemaining > remains the same > 4. if we read bytes more than n - pos, it will throw IOException. 
> Current code: > {code:java} > @Override > public synchronized void seek(long pos) throws IOException { > checkNotClosed(); > if (position == pos) { > return; > } else if (pos > position && pos < position + partRemaining) { > AliyunOSSUtils.skipFully(wrappedStream, pos - position); > // we need update partRemaining here > position = pos; > } else { > reopen(pos); > } > } > {code} > Logs: > java.io.IOException: Failed to read from stream. Remaining:101802 > at > org.apache.hadoop.fs.aliyun.oss.AliyunOSSInputStream.read(AliyunOSSInputStream.java:182) > at org.apache.hadoop.fs.FSInputStream.read(FSInputStream.java:75) > at > org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:92) > How to re-produce: > 1. create a file with 10MB size > 2. > {code:java} > int seekTimes = 150; > for (int i = 0; i < seekTimes; i++) { > long pos = size / (seekTimes - i) - 1; > LOG.info("begin seeking for pos: " + pos); > byte []buf = new byte[1024]; > instream.read(pos, buf, 0, 1024); > } > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HADOOP-15063) IOException will be thrown when read from Aliyun OSS
[ https://issues.apache.org/jira/browse/HADOOP-15063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wujinhu updated HADOOP-15063: - Attachment: HADOOP-15063.001.patch > IOException will be thrown when read from Aliyun OSS > > > Key: HADOOP-15063 > URL: https://issues.apache.org/jira/browse/HADOOP-15063 > Project: Hadoop Common > Issue Type: Bug > Components: fs/oss >Affects Versions: 3.0.0-alpha2, 3.0.0-beta1 >Reporter: wujinhu >Assignee: wujinhu >Priority: Critical > Attachments: HADOOP-15063.001.patch > > > IOException will be thrown in this case > 1. set part size = n(102400) > 2. assume current position = 0, then partRemaining = 102400 > 3. we call seek(pos = 101802), with pos > position && pos < position + > partRemaining, so it will skip pos - position bytes, but partRemaining > remains the same > 4. if we read bytes more than n - pos, it will throw IOException. > Current code: > {code:java} > @Override > public synchronized void seek(long pos) throws IOException { > checkNotClosed(); > if (position == pos) { > return; > } else if (pos > position && pos < position + partRemaining) { > AliyunOSSUtils.skipFully(wrappedStream, pos - position); > // we need update partRemaining here > position = pos; > } else { > reopen(pos); > } > } > {code} > Logs: > java.io.IOException: Failed to read from stream. Remaining:101802 > at > org.apache.hadoop.fs.aliyun.oss.AliyunOSSInputStream.read(AliyunOSSInputStream.java:182) > at org.apache.hadoop.fs.FSInputStream.read(FSInputStream.java:75) > at > org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:92) > How to re-produce: > 1. create a file with 10MB size > 2. 
> {code:java}
> int seekTimes = 150;
> for (int i = 0; i < seekTimes; i++) {
>   long pos = size / (seekTimes - i) - 1;
>   LOG.info("begin seeking for pos: " + pos);
>   byte[] buf = new byte[1024];
>   instream.read(pos, buf, 0, 1024);
> }
> {code}
[jira] [Updated] (HADOOP-15063) IOException may be thrown when read from Aliyun OSS in some case
[ https://issues.apache.org/jira/browse/HADOOP-15063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wujinhu updated HADOOP-15063: - Summary: IOException may be thrown when read from Aliyun OSS in some case (was: IOException will be thrown when read from Aliyun OSS) > IOException may be thrown when read from Aliyun OSS in some case > > > Key: HADOOP-15063 > URL: https://issues.apache.org/jira/browse/HADOOP-15063 > Project: Hadoop Common > Issue Type: Bug > Components: fs/oss >Affects Versions: 3.0.0-alpha2, 3.0.0-beta1 >Reporter: wujinhu >Assignee: wujinhu >Priority: Critical > Attachments: HADOOP-15063.001.patch > > > IOException will be thrown in this case > 1. set part size = n(102400) > 2. assume current position = 0, then partRemaining = 102400 > 3. we call seek(pos = 101802), with pos > position && pos < position + > partRemaining, so it will skip pos - position bytes, but partRemaining > remains the same > 4. if we read bytes more than n - pos, it will throw IOException. > Current code: > {code:java} > @Override > public synchronized void seek(long pos) throws IOException { > checkNotClosed(); > if (position == pos) { > return; > } else if (pos > position && pos < position + partRemaining) { > AliyunOSSUtils.skipFully(wrappedStream, pos - position); > // we need update partRemaining here > position = pos; > } else { > reopen(pos); > } > } > {code} > Logs: > java.io.IOException: Failed to read from stream. Remaining:101802 > at > org.apache.hadoop.fs.aliyun.oss.AliyunOSSInputStream.read(AliyunOSSInputStream.java:182) > at org.apache.hadoop.fs.FSInputStream.read(FSInputStream.java:75) > at > org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:92) > How to re-produce: > 1. create a file with 10MB size > 2. 
> {code:java}
> int seekTimes = 150;
> for (int i = 0; i < seekTimes; i++) {
>   long pos = size / (seekTimes - i) - 1;
>   LOG.info("begin seeking for pos: " + pos);
>   byte[] buf = new byte[1024];
>   instream.read(pos, buf, 0, 1024);
> }
> {code}
[jira] [Updated] (HADOOP-15063) IOException may be thrown when read from Aliyun OSS in some case
[ https://issues.apache.org/jira/browse/HADOOP-15063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wujinhu updated HADOOP-15063: - Priority: Major (was: Critical) > IOException may be thrown when read from Aliyun OSS in some case > > > Key: HADOOP-15063 > URL: https://issues.apache.org/jira/browse/HADOOP-15063 > Project: Hadoop Common > Issue Type: Bug > Components: fs/oss >Affects Versions: 3.0.0-alpha2, 3.0.0-beta1 >Reporter: wujinhu >Assignee: wujinhu > Attachments: HADOOP-15063.001.patch > > > IOException will be thrown in this case > 1. set part size = n(102400) > 2. assume current position = 0, then partRemaining = 102400 > 3. we call seek(pos = 101802), with pos > position && pos < position + > partRemaining, so it will skip pos - position bytes, but partRemaining > remains the same > 4. if we read bytes more than n - pos, it will throw IOException. > Current code: > {code:java} > @Override > public synchronized void seek(long pos) throws IOException { > checkNotClosed(); > if (position == pos) { > return; > } else if (pos > position && pos < position + partRemaining) { > AliyunOSSUtils.skipFully(wrappedStream, pos - position); > // we need update partRemaining here > position = pos; > } else { > reopen(pos); > } > } > {code} > Logs: > java.io.IOException: Failed to read from stream. Remaining:101802 > at > org.apache.hadoop.fs.aliyun.oss.AliyunOSSInputStream.read(AliyunOSSInputStream.java:182) > at org.apache.hadoop.fs.FSInputStream.read(FSInputStream.java:75) > at > org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:92) > How to re-produce: > 1. create a file with 10MB size > 2. 
> {code:java}
> int seekTimes = 150;
> for (int i = 0; i < seekTimes; i++) {
>   long pos = size / (seekTimes - i) - 1;
>   LOG.info("begin seeking for pos: " + pos);
>   byte[] buf = new byte[1024];
>   instream.read(pos, buf, 0, 1024);
> }
> {code}
[jira] [Work started] (HADOOP-15063) IOException may be thrown when read from Aliyun OSS in some case
[ https://issues.apache.org/jira/browse/HADOOP-15063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HADOOP-15063 started by wujinhu. > IOException may be thrown when read from Aliyun OSS in some case > > > Key: HADOOP-15063 > URL: https://issues.apache.org/jira/browse/HADOOP-15063 > Project: Hadoop Common > Issue Type: Bug > Components: fs/oss >Affects Versions: 3.0.0-alpha2, 3.0.0-beta1 >Reporter: wujinhu >Assignee: wujinhu > Attachments: HADOOP-15063.001.patch > > > IOException will be thrown in this case > 1. set part size = n(102400) > 2. assume current position = 0, then partRemaining = 102400 > 3. we call seek(pos = 101802), with pos > position && pos < position + > partRemaining, so it will skip pos - position bytes, but partRemaining > remains the same > 4. if we read bytes more than n - pos, it will throw IOException. > Current code: > {code:java} > @Override > public synchronized void seek(long pos) throws IOException { > checkNotClosed(); > if (position == pos) { > return; > } else if (pos > position && pos < position + partRemaining) { > AliyunOSSUtils.skipFully(wrappedStream, pos - position); > // we need update partRemaining here > position = pos; > } else { > reopen(pos); > } > } > {code} > Logs: > java.io.IOException: Failed to read from stream. Remaining:101802 > at > org.apache.hadoop.fs.aliyun.oss.AliyunOSSInputStream.read(AliyunOSSInputStream.java:182) > at org.apache.hadoop.fs.FSInputStream.read(FSInputStream.java:75) > at > org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:92) > How to re-produce: > 1. create a file with 10MB size > 2. 
> {code:java}
> int seekTimes = 150;
> for (int i = 0; i < seekTimes; i++) {
>   long pos = size / (seekTimes - i) - 1;
>   LOG.info("begin seeking for pos: " + pos);
>   byte[] buf = new byte[1024];
>   instream.read(pos, buf, 0, 1024);
> }
> {code}
[jira] [Updated] (HADOOP-15063) IOException will be thrown when read from Aliyun OSS
[ https://issues.apache.org/jira/browse/HADOOP-15063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wujinhu updated HADOOP-15063: - Attachment: (was: HADOOP-15063.001.patch) > IOException will be thrown when read from Aliyun OSS > > > Key: HADOOP-15063 > URL: https://issues.apache.org/jira/browse/HADOOP-15063 > Project: Hadoop Common > Issue Type: Bug > Components: fs/oss >Affects Versions: 3.0.0-alpha2, 3.0.0-beta1 >Reporter: wujinhu >Assignee: wujinhu >Priority: Critical > > IOException will be thrown in this case > 1. set part size = n(102400) > 2. assume current position = 0, then partRemaining = 102400 > 3. we call seek(pos = 101802), with pos > position && pos < position + > partRemaining, so it will skip pos - position bytes, but partRemaining > remains the same > 4. if we read bytes more than n - pos, it will throw IOException. > Current code: > {code:java} > @Override > public synchronized void seek(long pos) throws IOException { > checkNotClosed(); > if (position == pos) { > return; > } else if (pos > position && pos < position + partRemaining) { > AliyunOSSUtils.skipFully(wrappedStream, pos - position); > // we need update partRemaining here > position = pos; > } else { > reopen(pos); > } > } > {code} > Logs: > java.io.IOException: Failed to read from stream. Remaining:101802 > at > org.apache.hadoop.fs.aliyun.oss.AliyunOSSInputStream.read(AliyunOSSInputStream.java:182) > at org.apache.hadoop.fs.FSInputStream.read(FSInputStream.java:75) > at > org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:92) > How to re-produce: > 1. create a file with 10MB size > 2. 
> {code:java}
> int seekTimes = 150;
> for (int i = 0; i < seekTimes; i++) {
>   long pos = size / (seekTimes - i) - 1;
>   LOG.info("begin seeking for pos: " + pos);
>   byte[] buf = new byte[1024];
>   instream.read(pos, buf, 0, 1024);
> }
> {code}
[jira] [Updated] (HADOOP-15063) IOException is likely to be thrown when read from Aliyun OSS
[ https://issues.apache.org/jira/browse/HADOOP-15063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

wujinhu updated HADOOP-15063:

    Summary: IOException is likely to be thrown when read from Aliyun OSS (was: IOException will be thrown when read from Aliyun OSS)

> IOException is likely to be thrown when read from Aliyun OSS
>
> Key: HADOOP-15063
> URL: https://issues.apache.org/jira/browse/HADOOP-15063
> Project: Hadoop Common
> Issue Type: Bug
> Components: fs/oss
> Affects Versions: 3.0.0-alpha2
> Reporter: wujinhu
> Priority: Critical
>
> IOException will be thrown in some case
> Logs:
> java.io.IOException: Failed to read from stream. Remaining:101802
>     at org.apache.hadoop.fs.aliyun.oss.AliyunOSSInputStream.read(AliyunOSSInputStream.java:182)
>     at org.apache.hadoop.fs.FSInputStream.read(FSInputStream.java:75)
>     at org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:92)
> How to re-produce:
> 1. create a file with 10MB size
> 2.
> {code:java}
> int seekTimes = 150;
> for (int i = 0; i < seekTimes; i++) {
>   long pos = size / (seekTimes - i) - 1;
>   LOG.info("begin seeking for pos: " + pos);
>   //instream.seek(pos);
>   byte[] buf = new byte[1024];
>   instream.read(pos, buf, 0, 1024);
> }
> {code}
[jira] [Updated] (HADOOP-15063) IOException will be thrown when read from Aliyun OSS
[ https://issues.apache.org/jira/browse/HADOOP-15063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wujinhu updated HADOOP-15063: - Description: IOException will be thrown in this case 1. set part size = n(102400) 2. assume current position = 0, then partRemaining = 102400 3. we call seek(pos = 101802), with pos > position && pos < position + partRemaining, so it will skip pos - position bytes, but partRemaining remains the same 4. if we read bytes more than n - pos, it will throw IOException. Current code: {code:java} @Override public synchronized void seek(long pos) throws IOException { checkNotClosed(); if (position == pos) { return; } else if (pos > position && pos < position + partRemaining) { AliyunOSSUtils.skipFully(wrappedStream, pos - position); * // we need update partRemaining here * position = pos; } else { reopen(pos); } } {code} Logs: java.io.IOException: Failed to read from stream. Remaining:101802 at org.apache.hadoop.fs.aliyun.oss.AliyunOSSInputStream.read(AliyunOSSInputStream.java:182) at org.apache.hadoop.fs.FSInputStream.read(FSInputStream.java:75) at org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:92) How to re-produce: 1. create a file with 10MB size 2. {code:java} int seekTimes = 150; for (int i = 0; i < seekTimes; i++) { long pos = size / (seekTimes - i) - 1; LOG.info("begin seeking for pos: " + pos); byte []buf = new byte[1024]; instream.read(pos, buf, 0, 1024); } {code} was: IOException will be thrown in this case 1. set part size = n(102400) 2. assume current position = 0, then partRemaining = 102400 3. we call seek(pos = 101802), with pos > position && pos < position + partRemaining, so it will skip pos - position bytes, but partRemaining remains the same 4. if we read bytes more than n - pos, it will throw IOException. 
Current code: {code:java} @Override public synchronized void seek(long pos) throws IOException { checkNotClosed(); if (position == pos) { return; } else if (pos > position && pos < position + partRemaining) { AliyunOSSUtils.skipFully(wrappedStream, pos - position); *{color:#d04437} // we need update partRemaining here {color}* position = pos; } else { reopen(pos); } } {code} Logs: java.io.IOException: Failed to read from stream. Remaining:101802 at org.apache.hadoop.fs.aliyun.oss.AliyunOSSInputStream.read(AliyunOSSInputStream.java:182) at org.apache.hadoop.fs.FSInputStream.read(FSInputStream.java:75) at org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:92) How to re-produce: 1. create a file with 10MB size 2. {code:java} int seekTimes = 150; for (int i = 0; i < seekTimes; i++) { long pos = size / (seekTimes - i) - 1; LOG.info("begin seeking for pos: " + pos); byte []buf = new byte[1024]; instream.read(pos, buf, 0, 1024); } {code} > IOException will be thrown when read from Aliyun OSS > > > Key: HADOOP-15063 > URL: https://issues.apache.org/jira/browse/HADOOP-15063 > Project: Hadoop Common > Issue Type: Bug > Components: fs/oss >Affects Versions: 3.0.0-alpha2 >Reporter: wujinhu >Priority: Critical > > IOException will be thrown in this case > 1. set part size = n(102400) > 2. assume current position = 0, then partRemaining = 102400 > 3. we call seek(pos = 101802), with pos > position && pos < position + > partRemaining, so it will skip pos - position bytes, but partRemaining > remains the same > 4. if we read bytes more than n - pos, it will throw IOException. 
> Current code: > {code:java} > @Override > public synchronized void seek(long pos) throws IOException { > checkNotClosed(); > if (position == pos) { > return; > } else if (pos > position && pos < position + partRemaining) { > AliyunOSSUtils.skipFully(wrappedStream, pos - position); > * // we need update partRemaining here > * position = pos; > } else { > reopen(pos); > } > } > {code} > Logs: > java.io.IOException: Failed to read from stream. Remaining:101802 > at > org.apache.hadoop.fs.aliyun.oss.AliyunOSSInputStream.read(AliyunOSSInputStream.java:182) > at org.apache.hadoop.fs.FSInputStream.read(FSInputStream.java:75) > at > org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:92) > How to re-produce: > 1. create a file with 10MB size > 2. > {code:java} > int seekTimes = 150; > for (int i = 0; i < seekTimes; i++) { > long pos = size / (seekTimes - i) - 1; > LOG.info("begin seeking for pos: " + pos); > byte []buf = new byte[1024]; > instream.read(pos, buf, 0, 1024); > } > {code} -- This message was sent by Atlassian JIRA
[jira] [Commented] (HADOOP-15063) IOException may be thrown when read from Aliyun OSS in some case
[ https://issues.apache.org/jira/browse/HADOOP-15063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16262396#comment-16262396 ]

wujinhu commented on HADOOP-15063:

Thanks for the review. I found that this is the same issue as https://issues.apache.org/jira/browse/HADOOP-14072, so I will close this one.

> IOException may be thrown when read from Aliyun OSS in some case
>
> Key: HADOOP-15063
> URL: https://issues.apache.org/jira/browse/HADOOP-15063
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/oss
> Affects Versions: 3.0.0-alpha2, 3.0.0-beta1
> Reporter: wujinhu
> Assignee: wujinhu
> Attachments: HADOOP-15063.001.patch
>
> IOException will be thrown in this case
> 1. set part size = n(102400)
> 2. assume current position = 0, then partRemaining = 102400
> 3. we call seek(pos = 101802), with pos > position && pos < position + partRemaining, so it will skip pos - position bytes, but partRemaining remains the same
> 4. if we read bytes more than n - pos, it will throw IOException.
> Current code:
> {code:java}
> @Override
> public synchronized void seek(long pos) throws IOException {
>   checkNotClosed();
>   if (position == pos) {
>     return;
>   } else if (pos > position && pos < position + partRemaining) {
>     AliyunOSSUtils.skipFully(wrappedStream, pos - position);
>     // we need update partRemaining here
>     position = pos;
>   } else {
>     reopen(pos);
>   }
> }
> {code}
> Logs:
> java.io.IOException: Failed to read from stream. Remaining:101802
>     at org.apache.hadoop.fs.aliyun.oss.AliyunOSSInputStream.read(AliyunOSSInputStream.java:182)
>     at org.apache.hadoop.fs.FSInputStream.read(FSInputStream.java:75)
>     at org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:92)
> How to re-produce:
> 1. create a file with 10MB size
> 2.
> {code:java}
> int seekTimes = 150;
> for (int i = 0; i < seekTimes; i++) {
>   long pos = size / (seekTimes - i) - 1;
>   LOG.info("begin seeking for pos: " + pos);
>   byte[] buf = new byte[1024];
>   instream.read(pos, buf, 0, 1024);
> }
> {code}
[jira] [Updated] (HADOOP-15063) IOException may be thrown when read from Aliyun OSS in some case
[ https://issues.apache.org/jira/browse/HADOOP-15063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wujinhu updated HADOOP-15063: - Resolution: Duplicate Status: Resolved (was: Patch Available) > IOException may be thrown when read from Aliyun OSS in some case > > > Key: HADOOP-15063 > URL: https://issues.apache.org/jira/browse/HADOOP-15063 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/oss >Affects Versions: 3.0.0-alpha2, 3.0.0-beta1 >Reporter: wujinhu >Assignee: wujinhu > Attachments: HADOOP-15063.001.patch > > > IOException will be thrown in this case > 1. set part size = n(102400) > 2. assume current position = 0, then partRemaining = 102400 > 3. we call seek(pos = 101802), with pos > position && pos < position + > partRemaining, so it will skip pos - position bytes, but partRemaining > remains the same > 4. if we read bytes more than n - pos, it will throw IOException. > Current code: > {code:java} > @Override > public synchronized void seek(long pos) throws IOException { > checkNotClosed(); > if (position == pos) { > return; > } else if (pos > position && pos < position + partRemaining) { > AliyunOSSUtils.skipFully(wrappedStream, pos - position); > // we need update partRemaining here > position = pos; > } else { > reopen(pos); > } > } > {code} > Logs: > java.io.IOException: Failed to read from stream. Remaining:101802 > at > org.apache.hadoop.fs.aliyun.oss.AliyunOSSInputStream.read(AliyunOSSInputStream.java:182) > at org.apache.hadoop.fs.FSInputStream.read(FSInputStream.java:75) > at > org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:92) > How to re-produce: > 1. create a file with 10MB size > 2. 
> {code:java}
> int seekTimes = 150;
> for (int i = 0; i < seekTimes; i++) {
>   long pos = size / (seekTimes - i) - 1;
>   LOG.info("begin seeking for pos: " + pos);
>   byte[] buf = new byte[1024];
>   instream.read(pos, buf, 0, 1024);
> }
> {code}
[jira] [Updated] (HADOOP-15027) AliyunOSS: Improvements for Hadoop read from AliyunOSS
[ https://issues.apache.org/jira/browse/HADOOP-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

wujinhu updated HADOOP-15027:

    Summary: AliyunOSS: Improvements for Hadoop read from AliyunOSS (was: Improvements for Hadoop read from AliyunOSS)

> AliyunOSS: Improvements for Hadoop read from AliyunOSS
>
> Key: HADOOP-15027
> URL: https://issues.apache.org/jira/browse/HADOOP-15027
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/oss
> Affects Versions: 3.0.0
> Reporter: wujinhu
> Assignee: wujinhu
> Attachments: HADOOP-15027.001.patch, HADOOP-15027.002.patch, HADOOP-15027.003.patch
>
> Currently, read performance is poor when Hadoop reads from AliyunOSS: it takes about 1 min to read 1 GB from OSS.
> The AliyunOSSInputStream class uses a single thread to read data from AliyunOSS, so we can improve this by refactoring it to use multi-threaded pre-reads.
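The multi-threaded pre-read idea in the description can be sketched roughly as follows. This is an illustration only and not the code in the attached patches: `PreReadSketch`, `RangeReader` and `readAll` are hypothetical names, the ranged read is a plain callback rather than a real OSS client call, and the whole object is buffered in memory for simplicity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Fetches fixed-size parts concurrently and stitches them back together in order.
class PreReadSketch {
  // Hypothetical stand-in for a ranged GET against the object store.
  interface RangeReader {
    byte[] read(long offset, int length);
  }

  static byte[] readAll(RangeReader source, long size, int partSize, int threads) {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      // Submit one task per part; the futures keep the parts in offset order.
      List<Future<byte[]>> parts = new ArrayList<>();
      for (long off = 0; off < size; off += partSize) {
        final long start = off;
        final int len = (int) Math.min(partSize, size - off);
        Callable<byte[]> task = () -> source.read(start, len);
        parts.add(pool.submit(task));
      }
      // Collect results in submission order, blocking on parts still in flight.
      byte[] out = new byte[(int) size];
      int written = 0;
      for (Future<byte[]> part : parts) {
        byte[] chunk = part.get();
        System.arraycopy(chunk, 0, out, written, chunk.length);
        written += chunk.length;
      }
      return out;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("Interrupted while pre-reading", e);
    } catch (ExecutionException e) {
      throw new IllegalStateException("Part read failed", e.getCause());
    } finally {
      pool.shutdownNow();
    }
  }
}
```

The sketch creates a pool per call to stay self-contained. The earlier update on this issue mentions moving the thread pool from the InputStream to the FileSystem, so a real implementation would more likely share one pool across streams and bound how many parts are read ahead.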
[jira] [Updated] (HADOOP-15027) AliyunOSS: Improvements for Hadoop read from AliyunOSS
[ https://issues.apache.org/jira/browse/HADOOP-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

wujinhu updated HADOOP-15027:

    Attachment: HADOOP-15027.004.patch

> AliyunOSS: Improvements for Hadoop read from AliyunOSS
>
> Key: HADOOP-15027
> URL: https://issues.apache.org/jira/browse/HADOOP-15027
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/oss
> Affects Versions: 3.0.0
> Reporter: wujinhu
> Assignee: wujinhu
> Attachments: HADOOP-15027.001.patch, HADOOP-15027.002.patch, HADOOP-15027.003.patch, HADOOP-15027.004.patch
>
> Currently, read performance is poor when Hadoop reads from AliyunOSS: it takes about 1 min to read 1 GB from OSS.
> The AliyunOSSInputStream class uses a single thread to read data from AliyunOSS, so we can improve this by refactoring it to use multi-threaded pre-reads.
[jira] [Commented] (HADOOP-15027) AliyunOSS: Improvements for Hadoop read from AliyunOSS
[ https://issues.apache.org/jira/browse/HADOOP-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16290933#comment-16290933 ] wujinhu commented on HADOOP-15027: -- attach patch based on HADOOP-15039 > AliyunOSS: Improvements for Hadoop read from AliyunOSS > -- > > Key: HADOOP-15027 > URL: https://issues.apache.org/jira/browse/HADOOP-15027 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/oss >Affects Versions: 3.0.0 >Reporter: wujinhu >Assignee: wujinhu > Attachments: HADOOP-15027.001.patch, HADOOP-15027.002.patch, > HADOOP-15027.003.patch, HADOOP-15027.004.patch > > > Currently, read performance is poor when Hadoop reads from AliyunOSS. It > needs about 1min to read 1GB from OSS. > Class AliyunOSSInputStream uses single thread to read data from AliyunOSS, > so we can refactor this by using multi-thread pre read to improve this. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15027) AliyunOSS: Improvements for Hadoop read from AliyunOSS
[ https://issues.apache.org/jira/browse/HADOOP-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16291908#comment-16291908 ]

wujinhu commented on HADOOP-15027:
----------------------------------

[~ste...@apache.org] [~drankye] [~uncleGen] [~Sammi] Please review. This patch is based on HADOOP-15039, which refactors *_SemaphoredDelegatingExecutor_*.

> AliyunOSS: Improvements for Hadoop read from AliyunOSS
> --
>
> Key: HADOOP-15027
> URL: https://issues.apache.org/jira/browse/HADOOP-15027
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/oss
> Affects Versions: 3.0.0
> Reporter: wujinhu
> Assignee: wujinhu
> Attachments: HADOOP-15027.001.patch, HADOOP-15027.002.patch, HADOOP-15027.003.patch, HADOOP-15027.004.patch
>
> Currently, read performance is poor when Hadoop reads from AliyunOSS. It needs about 1min to read 1GB from OSS.
> Class AliyunOSSInputStream uses single thread to read data from AliyunOSS, so we can refactor this by using multi-thread pre read to improve this.

-- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-15027) Improvements for Hadoop read from AliyunOSS
[ https://issues.apache.org/jira/browse/HADOOP-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wujinhu updated HADOOP-15027: - Attachment: HADOOP-15027.002.patch > Improvements for Hadoop read from AliyunOSS > --- > > Key: HADOOP-15027 > URL: https://issues.apache.org/jira/browse/HADOOP-15027 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/oss >Affects Versions: 3.0.0 >Reporter: wujinhu >Assignee: wujinhu > Attachments: HADOOP-15027.001.patch, HADOOP-15027.002.patch > > > Currently, read performance is poor when Hadoop reads from AliyunOSS. It > needs about 1min to read 1GB from OSS. > Class AliyunOSSInputStream uses single thread to read data from AliyunOSS, > so we can refactor this by using multi-thread pre read to improve this. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-15027) Improvements for Hadoop read from AliyunOSS
[ https://issues.apache.org/jira/browse/HADOOP-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wujinhu updated HADOOP-15027: - Attachment: (was: HADOOP-15027.002.patch) > Improvements for Hadoop read from AliyunOSS > --- > > Key: HADOOP-15027 > URL: https://issues.apache.org/jira/browse/HADOOP-15027 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/oss >Affects Versions: 3.0.0 >Reporter: wujinhu >Assignee: wujinhu > Attachments: HADOOP-15027.001.patch > > > Currently, read performance is poor when Hadoop reads from AliyunOSS. It > needs about 1min to read 1GB from OSS. > Class AliyunOSSInputStream uses single thread to read data from AliyunOSS, > so we can refactor this by using multi-thread pre read to improve this. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15027) Improvements for Hadoop read from AliyunOSS
[ https://issues.apache.org/jira/browse/HADOOP-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16250921#comment-16250921 ] wujinhu commented on HADOOP-15027: -- ste...@apache.org > Improvements for Hadoop read from AliyunOSS > --- > > Key: HADOOP-15027 > URL: https://issues.apache.org/jira/browse/HADOOP-15027 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/oss >Affects Versions: 3.0.0 >Reporter: wujinhu >Assignee: wujinhu > Attachments: HADOOP-15027.001.patch, HADOOP-15027.002.patch, > HADOOP-15027.003.patch > > > Currently, read performance is poor when Hadoop reads from AliyunOSS. It > needs about 1min to read 1GB from OSS. > Class AliyunOSSInputStream uses single thread to read data from AliyunOSS, > so we can refactor this by using multi-thread pre read to improve this. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Issue Comment Deleted] (HADOOP-15027) Improvements for Hadoop read from AliyunOSS
[ https://issues.apache.org/jira/browse/HADOOP-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wujinhu updated HADOOP-15027: - Comment: was deleted (was: ste...@apache.org ) > Improvements for Hadoop read from AliyunOSS > --- > > Key: HADOOP-15027 > URL: https://issues.apache.org/jira/browse/HADOOP-15027 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/oss >Affects Versions: 3.0.0 >Reporter: wujinhu >Assignee: wujinhu > Attachments: HADOOP-15027.001.patch, HADOOP-15027.002.patch, > HADOOP-15027.003.patch > > > Currently, read performance is poor when Hadoop reads from AliyunOSS. It > needs about 1min to read 1GB from OSS. > Class AliyunOSSInputStream uses single thread to read data from AliyunOSS, > so we can refactor this by using multi-thread pre read to improve this. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15027) Improvements for Hadoop read from AliyunOSS
[ https://issues.apache.org/jira/browse/HADOOP-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16251109#comment-16251109 ]

wujinhu commented on HADOOP-15027:
----------------------------------

[~uncleGen] I set up a Hadoop environment on a Tencent Cloud server and read data from Aliyun OSS with the hadoop command.

File size: 104MB. For the aggressive random read test, I divided the file into parts (1MB each) and shuffled the parts before reading them.

{code:java}
public static final String MULTIPART_DOWNLOAD_SIZE_KEY = "fs.oss.multipart.download.size";
public static final long MULTIPART_DOWNLOAD_SIZE_DEFAULT = 1 * 1024 * 1024;
{code}

Current version:

ubuntu@VM-0-11-ubuntu:~/hadoop-3.0.0-beta1$ date; hadoop fs -cat oss://hadoop-on-oss/hadoop/wordcount/wordcount_bigdata_100M.txt > /dev/null; date
Tue Nov 14 15:47:27 CST 2017
Tue Nov 14 15:48:47 CST 2017

80s

After optimization with multi-thread prefetch:

---
fs.oss.multipart.download.ahead.part.max.number = 1
fs.oss.multipart.download.threads = 4
sequential IO: 77s
random IO: 74.51s
---
fs.oss.multipart.download.ahead.part.max.number = 8
fs.oss.multipart.download.threads = 4
sequential IO: 23s
random IO: 59.128
---
fs.oss.multipart.download.ahead.part.max.number = 16
fs.oss.multipart.download.threads = 4
sequential IO: 19s
random IO: 83.92
---
fs.oss.multipart.download.ahead.part.max.number = 16
fs.oss.multipart.download.threads = 8
sequential IO: 13s
random IO: 50.561

Summary:
* If the read-ahead part number is 1, sequential IO and random IO perform the same, since there is no pre-fetch.
* Sequential IO improves as the read-ahead part number and the thread count increase.
* The worst case for random IO is a pattern like [sequential ... sequential, random, sequential ... sequential, random], in which some pre-fetched data is wasted.
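The "aggressive random read" setup described above (split the file into 1MB parts, shuffle, read each part) could be sketched roughly as follows. This is only an illustration, not code from the patch: the class name `ShuffledPartPlan`, the fixed seed, and the reading of "104MB" as 104 * 2^20 bytes are all assumptions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical helper (not part of the patch): builds a shuffled list of part ranges. */
class ShuffledPartPlan {

    /** Returns {offset, length} pairs covering [0, fileLength), in shuffled order. */
    static List<long[]> shuffledParts(long fileLength, long partSize, long seed) {
        List<long[]> parts = new ArrayList<>();
        for (long offset = 0; offset < fileLength; offset += partSize) {
            // The last part may be shorter than partSize.
            long length = Math.min(partSize, fileLength - offset);
            parts.add(new long[] {offset, length});
        }
        // A fixed seed keeps the "random" order repeatable between runs.
        Collections.shuffle(parts, new Random(seed));
        return parts;
    }

    public static void main(String[] args) {
        long partSize = 1L * 1024 * 1024;     // same value as MULTIPART_DOWNLOAD_SIZE_DEFAULT
        long fileLength = 104L * 1024 * 1024; // assumption: "104MB" means 104 * 2^20 bytes
        List<long[]> plan = shuffledParts(fileLength, partSize, 42L);
        System.out.println("parts to read: " + plan.size());
        // A real test would issue one positioned read of part[1] bytes at offset part[0]
        // for every entry in the plan and time the whole loop.
    }
}
```

Because consecutive entries in the plan are almost never adjacent, nearly every read lands outside whatever was pre-fetched, which is the pattern the random IO numbers above are measuring.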
[jira] [Commented] (HADOOP-15027) Improvements for Hadoop read from AliyunOSS
[ https://issues.apache.org/jira/browse/HADOOP-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16250933#comment-16250933 ]

wujinhu commented on HADOOP-15027:
----------------------------------

Yes, I agree with [~uncleGen]. We could optimize random IO step by step. As we all know, Hadoop 2.7.5 will be released soon. We hope this patch can be released so that we can solve the sequential IO issue (single-threaded read) in the current implementation. Random IO remains the same, and we can solve that later.

I have read the class _*SemaphoredDelegatingExecutor*_ and it's good enough. [~uncleGen], I think you can do this job, thanks.

Besides, I will provide more detailed test results later.

> Improvements for Hadoop read from AliyunOSS
> ---
>
> Key: HADOOP-15027
> URL: https://issues.apache.org/jira/browse/HADOOP-15027
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/oss
> Affects Versions: 3.0.0
> Reporter: wujinhu
> Assignee: wujinhu
> Attachments: HADOOP-15027.001.patch, HADOOP-15027.002.patch, HADOOP-15027.003.patch
>
> Currently, read performance is poor when Hadoop reads from AliyunOSS. It needs about 1min to read 1GB from OSS.
> Class AliyunOSSInputStream uses single thread to read data from AliyunOSS, so we can refactor this by using multi-thread pre read to improve this.

-- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15027) Improvements for Hadoop read from AliyunOSS
[ https://issues.apache.org/jira/browse/HADOOP-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16250965#comment-16250965 ] wujinhu commented on HADOOP-15027: -- [~stevel] > Improvements for Hadoop read from AliyunOSS > --- > > Key: HADOOP-15027 > URL: https://issues.apache.org/jira/browse/HADOOP-15027 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/oss >Affects Versions: 3.0.0 >Reporter: wujinhu >Assignee: wujinhu > Attachments: HADOOP-15027.001.patch, HADOOP-15027.002.patch, > HADOOP-15027.003.patch > > > Currently, read performance is poor when Hadoop reads from AliyunOSS. It > needs about 1min to read 1GB from OSS. > Class AliyunOSSInputStream uses single thread to read data from AliyunOSS, > so we can refactor this by using multi-thread pre read to improve this. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Issue Comment Deleted] (HADOOP-15027) Improvements for Hadoop read from AliyunOSS
[ https://issues.apache.org/jira/browse/HADOOP-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wujinhu updated HADOOP-15027: - Comment: was deleted (was: [~stevel]) > Improvements for Hadoop read from AliyunOSS > --- > > Key: HADOOP-15027 > URL: https://issues.apache.org/jira/browse/HADOOP-15027 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/oss >Affects Versions: 3.0.0 >Reporter: wujinhu >Assignee: wujinhu > Attachments: HADOOP-15027.001.patch, HADOOP-15027.002.patch, > HADOOP-15027.003.patch > > > Currently, read performance is poor when Hadoop reads from AliyunOSS. It > needs about 1min to read 1GB from OSS. > Class AliyunOSSInputStream uses single thread to read data from AliyunOSS, > so we can refactor this by using multi-thread pre read to improve this. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HADOOP-15027) Improvements for Hadoop read from AliyunOSS
[ https://issues.apache.org/jira/browse/HADOOP-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16251109#comment-16251109 ]

wujinhu edited comment on HADOOP-15027 at 11/14/17 9:06 AM:
------------------------------------------------------------

[~uncleGen] [~ste...@apache.org] I set up a Hadoop environment on a Tencent Cloud server and read data from Aliyun OSS with the hadoop command.

File size: 104MB. For the aggressive random read test, I divided the file into parts (1MB each) and shuffled the parts before reading them.

{code:java}
public static final String MULTIPART_DOWNLOAD_SIZE_KEY = "fs.oss.multipart.download.size";
public static final long MULTIPART_DOWNLOAD_SIZE_DEFAULT = 1 * 1024 * 1024;
{code}

Current version:

ubuntu@VM-0-11-ubuntu:~/hadoop-3.0.0-beta1$ date; hadoop fs -cat oss://hadoop-on-oss/hadoop/wordcount/wordcount_bigdata_100M.txt > /dev/null; date
Tue Nov 14 15:47:27 CST 2017
Tue Nov 14 15:48:47 CST 2017

80s

After optimization with multi-thread prefetch:

---
fs.oss.multipart.download.ahead.part.max.number = 1
fs.oss.multipart.download.threads = 4
sequential IO: 77s
random IO: 74.51s
---
fs.oss.multipart.download.ahead.part.max.number = 8
fs.oss.multipart.download.threads = 4
sequential IO: 23s
random IO: 59.128
---
fs.oss.multipart.download.ahead.part.max.number = 16
fs.oss.multipart.download.threads = 4
sequential IO: 19s
random IO: 83.92
---
fs.oss.multipart.download.ahead.part.max.number = 16
fs.oss.multipart.download.threads = 8
sequential IO: 13s
random IO: 50.561

Summary:
* If the read-ahead part number is 1, sequential IO and random IO perform the same, since there is no pre-fetch.
* Sequential IO improves as the read-ahead part number and the thread count increase.
* The worst case for random IO is a pattern like [sequential ... sequential, random, sequential ... sequential, random], in which some pre-fetched data is wasted.
[jira] [Comment Edited] (HADOOP-15027) Improvements for Hadoop read from AliyunOSS
[ https://issues.apache.org/jira/browse/HADOOP-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16251109#comment-16251109 ]

wujinhu edited comment on HADOOP-15027 at 11/14/17 4:27 PM:
------------------------------------------------------------

[~uncleGen] [~ste...@apache.org] I set up a Hadoop environment on a Tencent Cloud server and read data from Aliyun OSS with the hadoop command.

File size: 104MB. For the aggressive random read test, I divided the file into parts (1MB each) and shuffled the parts before reading them.

{code:java}
public static final String MULTIPART_DOWNLOAD_SIZE_KEY = "fs.oss.multipart.download.size";
public static final long MULTIPART_DOWNLOAD_SIZE_DEFAULT = 1 * 1024 * 1024;
{code}

Current version:

ubuntu@VM-0-11-ubuntu:~/hadoop-3.0.0-beta1$ date; hadoop fs -cat oss://hadoop-on-oss/hadoop/wordcount/wordcount_bigdata_100M.txt > /dev/null; date
Tue Nov 14 15:47:27 CST 2017
Tue Nov 14 15:48:47 CST 2017

80s

After optimization with multi-thread prefetch:

---
fs.oss.multipart.download.ahead.part.max.number = 1
fs.oss.multipart.download.threads = 4
sequential IO: 77s
random IO: 74.51s
---
fs.oss.multipart.download.ahead.part.max.number = 8
fs.oss.multipart.download.threads = 4
sequential IO: 23s
random IO: 59.128
---
fs.oss.multipart.download.ahead.part.max.number = 16
fs.oss.multipart.download.threads = 4
sequential IO: 19s
random IO: 83.92
---
fs.oss.multipart.download.ahead.part.max.number = 16
fs.oss.multipart.download.threads = 8
sequential IO: 13s (still testing this; it already reaches the upper limit of the network)
random IO: 50.561

Summary:
* If the read-ahead part number is 1, sequential IO and random IO perform the same, since there is no pre-fetch.
* Sequential IO improves as the read-ahead part number and the thread count increase.
* The worst case for random IO is a pattern like [sequential ... sequential, random, sequential ... sequential, random], in which some pre-fetched data is wasted.
[jira] [Created] (HADOOP-15063) IOException will be thrown when read from Aliyun OSS
wujinhu created HADOOP-15063:

Summary: IOException will be thrown when read from Aliyun OSS
Key: HADOOP-15063
URL: https://issues.apache.org/jira/browse/HADOOP-15063
Project: Hadoop Common
Issue Type: Bug
Components: fs/oss
Affects Versions: 3.0.0-alpha2
Reporter: wujinhu
Priority: Critical

Logs:

java.io.IOException: Failed to read from stream. Remaining:101802
    at org.apache.hadoop.fs.aliyun.oss.AliyunOSSInputStream.read(AliyunOSSInputStream.java:182)
    at org.apache.hadoop.fs.FSInputStream.read(FSInputStream.java:75)
    at org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:92)

How to reproduce:

1. Create a file of 10MB.
2. Run the following loop against it:

    int seekTimes = 150;
    for (int i = 0; i < seekTimes; i++) {
      long pos = size / (seekTimes - i) - 1;
      LOG.info("begin seeking for pos: " + pos);
      //instream.seek(pos);
      byte[] buf = new byte[1024];
      instream.read(pos, buf, 0, 1024);
    }

-- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
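To see which offsets the HADOOP-15063 reproduction loop actually touches, the position formula can be evaluated on its own, without an OSS bucket. The sketch below is not from the report: the class name `SeekPositionsSketch` is hypothetical, and it assumes "10MB" means 10 * 1024 * 1024 bytes.

```java
/** Hypothetical sketch: evaluates only the position formula from the reproduction steps. */
class SeekPositionsSketch {

    /** Same arithmetic as the loop in the report: pos = size / (seekTimes - i) - 1. */
    static long[] positions(long size, int seekTimes) {
        long[] result = new long[seekTimes];
        for (int i = 0; i < seekTimes; i++) {
            result[i] = size / (seekTimes - i) - 1;
        }
        return result;
    }

    public static void main(String[] args) {
        long size = 10L * 1024 * 1024; // assumption: "10MB" means 10 * 2^20 bytes
        long[] pos = positions(size, 150);
        System.out.println("first position: " + pos[0]);
        System.out.println("last position:  " + pos[pos.length - 1]);
    }
}
```

Under that size assumption the positions only move forward, and the final one is `size - 1`, so the last 1024-byte positioned read starts at the last byte of the file. The report does not say whether this is related to the exception.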
[jira] [Commented] (HADOOP-15027) Improvements for Hadoop read from AliyunOSS
[ https://issues.apache.org/jira/browse/HADOOP-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16248362#comment-16248362 ]

wujinhu commented on HADOOP-15027:
----------------------------------

Hi Steve Loughran,

Thanks for the comments; your suggestions are very helpful. I will follow your suggestions about the thread pool and retry logic.

For random IO, it is true that my implementation will not work well. HADOOP-14535 seems similar to what the OS does. The operating system starts sequential read-ahead when one of the following conditions is satisfied:
* it is the first read from a file and the seek position is 0
* the current read and the previous read are contiguous in the file

Otherwise, it is random IO. I will take a look at these two issues and continue to improve this.

> Improvements for Hadoop read from AliyunOSS
> ---
>
> Key: HADOOP-15027
> URL: https://issues.apache.org/jira/browse/HADOOP-15027
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/oss
> Affects Versions: 3.0.0
> Reporter: wujinhu
> Assignee: wujinhu
> Attachments: HADOOP-15027.001.patch
>
> Currently, read performance is poor when Hadoop reads from AliyunOSS. It needs about 1min to read 1GB from OSS.
> Class AliyunOSSInputStream uses single thread to read data from AliyunOSS, so we can refactor this by using multi-thread pre read to improve this.

-- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
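The two read-ahead conditions listed in the comment above (first read at position 0, or a read contiguous with the previous one) can be expressed as a small classifier. This is a minimal sketch under stated assumptions: `ReadPatternDetector` is a hypothetical name, and neither the patch nor HADOOP-14535 is claimed to implement the check this way.

```java
/** Hypothetical sketch of the sequential-vs-random heuristic described in the comment. */
class ReadPatternDetector {
    private boolean firstRead = true;
    private long expectedNextPos = 0; // offset a purely sequential reader would use next

    /**
     * Classifies a read of len bytes at pos. Returns true when read-ahead would be
     * worthwhile under the two conditions from the comment, false for random IO.
     */
    boolean isSequential(long pos, int len) {
        boolean sequential = firstRead
                ? pos == 0                 // condition 1: first read, starting at offset 0
                : pos == expectedNextPos;  // condition 2: contiguous with the previous read
        firstRead = false;
        expectedNextPos = pos + len;
        return sequential;
    }

    public static void main(String[] args) {
        ReadPatternDetector detector = new ReadPatternDetector();
        System.out.println(detector.isSequential(0, 1024));    // first read at 0
        System.out.println(detector.isSequential(1024, 1024)); // contiguous with previous
        System.out.println(detector.isSequential(8192, 1024)); // jump elsewhere in the file
    }
}
```

An input stream could consult such a check before scheduling pre-fetch tasks, so that a backward or far-forward seek does not trigger downloads that are likely to be discarded.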
[jira] [Comment Edited] (HADOOP-15027) Improvements for Hadoop read from AliyunOSS
[ https://issues.apache.org/jira/browse/HADOOP-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16248362#comment-16248362 ]

wujinhu edited comment on HADOOP-15027 at 11/11/17 6:40 AM:
------------------------------------------------------------

Hi Steve Loughran,

Thanks for the comments; your suggestions are very helpful. I will follow your suggestions about the thread pool and retry logic.

For random IO, it is true that my implementation will not work well. HADOOP-14535 seems similar to what the OS does. The operating system starts sequential read-ahead when one of the following conditions is satisfied:
* it is the first read from a file and the seek position is 0
* the current read and the previous read are contiguous in the file

Otherwise, it is random IO. I will take a look at these two issues and continue to improve this.

> Improvements for Hadoop read from AliyunOSS
> ---
>
> Key: HADOOP-15027
> URL: https://issues.apache.org/jira/browse/HADOOP-15027
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/oss
> Affects Versions: 3.0.0
> Reporter: wujinhu
> Assignee: wujinhu
> Attachments: HADOOP-15027.001.patch
>
> Currently, read performance is poor when Hadoop reads from AliyunOSS. It needs about 1min to read 1GB from OSS.
> Class AliyunOSSInputStream uses single thread to read data from AliyunOSS, so we can refactor this by using multi-thread pre read to improve this.
[jira] [Created] (HADOOP-15027) Improvements for Hadoop read from AliyunOSS
wujinhu created HADOOP-15027: Summary: Improvements for Hadoop read from AliyunOSS Key: HADOOP-15027 URL: https://issues.apache.org/jira/browse/HADOOP-15027 Project: Hadoop Common Issue Type: Improvement Components: fs/oss Reporter: wujinhu Currently, read performance is poor when Hadoop reads from AliyunOSS. It needs about 1min to read 1GB from OSS. Class AliyunOSSInputStream uses single thread to read data from AliyunOSS, so we can refactor this by using multi-thread pre read to improve this. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-15104) AliyunOSS: change default max error retry
[ https://issues.apache.org/jira/browse/HADOOP-15104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wujinhu updated HADOOP-15104: - Component/s: fs/oss > AliyunOSS: change default max error retry > - > > Key: HADOOP-15104 > URL: https://issues.apache.org/jira/browse/HADOOP-15104 > Project: Hadoop Common > Issue Type: Improvement > Components: fs/oss >Affects Versions: 3.0.0-beta1 >Reporter: wujinhu >Assignee: wujinhu > Fix For: 3.0.0, 2.9.1 > > > Currently, default number of times we should retry errors is 20, however, > oss sdk retry delay is > {code:java} > long delay = (long)Math.pow(2, retries) * 0.3 > {code} > when one error occurs. So, if we retry 20 times, sleep time will be about > 3.64 days and it is unacceptable. So we should change the default behavior. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
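The "3.64 days" figure in the HADOOP-15104 description can be checked with a few lines of arithmetic. The sketch below assumes the result of `Math.pow(2, retries) * 0.3` is a delay in seconds, which is an interpretation chosen because it matches the quoted figure; the class name `RetryDelaySketch` is hypothetical and this is not the OSS SDK's code.

```java
/** Hypothetical sketch: evaluates the back-off formula quoted in HADOOP-15104. */
class RetryDelaySketch {
    private static final double SECONDS_PER_DAY = 86400.0;

    /** Delay before the given retry, assuming the formula yields seconds. */
    static double delaySeconds(int retries) {
        return Math.pow(2, retries) * 0.3;
    }

    public static void main(String[] args) {
        for (int retries : new int[] {1, 5, 10, 20}) {
            double seconds = delaySeconds(retries);
            System.out.printf("retry %2d: %.1f s (%.4f days)%n",
                    retries, seconds, seconds / SECONDS_PER_DAY);
        }
    }
}
```

With exponential back-off the delay doubles on every retry, so the default retry count dominates the worst-case wait: the 20th retry alone accounts for the multi-day sleep mentioned in the description, while the 10th is only about five minutes.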
[jira] [Updated] (HADOOP-15104) AliyunOSS: change default max error retry
[ https://issues.apache.org/jira/browse/HADOOP-15104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wujinhu updated HADOOP-15104: - Fix Version/s: 3.0.0 > AliyunOSS: change default max error retry > - > > Key: HADOOP-15104 > URL: https://issues.apache.org/jira/browse/HADOOP-15104 > Project: Hadoop Common > Issue Type: Improvement > Components: fs/oss >Affects Versions: 3.0.0-beta1 >Reporter: wujinhu >Assignee: wujinhu > Fix For: 3.0.0, 2.9.1 > > > Currently, default number of times we should retry errors is 20, however, > oss sdk retry delay is > {code:java} > long delay = (long)Math.pow(2, retries) * 0.3 > {code} > when one error occurs. So, if we retry 20 times, sleep time will be about > 3.64 days and it is unacceptable. So we should change the default behavior. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-15104) AliyunOSS: change default max error retry
[ https://issues.apache.org/jira/browse/HADOOP-15104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wujinhu updated HADOOP-15104: - Affects Version/s: 3.0.0-beta1 > AliyunOSS: change default max error retry > - > > Key: HADOOP-15104 > URL: https://issues.apache.org/jira/browse/HADOOP-15104 > Project: Hadoop Common > Issue Type: Improvement > Components: fs/oss >Affects Versions: 3.0.0-beta1 >Reporter: wujinhu >Assignee: wujinhu > Fix For: 3.0.0, 2.9.1 > > > Currently, default number of times we should retry errors is 20, however, > oss sdk retry delay is > {code:java} > long delay = (long)Math.pow(2, retries) * 0.3 > {code} > when one error occurs. So, if we retry 20 times, sleep time will be about > 3.64 days and it is unacceptable. So we should change the default behavior. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-15104) AliyunOSS: change default max error retry
[ https://issues.apache.org/jira/browse/HADOOP-15104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wujinhu updated HADOOP-15104: - Fix Version/s: (was: 3.1.0) > AliyunOSS: change default max error retry > - > > Key: HADOOP-15104 > URL: https://issues.apache.org/jira/browse/HADOOP-15104 > Project: Hadoop Common > Issue Type: Improvement > Components: fs/oss >Affects Versions: 3.0.0-beta1 >Reporter: wujinhu >Assignee: wujinhu > Fix For: 3.0.0, 2.9.1 > > > Currently, default number of times we should retry errors is 20, however, > oss sdk retry delay is > {code:java} > long delay = (long)Math.pow(2, retries) * 0.3 > {code} > when one error occurs. So, if we retry 20 times, sleep time will be about > 3.64 days and it is unacceptable. So we should change the default behavior. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-15104) AliyunOSS: change default max error retry
[ https://issues.apache.org/jira/browse/HADOOP-15104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wujinhu updated HADOOP-15104: - Fix Version/s: 2.9.1 3.1.0 > AliyunOSS: change default max error retry > - > > Key: HADOOP-15104 > URL: https://issues.apache.org/jira/browse/HADOOP-15104 > Project: Hadoop Common > Issue Type: Improvement > Components: fs/oss >Affects Versions: 3.0.0-beta1 >Reporter: wujinhu >Assignee: wujinhu > Fix For: 3.0.0, 2.9.1 > > > Currently, default number of times we should retry errors is 20, however, > oss sdk retry delay is > {code:java} > long delay = (long)Math.pow(2, retries) * 0.3 > {code} > when one error occurs. So, if we retry 20 times, sleep time will be about > 3.64 days and it is unacceptable. So we should change the default behavior. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-15104) AliyunOSS: change default max error retry
[ https://issues.apache.org/jira/browse/HADOOP-15104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wujinhu updated HADOOP-15104: - Description: Currently, default number of times we should retry errors is 20, however, oss sdk retry delay is {code:java} long delay = (long)Math.pow(2, retries) * 0.3 {code} when one error occurs. So, if we retry 20 times, sleep time will be about 3.64 days and it is unacceptable. So we should change the default behavior. > AliyunOSS: change default max error retry > - > > Key: HADOOP-15104 > URL: https://issues.apache.org/jira/browse/HADOOP-15104 > Project: Hadoop Common > Issue Type: Improvement >Reporter: wujinhu >Assignee: wujinhu > > Currently, default number of times we should retry errors is 20, however, > oss sdk retry delay is > {code:java} > long delay = (long)Math.pow(2, retries) * 0.3 > {code} > when one error occurs. So, if we retry 20 times, sleep time will be about > 3.64 days and it is unacceptable. So we should change the default behavior. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
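The "about 3.64 days" figure in the description above can be checked with a short calculation. The sketch below is illustrative arithmetic only. It assumes the 0.3 factor is in seconds and that the delay before retry r (counting from 0) is 2^r * 0.3, summed over all retries. It treats the quoted snippet as pseudocode (as written, it assigns a double to a long without a cast). The class and method names are made up for this note and are not taken from the OSS SDK.

```java
// Sketch only: reproduces the arithmetic in the issue description, not the OSS SDK source.
final class RetryBackoffSketch {

    // Assumption: the 0.3 factor in the description is expressed in seconds.
    static final double SCALE_SECONDS = 0.3;

    /** Delay before retry number `retries` (0-based): 2^retries * 0.3 seconds. */
    static double delaySeconds(int retries) {
        return Math.pow(2, retries) * SCALE_SECONDS;
    }

    /** Sum of the delays for retries 0 .. maxRetries - 1. */
    static double totalSleepSeconds(int maxRetries) {
        double total = 0;
        for (int r = 0; r < maxRetries; r++) {
            total += delaySeconds(r);
        }
        return total;
    }

    public static void main(String[] args) {
        for (int maxRetries : new int[] {3, 10, 20}) {
            double seconds = totalSleepSeconds(maxRetries);
            System.out.printf("maxRetries=%d -> %.1f s (%.2f days)%n",
                    maxRetries, seconds, seconds / 86400.0);
        }
    }
}
```

Under these assumptions the total for 20 retries is 0.3 * (2^20 - 1) seconds, which matches the roughly 3.64 days quoted in the description. With 10 retries the total drops to about five minutes, which shows how sharply the default value drives the worst case.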
[jira] [Updated] (HADOOP-15104) AliyunOSS: change default max error retry
[ https://issues.apache.org/jira/browse/HADOOP-15104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wujinhu updated HADOOP-15104: - Attachment: HADOOP-15104.001.patch > AliyunOSS: change default max error retry > - > > Key: HADOOP-15104 > URL: https://issues.apache.org/jira/browse/HADOOP-15104 > Project: Hadoop Common > Issue Type: Improvement > Components: fs/oss >Affects Versions: 3.0.0-beta1 >Reporter: wujinhu >Assignee: wujinhu > Fix For: 3.0.0, 2.9.1 > > Attachments: HADOOP-15104.001.patch > > > Currently, default number of times we should retry errors is 20, however, > oss sdk retry delay is > {code:java} > long delay = (long)Math.pow(2, retries) * 0.3 > {code} > when one error occurs. So, if we retry 20 times, sleep time will be about > 3.64 days and it is unacceptable. So we should change the default behavior. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-15104) AliyunOSS: change default max error retry
[ https://issues.apache.org/jira/browse/HADOOP-15104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wujinhu updated HADOOP-15104: - Attachment: (was: HADOOP-15104.001.patch) > AliyunOSS: change default max error retry > - > > Key: HADOOP-15104 > URL: https://issues.apache.org/jira/browse/HADOOP-15104 > Project: Hadoop Common > Issue Type: Improvement > Components: fs/oss >Affects Versions: 3.0.0-beta1 >Reporter: wujinhu >Assignee: wujinhu > Fix For: 3.0.0, 2.9.1 > > Attachments: HADOOP-15104.001.patch > > > Currently, default number of times we should retry errors is 20, however, > oss sdk retry delay is > {code:java} > long delay = (long)Math.pow(2, retries) * 0.3 > {code} > when one error occurs. So, if we retry 20 times, sleep time will be about > 3.64 days and it is unacceptable. So we should change the default behavior. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15104) AliyunOSS: change default max error retry
[ https://issues.apache.org/jira/browse/HADOOP-15104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16283659#comment-16283659 ] wujinhu commented on HADOOP-15104: -- Attach for trunk > AliyunOSS: change default max error retry > - > > Key: HADOOP-15104 > URL: https://issues.apache.org/jira/browse/HADOOP-15104 > Project: Hadoop Common > Issue Type: Improvement > Components: fs/oss >Affects Versions: 3.0.0-beta1 >Reporter: wujinhu >Assignee: wujinhu > Fix For: 3.0.0, 2.9.1 > > Attachments: HADOOP-15104.001.patch > > > Currently, default number of times we should retry errors is 20, however, > oss sdk retry delay is > {code:java} > long delay = (long)Math.pow(2, retries) * 0.3 > {code} > when one error occurs. So, if we retry 20 times, sleep time will be about > 3.64 days and it is unacceptable. So we should change the default behavior. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Created] (HADOOP-15104) AliyunOSS: change default max error retry
wujinhu created HADOOP-15104: Summary: AliyunOSS: change default max error retry Key: HADOOP-15104 URL: https://issues.apache.org/jira/browse/HADOOP-15104 Project: Hadoop Common Issue Type: Improvement Reporter: wujinhu Assignee: wujinhu -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-15104) AliyunOSS: change default max error retry
[ https://issues.apache.org/jira/browse/HADOOP-15104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wujinhu updated HADOOP-15104: - Attachment: HADOOP-15104.001.patch > AliyunOSS: change default max error retry > - > > Key: HADOOP-15104 > URL: https://issues.apache.org/jira/browse/HADOOP-15104 > Project: Hadoop Common > Issue Type: Improvement > Components: fs/oss >Affects Versions: 3.0.0-beta1 >Reporter: wujinhu >Assignee: wujinhu > Fix For: 3.0.0, 2.9.1 > > Attachments: HADOOP-15104.001.patch > > > Currently, default number of times we should retry errors is 20, however, > oss sdk retry delay is > {code:java} > long delay = (long)Math.pow(2, retries) * 0.3 > {code} > when one error occurs. So, if we retry 20 times, sleep time will be about > 3.64 days and it is unacceptable. So we should change the default behavior. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Work started] (HADOOP-15104) AliyunOSS: change default max error retry
[ https://issues.apache.org/jira/browse/HADOOP-15104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HADOOP-15104 started by wujinhu. > AliyunOSS: change default max error retry > - > > Key: HADOOP-15104 > URL: https://issues.apache.org/jira/browse/HADOOP-15104 > Project: Hadoop Common > Issue Type: Improvement > Components: fs/oss >Affects Versions: 3.0.0-beta1 >Reporter: wujinhu >Assignee: wujinhu > Fix For: 3.0.0, 2.9.1 > > Attachments: HADOOP-15104.001.patch > > > Currently, default number of times we should retry errors is 20, however, > oss sdk retry delay is > {code:java} > long delay = (long)Math.pow(2, retries) * 0.3 > {code} > when one error occurs. So, if we retry 20 times, sleep time will be about > 3.64 days and it is unacceptable. So we should change the default behavior. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-15027) AliyunOSS: Improvements for Hadoop read from AliyunOSS
[ https://issues.apache.org/jira/browse/HADOOP-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wujinhu updated HADOOP-15027: - Attachment: HADOOP-15027.006.patch > AliyunOSS: Improvements for Hadoop read from AliyunOSS > -- > > Key: HADOOP-15027 > URL: https://issues.apache.org/jira/browse/HADOOP-15027 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/oss >Affects Versions: 3.0.0 >Reporter: wujinhu >Assignee: wujinhu > Attachments: HADOOP-15027.001.patch, HADOOP-15027.002.patch, > HADOOP-15027.003.patch, HADOOP-15027.004.patch, HADOOP-15027.005.patch, > HADOOP-15027.006.patch > > > Currently, AliyunOSSInputStream uses single thread to read data from > AliyunOSS, so we can do some refactoring by using multi-thread pre-read to > improve read performance. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
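For readers unfamiliar with the idea, multi-thread pre-read can be sketched roughly as follows. This is a generic read-ahead pattern and not the code in the HADOOP-15027 patches. `PreReadSketch`, `RangeReader`, `readAll` and all parameter names are hypothetical, and the fake reader stands in for a ranged GET against the object store.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Illustrative only: a generic read-ahead pattern, not the HADOOP-15027 implementation.
final class PreReadSketch {

    /** Hypothetical stand-in for a ranged read against the object store. */
    interface RangeReader {
        byte[] read(long offset, int length) throws Exception;
    }

    /**
     * Reads fileLength bytes in partSize chunks, keeping up to maxAhead range
     * requests in flight on the shared pool while parts are consumed in order.
     */
    static byte[] readAll(RangeReader reader, long fileLength, int partSize,
                          int maxAhead, ExecutorService pool) throws Exception {
        if (maxAhead < 1 || partSize < 1) {
            throw new IllegalArgumentException("maxAhead and partSize must be >= 1");
        }
        byte[] out = new byte[(int) fileLength];
        Queue<Future<byte[]>> inFlight = new ArrayDeque<>();
        long nextOffset = 0;   // start of the next range to schedule
        int written = 0;       // bytes already copied into out

        while (written < fileLength) {
            // Top up the read-ahead window.
            while (inFlight.size() < maxAhead && nextOffset < fileLength) {
                final long offset = nextOffset;
                final int length = (int) Math.min(partSize, fileLength - offset);
                inFlight.add(pool.submit(() -> reader.read(offset, length)));
                nextOffset += length;
            }
            // Consume the oldest part; this blocks only if it has not arrived yet.
            byte[] part = inFlight.remove().get();
            System.arraycopy(part, 0, out, written, part.length);
            written += part.length;
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        byte[] source = new byte[10_000];
        for (int i = 0; i < source.length; i++) {
            source[i] = (byte) i;
        }
        // Fake "remote" reader that copies a range out of the in-memory array.
        RangeReader fake = (offset, length) -> {
            byte[] buf = new byte[length];
            System.arraycopy(source, (int) offset, buf, 0, length);
            return buf;
        };
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            byte[] copy = readAll(fake, source.length, 1024, 4, pool);
            System.out.println("bytes read: " + copy.length);
        } finally {
            pool.shutdown();
        }
    }
}
```

The consumer still sees bytes in order. The gain comes from later ranges being fetched while earlier ones are being consumed, which is why a purely sequential scan benefits most from this pattern.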
[jira] [Created] (HADOOP-15158) AliyunOSS: AliyunCredentialsProvider supports role based credential
wujinhu created HADOOP-15158: Summary: AliyunOSS: AliyunCredentialsProvider supports role based credential Key: HADOOP-15158 URL: https://issues.apache.org/jira/browse/HADOOP-15158 Project: Hadoop Common Issue Type: Improvement Components: fs/oss Affects Versions: 3.0.0 Reporter: wujinhu Fix For: 2.9.1, 3.0.1 Currently, AliyunCredentialsProvider supports credential by configuration(core-site.xml). Sometimes, admin wants to create different temporary credential(key/secret/token) for different roles so that one role cannot read data that belongs to another role. So, our code should support pass in the URI when creates an XXXCredentialsProvider. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-15027) AliyunOSS: Support multi-thread pre-read to improve read from Hadoop to Aliyun OSS performance
[ https://issues.apache.org/jira/browse/HADOOP-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wujinhu updated HADOOP-15027: - Attachment: HADOOP-15027.011.patch > AliyunOSS: Support multi-thread pre-read to improve read from Hadoop to > Aliyun OSS performance > -- > > Key: HADOOP-15027 > URL: https://issues.apache.org/jira/browse/HADOOP-15027 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/oss >Affects Versions: 3.0.0 >Reporter: wujinhu >Assignee: wujinhu > Attachments: HADOOP-15027.001.patch, HADOOP-15027.002.patch, > HADOOP-15027.003.patch, HADOOP-15027.004.patch, HADOOP-15027.005.patch, > HADOOP-15027.006.patch, HADOOP-15027.007.patch, HADOOP-15027.008.patch, > HADOOP-15027.009.patch, HADOOP-15027.010.patch, HADOOP-15027.011.patch > > > Currently, AliyunOSSInputStream uses single thread to read data from > AliyunOSS, so we can do some refactoring by using multi-thread pre-read to > improve read performance. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15027) AliyunOSS: Support multi-thread pre-read to improve read from Hadoop to Aliyun OSS performance
[ https://issues.apache.org/jira/browse/HADOOP-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16317880#comment-16317880 ] wujinhu commented on HADOOP-15027: -- Attach patch to fix code style and FindBugs warning. > AliyunOSS: Support multi-thread pre-read to improve read from Hadoop to > Aliyun OSS performance > -- > > Key: HADOOP-15027 > URL: https://issues.apache.org/jira/browse/HADOOP-15027 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/oss >Affects Versions: 3.0.0 >Reporter: wujinhu >Assignee: wujinhu > Attachments: HADOOP-15027.001.patch, HADOOP-15027.002.patch, > HADOOP-15027.003.patch, HADOOP-15027.004.patch, HADOOP-15027.005.patch, > HADOOP-15027.006.patch, HADOOP-15027.007.patch, HADOOP-15027.008.patch, > HADOOP-15027.009.patch, HADOOP-15027.010.patch, HADOOP-15027.011.patch > > > Currently, AliyunOSSInputStream uses single thread to read data from > AliyunOSS, so we can do some refactoring by using multi-thread pre-read to > improve read performance. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Work started] (HADOOP-15158) AliyunOSS: Supports role based credential
[ https://issues.apache.org/jira/browse/HADOOP-15158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HADOOP-15158 started by wujinhu. > AliyunOSS: Supports role based credential > - > > Key: HADOOP-15158 > URL: https://issues.apache.org/jira/browse/HADOOP-15158 > Project: Hadoop Common > Issue Type: Improvement > Components: fs/oss >Affects Versions: 3.0.0 >Reporter: wujinhu >Assignee: wujinhu > Attachments: HADOOP-15158.001.patch > > > Currently, AliyunCredentialsProvider supports credential by > configuration(core-site.xml). Sometimes, admin wants to create different > temporary credential(key/secret/token) for different roles so that one role > cannot read data that belongs to another role. > So, our code should support pass in the URI when creates an > XXXCredentialsProvider so that we can get user info(role) from the URI -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
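A minimal sketch of the idea in the description above, where the provider receives the filesystem URI and derives the role from its user-info part. None of the types here are Hadoop or Aliyun SDK classes. `RoleCredentialsSketch`, `Credential`, `RoleBasedProvider` and the `oss://role@bucket/...` layout are assumptions made purely for illustration. Only `java.net.URI.getUserInfo()` is a real JDK call.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: none of these types are Hadoop or Aliyun SDK classes.
final class RoleCredentialsSketch {

    /** Minimal stand-in for a temporary credential (key/secret/token). */
    static final class Credential {
        final String keyId;
        final String secret;
        final String token;

        Credential(String keyId, String secret, String token) {
            this.keyId = keyId;
            this.secret = secret;
            this.token = token;
        }
    }

    /** Provider constructed with the filesystem URI so that it can see the role. */
    static final class RoleBasedProvider {
        private final String role;
        private final Map<String, Credential> byRole;
        private final Credential fallback;

        RoleBasedProvider(URI uri, Map<String, Credential> byRole, Credential fallback) {
            // For oss://role@bucket/path the user-info component is "role";
            // getUserInfo() returns null when the URI carries no user info.
            this.role = uri.getUserInfo();
            this.byRole = byRole;
            this.fallback = fallback;
        }

        Credential getCredential() {
            if (role == null) {
                return fallback; // e.g. the credential taken from core-site.xml
            }
            Credential c = byRole.get(role);
            if (c == null) {
                throw new IllegalStateException("no credential configured for role " + role);
            }
            return c;
        }
    }

    public static void main(String[] args) {
        Map<String, Credential> byRole = new HashMap<>();
        byRole.put("analyst", new Credential("analyst-key", "analyst-secret", "analyst-token"));
        Credential fallback = new Credential("default-key", "default-secret", null);

        RoleBasedProvider withRole =
                new RoleBasedProvider(URI.create("oss://analyst@my-bucket/data"), byRole, fallback);
        RoleBasedProvider withoutRole =
                new RoleBasedProvider(URI.create("oss://my-bucket/data"), byRole, fallback);

        System.out.println(withRole.getCredential().keyId);
        System.out.println(withoutRole.getCredential().keyId);
    }
}
```

Failing loudly for an unknown role, rather than falling back to the default credential, keeps one role from silently reading with another role's permissions, which is the isolation the issue asks for.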
[jira] [Updated] (HADOOP-15027) AliyunOSS: Support multi-thread pre-read to improve read from Hadoop to Aliyun OSS performance
[ https://issues.apache.org/jira/browse/HADOOP-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wujinhu updated HADOOP-15027: - Attachment: (was: HADOOP-15027.011.patch) > AliyunOSS: Support multi-thread pre-read to improve read from Hadoop to > Aliyun OSS performance > -- > > Key: HADOOP-15027 > URL: https://issues.apache.org/jira/browse/HADOOP-15027 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/oss >Affects Versions: 3.0.0 >Reporter: wujinhu >Assignee: wujinhu > Attachments: HADOOP-15027.001.patch, HADOOP-15027.002.patch, > HADOOP-15027.003.patch, HADOOP-15027.004.patch, HADOOP-15027.005.patch, > HADOOP-15027.006.patch, HADOOP-15027.007.patch, HADOOP-15027.008.patch, > HADOOP-15027.009.patch, HADOOP-15027.010.patch, HADOOP-15027.011.patch > > > Currently, AliyunOSSInputStream uses single thread to read data from > AliyunOSS, so we can do some refactoring by using multi-thread pre-read to > improve read performance. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-15027) AliyunOSS: Support multi-thread pre-read to improve read from Hadoop to Aliyun OSS performance
[ https://issues.apache.org/jira/browse/HADOOP-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wujinhu updated HADOOP-15027: - Attachment: HADOOP-15027.011.patch > AliyunOSS: Support multi-thread pre-read to improve read from Hadoop to > Aliyun OSS performance > -- > > Key: HADOOP-15027 > URL: https://issues.apache.org/jira/browse/HADOOP-15027 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/oss >Affects Versions: 3.0.0 >Reporter: wujinhu >Assignee: wujinhu > Attachments: HADOOP-15027.001.patch, HADOOP-15027.002.patch, > HADOOP-15027.003.patch, HADOOP-15027.004.patch, HADOOP-15027.005.patch, > HADOOP-15027.006.patch, HADOOP-15027.007.patch, HADOOP-15027.008.patch, > HADOOP-15027.009.patch, HADOOP-15027.010.patch, HADOOP-15027.011.patch > > > Currently, AliyunOSSInputStream uses single thread to read data from > AliyunOSS, so we can do some refactoring by using multi-thread pre-read to > improve read performance. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-15027) AliyunOSS: Support multi-thread pre-read to improve read from Hadoop to Aliyun OSS performance
[ https://issues.apache.org/jira/browse/HADOOP-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wujinhu updated HADOOP-15027: - Attachment: (was: HADOOP-15027.011.patch) > AliyunOSS: Support multi-thread pre-read to improve read from Hadoop to > Aliyun OSS performance > -- > > Key: HADOOP-15027 > URL: https://issues.apache.org/jira/browse/HADOOP-15027 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/oss >Affects Versions: 3.0.0 >Reporter: wujinhu >Assignee: wujinhu > Attachments: HADOOP-15027.001.patch, HADOOP-15027.002.patch, > HADOOP-15027.003.patch, HADOOP-15027.004.patch, HADOOP-15027.005.patch, > HADOOP-15027.006.patch, HADOOP-15027.007.patch, HADOOP-15027.008.patch, > HADOOP-15027.009.patch, HADOOP-15027.010.patch > > > Currently, AliyunOSSInputStream uses single thread to read data from > AliyunOSS, so we can do some refactoring by using multi-thread pre-read to > improve read performance. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-15027) AliyunOSS: Support multi-thread pre-read to improve read from Hadoop to Aliyun OSS performance
[ https://issues.apache.org/jira/browse/HADOOP-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wujinhu updated HADOOP-15027: - Attachment: HADOOP-15027.010.patch > AliyunOSS: Support multi-thread pre-read to improve read from Hadoop to > Aliyun OSS performance > -- > > Key: HADOOP-15027 > URL: https://issues.apache.org/jira/browse/HADOOP-15027 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/oss >Affects Versions: 3.0.0 >Reporter: wujinhu >Assignee: wujinhu > Attachments: HADOOP-15027.001.patch, HADOOP-15027.002.patch, > HADOOP-15027.003.patch, HADOOP-15027.004.patch, HADOOP-15027.005.patch, > HADOOP-15027.006.patch, HADOOP-15027.007.patch, HADOOP-15027.008.patch, > HADOOP-15027.009.patch, HADOOP-15027.010.patch > > > Currently, AliyunOSSInputStream uses single thread to read data from > AliyunOSS, so we can do some refactoring by using multi-thread pre-read to > improve read performance. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HADOOP-15027) AliyunOSS: Support multi-thread pre-read to improve read from Hadoop to Aliyun OSS performance
[ https://issues.apache.org/jira/browse/HADOOP-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16321869#comment-16321869 ] wujinhu edited comment on HADOOP-15027 at 1/11/18 8:44 AM: --- Hi [~Sammi], here are some performance data. I use this tool(https://github.com/hortonworks/hive-testbench) to compare run time between this patch and current version.
{code:java}
query        patch     current
query13.sql  241.591   440.524
query28.sql  1259.307  1943.949
query51.sql  469.618   722.904
query73.sql  216.596   414.75
query96.sql  268.869   476.473
{code}
was (Author: wujinhu): Hi [~Sammi], here are some performance data. I use this tool(https://github.com/hortonworks/hive-testbench) to compare run time between this patch and current version.
{code:java}
query        after     before
query13.sql  241.591   440.524
query28.sql  1259.307  1943.949
query51.sql  469.618   722.904
query73.sql  216.596   414.75
query96.sql  268.869   476.473
{code}
> AliyunOSS: Support multi-thread pre-read to improve read from Hadoop to > Aliyun OSS performance > -- > > Key: HADOOP-15027 > URL: https://issues.apache.org/jira/browse/HADOOP-15027 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/oss >Affects Versions: 3.0.0 >Reporter: wujinhu >Assignee: wujinhu > Attachments: HADOOP-15027.001.patch, HADOOP-15027.002.patch, > HADOOP-15027.003.patch, HADOOP-15027.004.patch, HADOOP-15027.005.patch, > HADOOP-15027.006.patch, HADOOP-15027.007.patch, HADOOP-15027.008.patch, > HADOOP-15027.009.patch, HADOOP-15027.010.patch, HADOOP-15027.011.patch, > HADOOP-15027.012.patch > > > Currently, AliyunOSSInputStream uses single thread to read data from > AliyunOSS, so we can do some refactoring by using multi-thread pre-read to > improve read performance. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HADOOP-15027) AliyunOSS: Support multi-thread pre-read to improve read from Hadoop to Aliyun OSS performance
[ https://issues.apache.org/jira/browse/HADOOP-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16321869#comment-16321869 ] wujinhu edited comment on HADOOP-15027 at 1/11/18 8:43 AM: --- Hi [~Sammi], here are some performance data. I use this tool(https://github.com/hortonworks/hive-testbench) to compare run time between this patch and current version.
{code:java}
query        after     before
query13.sql  241.591   440.524
query28.sql  1259.307  1943.949
query51.sql  469.618   722.904
query73.sql  216.596   414.75
query96.sql  268.869   476.473
{code}
was (Author: wujinhu): Hi [~Sammi], here are some performance data. I use this tool(https://github.com/hortonworks/hive-testbench) to compare run time between this patch and current version.
{code:java}
query        after     before
query13.sql  241.591   440.524
query28.sql  1259.307  1943.949
query51.sql  469.618   722.904
query73.sql  216.596   414.75
query96.sql  268.869   476.473
{code}
> AliyunOSS: Support multi-thread pre-read to improve read from Hadoop to > Aliyun OSS performance > -- > > Key: HADOOP-15027 > URL: https://issues.apache.org/jira/browse/HADOOP-15027 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/oss >Affects Versions: 3.0.0 >Reporter: wujinhu >Assignee: wujinhu > Attachments: HADOOP-15027.001.patch, HADOOP-15027.002.patch, > HADOOP-15027.003.patch, HADOOP-15027.004.patch, HADOOP-15027.005.patch, > HADOOP-15027.006.patch, HADOOP-15027.007.patch, HADOOP-15027.008.patch, > HADOOP-15027.009.patch, HADOOP-15027.010.patch, HADOOP-15027.011.patch, > HADOOP-15027.012.patch > > > Currently, AliyunOSSInputStream uses single thread to read data from > AliyunOSS, so we can do some refactoring by using multi-thread pre-read to > improve read performance. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15027) AliyunOSS: Support multi-thread pre-read to improve read from Hadoop to Aliyun OSS performance
[ https://issues.apache.org/jira/browse/HADOOP-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16321869#comment-16321869 ] wujinhu commented on HADOOP-15027: -- Hi [~Sammi], here are some performance data. I use this tool(https://github.com/hortonworks/hive-testbench) to compare run time between this patch and current version. {code:java} query after before query13.sql 241.591 440.524 query28.sql 1259.307 1943.949 query51.sql 469.618 722.904 query73.sql 216.596 414.75 query96.sql 268.869 476.473 {code} > AliyunOSS: Support multi-thread pre-read to improve read from Hadoop to > Aliyun OSS performance > -- > > Key: HADOOP-15027 > URL: https://issues.apache.org/jira/browse/HADOOP-15027 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/oss >Affects Versions: 3.0.0 >Reporter: wujinhu >Assignee: wujinhu > Attachments: HADOOP-15027.001.patch, HADOOP-15027.002.patch, > HADOOP-15027.003.patch, HADOOP-15027.004.patch, HADOOP-15027.005.patch, > HADOOP-15027.006.patch, HADOOP-15027.007.patch, HADOOP-15027.008.patch, > HADOOP-15027.009.patch, HADOOP-15027.010.patch, HADOOP-15027.011.patch, > HADOOP-15027.012.patch > > > Currently, AliyunOSSInputStream uses single thread to read data from > AliyunOSS, so we can do some refactoring by using multi-thread pre-read to > improve read performance. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
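To make the comparison above easier to read, the ratio of the two columns can be computed per query. The sketch below only does arithmetic on the figures quoted in the comment. The comment does not state the unit of the run times, and the class name `SpeedupSketch` and the notion of "speedup" as before divided by after are choices made for this note.

```java
// Arithmetic on the figures quoted in the comment; the unit of the run times is not stated there.
final class SpeedupSketch {

    static final String[] QUERIES = {
        "query13.sql", "query28.sql", "query51.sql", "query73.sql", "query96.sql"
    };
    // "after" column: run time with the patch applied.
    static final double[] AFTER = {241.591, 1259.307, 469.618, 216.596, 268.869};
    // "before" column: run time on the current version.
    static final double[] BEFORE = {440.524, 1943.949, 722.904, 414.75, 476.473};

    /** Ratio of the run time before the patch to the run time after it. */
    static double speedup(int i) {
        return BEFORE[i] / AFTER[i];
    }

    public static void main(String[] args) {
        for (int i = 0; i < QUERIES.length; i++) {
            System.out.printf("%s  %.2fx%n", QUERIES[i], speedup(i));
        }
    }
}
```

On these five queries the ratio stays between roughly 1.5 and 1.9, so the patched version is faster in every case listed, with the smallest gain on query28.sql and query51.sql.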
[jira] [Comment Edited] (HADOOP-15027) AliyunOSS: Support multi-thread pre-read to improve read from Hadoop to Aliyun OSS performance
[ https://issues.apache.org/jira/browse/HADOOP-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16321869#comment-16321869 ] wujinhu edited comment on HADOOP-15027 at 1/11/18 8:43 AM: --- Hi [~Sammi], here are some performance data. I use this tool(https://github.com/hortonworks/hive-testbench) to compare run time between this patch and current version. {code:java} query after before query13.sql 241.591 440.524 query28.sql 1259.307 1943.949 query51.sql 469.618 722.904 query73.sql 216.596 414.75 query96.sql 268.869 476.473 {code} was (Author: wujinhu): Hi [~Sammi], here are some performance data. I use this tool(https://github.com/hortonworks/hive-testbench) to compare run time between this patch and current version. {code:java} query after before query13.sql 241.591 440.524 query28.sql 1259.307 1943.949 query51.sql 469.618 722.904 query73.sql 216.596 414.75 query96.sql 268.869 476.473 {code} > AliyunOSS: Support multi-thread pre-read to improve read from Hadoop to > Aliyun OSS performance > -- > > Key: HADOOP-15027 > URL: https://issues.apache.org/jira/browse/HADOOP-15027 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/oss >Affects Versions: 3.0.0 >Reporter: wujinhu >Assignee: wujinhu > Attachments: HADOOP-15027.001.patch, HADOOP-15027.002.patch, > HADOOP-15027.003.patch, HADOOP-15027.004.patch, HADOOP-15027.005.patch, > HADOOP-15027.006.patch, HADOOP-15027.007.patch, HADOOP-15027.008.patch, > HADOOP-15027.009.patch, HADOOP-15027.010.patch, HADOOP-15027.011.patch, > HADOOP-15027.012.patch > > > Currently, AliyunOSSInputStream uses single thread to read data from > AliyunOSS, so we can do some refactoring by using multi-thread pre-read to > improve read performance. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HADOOP-15027) AliyunOSS: Support multi-thread pre-read to improve read from Hadoop to Aliyun OSS performance
[ https://issues.apache.org/jira/browse/HADOOP-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16321869#comment-16321869 ] wujinhu edited comment on HADOOP-15027 at 1/11/18 8:43 AM: --- Hi [~Sammi], here are some performance data. I use this tool(https://github.com/hortonworks/hive-testbench) to compare run time between this patch and current version.
{code:java}
query        after     before
query13.sql  241.591   440.524
query28.sql  1259.307  1943.949
query51.sql  469.618   722.904
query73.sql  216.596   414.75
query96.sql  268.869   476.473
{code}
was (Author: wujinhu): Hi [~Sammi], here are some performance data. I use this tool(https://github.com/hortonworks/hive-testbench) to compare run time between this patch and current version.
{code:java}
query        after     before
query13.sql  241.591   440.524
query28.sql  1259.307  1943.949
query51.sql  469.618   722.904
query73.sql  216.596   414.75
query96.sql  268.869   476.473
{code}
> AliyunOSS: Support multi-thread pre-read to improve read from Hadoop to > Aliyun OSS performance > -- > > Key: HADOOP-15027 > URL: https://issues.apache.org/jira/browse/HADOOP-15027 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/oss >Affects Versions: 3.0.0 >Reporter: wujinhu >Assignee: wujinhu > Attachments: HADOOP-15027.001.patch, HADOOP-15027.002.patch, > HADOOP-15027.003.patch, HADOOP-15027.004.patch, HADOOP-15027.005.patch, > HADOOP-15027.006.patch, HADOOP-15027.007.patch, HADOOP-15027.008.patch, > HADOOP-15027.009.patch, HADOOP-15027.010.patch, HADOOP-15027.011.patch, > HADOOP-15027.012.patch > > > Currently, AliyunOSSInputStream uses single thread to read data from > AliyunOSS, so we can do some refactoring by using multi-thread pre-read to > improve read performance. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HADOOP-15027) AliyunOSS: Support multi-thread pre-read to improve read from Hadoop to Aliyun OSS performance
[ https://issues.apache.org/jira/browse/HADOOP-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16321869#comment-16321869 ] wujinhu edited comment on HADOOP-15027 at 1/11/18 3:17 PM: --- Hi [~Sammi], here are some performance data. I use this tool(https://github.com/hortonworks/hive-testbench) to compare run time between this patch and current version(text file).
{code:java}
query        patch     current
query13.sql  241.591   440.524
query28.sql  1259.307  1943.949
query51.sql  469.618   722.904
query73.sql  216.596   414.75
query96.sql  268.869   476.473
{code}
was (Author: wujinhu): Hi [~Sammi], here are some performance data. I use this tool(https://github.com/hortonworks/hive-testbench) to compare run time between this patch and current version.
{code:java}
query        patch     current
query13.sql  241.591   440.524
query28.sql  1259.307  1943.949
query51.sql  469.618   722.904
query73.sql  216.596   414.75
query96.sql  268.869   476.473
{code}
> AliyunOSS: Support multi-thread pre-read to improve read from Hadoop to > Aliyun OSS performance > -- > > Key: HADOOP-15027 > URL: https://issues.apache.org/jira/browse/HADOOP-15027 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/oss >Affects Versions: 3.0.0 >Reporter: wujinhu >Assignee: wujinhu > Attachments: HADOOP-15027.001.patch, HADOOP-15027.002.patch, > HADOOP-15027.003.patch, HADOOP-15027.004.patch, HADOOP-15027.005.patch, > HADOOP-15027.006.patch, HADOOP-15027.007.patch, HADOOP-15027.008.patch, > HADOOP-15027.009.patch, HADOOP-15027.010.patch, HADOOP-15027.011.patch, > HADOOP-15027.012.patch > > > Currently, AliyunOSSInputStream uses single thread to read data from > AliyunOSS, so we can do some refactoring by using multi-thread pre-read to > improve read performance. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-15027) AliyunOSS: Improvements for Hadoop read from AliyunOSS
[ https://issues.apache.org/jira/browse/HADOOP-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
wujinhu updated HADOOP-15027:
-
    Attachment: (was: HADOOP-15027.007.patch)
> AliyunOSS: Improvements for Hadoop read from AliyunOSS
> --
>
> Key: HADOOP-15027
> URL: https://issues.apache.org/jira/browse/HADOOP-15027
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/oss
> Affects Versions: 3.0.0
> Reporter: wujinhu
> Assignee: wujinhu
> Attachments: HADOOP-15027.001.patch, HADOOP-15027.002.patch,
> HADOOP-15027.003.patch, HADOOP-15027.004.patch, HADOOP-15027.005.patch,
> HADOOP-15027.006.patch
>
>
> Currently, AliyunOSSInputStream uses a single thread to read data from
> AliyunOSS, so we can do some refactoring by using multi-thread pre-read to
> improve read performance.
[jira] [Issue Comment Deleted] (HADOOP-15027) AliyunOSS: Improvements for Hadoop read from AliyunOSS
[ https://issues.apache.org/jira/browse/HADOOP-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
wujinhu updated HADOOP-15027:
-
    Comment: was deleted
(was: Update the default configuration:
{code:java}
--- a/hadoop-tools/hadoop-aliyun/src/main/java/org/apache/hadoop/fs/aliyun/oss/Constants.java
+++ b/hadoop-tools/hadoop-aliyun/src/main/java/org/apache/hadoop/fs/aliyun/oss/Constants.java
@@ -101,14 +101,14 @@ private Constants() {
   public static final String MULTIPART_DOWNLOAD_THREAD_NUMBER_KEY = "fs.oss.multipart.download.threads";
-  public static final int MULTIPART_DOWNLOAD_THREAD_NUMBER_DEFAULT = 10;
+  public static final int MULTIPART_DOWNLOAD_THREAD_NUMBER_DEFAULT = 16;
   public static final String MAX_TOTAL_TASKS = "fs.oss.max.total.tasks";
-  public static final int DEFAULT_MAX_TOTAL_TASKS = 128;
+  public static final int DEFAULT_MAX_TOTAL_TASKS = 1024;
   public static final String MULTIPART_DOWNLOAD_AHEAD_PART_MAX_NUM_KEY = "fs.oss.multipart.download.ahead.part.max.number";
-  public static final int MULTIPART_DOWNLOAD_AHEAD_PART_MAX_NUM_DEFAULT = 8;
+  public static final int MULTIPART_DOWNLOAD_AHEAD_PART_MAX_NUM_DEFAULT = 4;
{code}
)
> AliyunOSS: Improvements for Hadoop read from AliyunOSS
> --
>
> Key: HADOOP-15027
> URL: https://issues.apache.org/jira/browse/HADOOP-15027
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/oss
> Affects Versions: 3.0.0
> Reporter: wujinhu
> Assignee: wujinhu
> Attachments: HADOOP-15027.001.patch, HADOOP-15027.002.patch,
> HADOOP-15027.003.patch, HADOOP-15027.004.patch, HADOOP-15027.005.patch,
> HADOOP-15027.006.patch
>
>
> Currently, AliyunOSSInputStream uses a single thread to read data from
> AliyunOSS, so we can do some refactoring by using multi-thread pre-read to
> improve read performance.
[jira] [Commented] (HADOOP-15027) AliyunOSS: Improvements for Hadoop read from AliyunOSS
[ https://issues.apache.org/jira/browse/HADOOP-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16309539#comment-16309539 ]
wujinhu commented on HADOOP-15027:
--
Update the default configuration:
{code:java}
--- a/hadoop-tools/hadoop-aliyun/src/main/java/org/apache/hadoop/fs/aliyun/oss/Constants.java
+++ b/hadoop-tools/hadoop-aliyun/src/main/java/org/apache/hadoop/fs/aliyun/oss/Constants.java
@@ -101,14 +101,14 @@ private Constants() {
   public static final String MULTIPART_DOWNLOAD_THREAD_NUMBER_KEY = "fs.oss.multipart.download.threads";
-  public static final int MULTIPART_DOWNLOAD_THREAD_NUMBER_DEFAULT = 10;
+  public static final int MULTIPART_DOWNLOAD_THREAD_NUMBER_DEFAULT = 16;
   public static final String MAX_TOTAL_TASKS = "fs.oss.max.total.tasks";
-  public static final int DEFAULT_MAX_TOTAL_TASKS = 128;
+  public static final int DEFAULT_MAX_TOTAL_TASKS = 1024;
   public static final String MULTIPART_DOWNLOAD_AHEAD_PART_MAX_NUM_KEY = "fs.oss.multipart.download.ahead.part.max.number";
-  public static final int MULTIPART_DOWNLOAD_AHEAD_PART_MAX_NUM_DEFAULT = 8;
+  public static final int MULTIPART_DOWNLOAD_AHEAD_PART_MAX_NUM_DEFAULT = 4;
{code}
> AliyunOSS: Improvements for Hadoop read from AliyunOSS
> --
>
> Key: HADOOP-15027
> URL: https://issues.apache.org/jira/browse/HADOOP-15027
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/oss
> Affects Versions: 3.0.0
> Reporter: wujinhu
> Assignee: wujinhu
> Attachments: HADOOP-15027.001.patch, HADOOP-15027.002.patch,
> HADOOP-15027.003.patch, HADOOP-15027.004.patch, HADOOP-15027.005.patch,
> HADOOP-15027.006.patch, HADOOP-15027.007.patch
>
>
> Currently, AliyunOSSInputStream uses a single thread to read data from
> AliyunOSS, so we can do some refactoring by using multi-thread pre-read to
> improve read performance.
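To illustrate how the three keys in the diff above might be resolved against their defaults, here is a minimal sketch. It uses `java.util.Properties` as a stand-in so that it runs without Hadoop on the classpath; in the real module the values would presumably be read through Hadoop's `Configuration.getInt(name, defaultValue)`. The class name `OssReadSettings` and its fields are hypothetical, and the defaults are the ones proposed in this particular comment (16, 1024, 4), which later messages in the thread change again.

```java
import java.util.Properties;

/** Hypothetical holder for the read-ahead settings named in the diff. */
final class OssReadSettings {
    static final String THREADS_KEY = "fs.oss.multipart.download.threads";
    static final String MAX_TOTAL_TASKS_KEY = "fs.oss.max.total.tasks";
    static final String AHEAD_PARTS_KEY = "fs.oss.multipart.download.ahead.part.max.number";

    final int threads;        // worker threads used for ranged downloads
    final int maxTotalTasks;  // upper bound on queued download tasks
    final int aheadParts;     // parts fetched ahead of the reader

    private OssReadSettings(int threads, int maxTotalTasks, int aheadParts) {
        this.threads = threads;
        this.maxTotalTasks = maxTotalTasks;
        this.aheadParts = aheadParts;
    }

    /** Reads each key, falling back to the default proposed in the comment. */
    static OssReadSettings from(Properties props) {
        return new OssReadSettings(
            positiveInt(props, THREADS_KEY, 16),
            positiveInt(props, MAX_TOTAL_TASKS_KEY, 1024),
            positiveInt(props, AHEAD_PARTS_KEY, 4));
    }

    private static int positiveInt(Properties props, String key, int defaultValue) {
        String raw = props.getProperty(key);
        int value = (raw == null) ? defaultValue : Integer.parseInt(raw.trim());
        if (value <= 0) {
            throw new IllegalArgumentException(key + " must be positive, got " + value);
        }
        return value;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty(THREADS_KEY, "8"); // override one key, keep the other defaults
        OssReadSettings s = from(props);
        System.out.println(s.threads + " threads, " + s.maxTotalTasks
            + " max tasks, " + s.aheadParts + " parts ahead");
    }
}
```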
[jira] [Updated] (HADOOP-15027) AliyunOSS: Improvements for Hadoop read from AliyunOSS
[ https://issues.apache.org/jira/browse/HADOOP-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
wujinhu updated HADOOP-15027:
-
    Attachment: HADOOP-15027.007.patch
> AliyunOSS: Improvements for Hadoop read from AliyunOSS
> --
>
> Key: HADOOP-15027
> URL: https://issues.apache.org/jira/browse/HADOOP-15027
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/oss
> Affects Versions: 3.0.0
> Reporter: wujinhu
> Assignee: wujinhu
> Attachments: HADOOP-15027.001.patch, HADOOP-15027.002.patch,
> HADOOP-15027.003.patch, HADOOP-15027.004.patch, HADOOP-15027.005.patch,
> HADOOP-15027.006.patch, HADOOP-15027.007.patch
>
>
> Currently, AliyunOSSInputStream uses a single thread to read data from
> AliyunOSS, so we can do some refactoring by using multi-thread pre-read to
> improve read performance.
[jira] [Updated] (HADOOP-15158) AliyunOSS: AliyunCredentialsProvider supports role based credential
[ https://issues.apache.org/jira/browse/HADOOP-15158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
wujinhu updated HADOOP-15158:
-
    Description:
Currently, AliyunCredentialsProvider supports credentials by configuration (core-site.xml). Sometimes, an admin wants to create different temporary credentials (key/secret/token) for different roles so that one role cannot read data that belongs to another role.
So, our code should support passing in the URI when creating an XXXCredentialsProvider so that we can get user info (role) from the URI.
  was:
Currently, AliyunCredentialsProvider supports credentials by configuration (core-site.xml). Sometimes, an admin wants to create different temporary credentials (key/secret/token) for different roles so that one role cannot read data that belongs to another role.
So, our code should support passing in the URI when creating an XXXCredentialsProvider so that we can get user info from the URI.
> AliyunOSS: AliyunCredentialsProvider supports role based credential
>
>
> Key: HADOOP-15158
> URL: https://issues.apache.org/jira/browse/HADOOP-15158
> Project: Hadoop Common
> Issue Type: Improvement
> Components: fs/oss
> Affects Versions: 3.0.0
> Reporter: wujinhu
> Fix For: 2.9.1, 3.0.1
>
>
> Currently, AliyunCredentialsProvider supports credentials by
> configuration (core-site.xml). Sometimes, an admin wants to create different
> temporary credentials (key/secret/token) for different roles so that one role
> cannot read data that belongs to another role.
> So, our code should support passing in the URI when creating an
> XXXCredentialsProvider so that we can get user info (role) from the URI.
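The idea of deriving the role from the filesystem URI can be sketched with plain `java.net.URI`. This is an assumption-heavy illustration, not the patch itself: the description only says that user info (role) should come from the URI, so the `oss://role@bucket/path` layout, the bucket name, and the `RoleFromUri` class are all hypothetical. A credentials provider could then use the returned role to look up temporary credentials.

```java
import java.net.URI;
import java.util.Optional;

/** Hypothetical helper: treats the URI's user-info component as the role name. */
final class RoleFromUri {

    /** Returns the user-info part of the URI, or empty if the URI carries none. */
    static Optional<String> roleOf(URI fsUri) {
        String userInfo = fsUri.getUserInfo();
        if (userInfo == null || userInfo.isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(userInfo);
    }

    public static void main(String[] args) {
        // Assumed layout: oss://<role>@<bucket>/<path>
        URI withRole = URI.create("oss://analyst@example-bucket/warehouse/t1");
        URI withoutRole = URI.create("oss://example-bucket/warehouse/t1");

        System.out.println(roleOf(withRole).orElse("<no role>"));
        System.out.println(roleOf(withoutRole).orElse("<no role>"));
    }
}
```

Returning an `Optional` keeps the no-role case explicit, so a caller can fall back to the credentials configured in core-site.xml.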
[jira] [Commented] (HADOOP-15158) AliyunOSS: AliyunCredentialsProvider supports role based credential
[ https://issues.apache.org/jira/browse/HADOOP-15158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16312621#comment-16312621 ]
wujinhu commented on HADOOP-15158:
--
Attached a patch. However, should we add an implementation for this?
> AliyunOSS: AliyunCredentialsProvider supports role based credential
>
>
> Key: HADOOP-15158
> URL: https://issues.apache.org/jira/browse/HADOOP-15158
> Project: Hadoop Common
> Issue Type: Improvement
> Components: fs/oss
> Affects Versions: 3.0.0
> Reporter: wujinhu
> Assignee: wujinhu
> Fix For: 2.9.1, 3.0.1
>
> Attachments: HADOOP-15158.001.patch
>
>
> Currently, AliyunCredentialsProvider supports credentials by
> configuration (core-site.xml). Sometimes, an admin wants to create different
> temporary credentials (key/secret/token) for different roles so that one role
> cannot read data that belongs to another role.
> So, our code should support passing in the URI when creating an
> XXXCredentialsProvider so that we can get user info (role) from the URI.
[jira] [Updated] (HADOOP-15158) AliyunOSS: Supports role based credential
[ https://issues.apache.org/jira/browse/HADOOP-15158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
wujinhu updated HADOOP-15158:
-
    Summary: AliyunOSS: Supports role based credential (was: AliyunOSS: AliyunCredentialsProvider supports role based credential)
> AliyunOSS: Supports role based credential
> -
>
> Key: HADOOP-15158
> URL: https://issues.apache.org/jira/browse/HADOOP-15158
> Project: Hadoop Common
> Issue Type: Improvement
> Components: fs/oss
> Affects Versions: 3.0.0
> Reporter: wujinhu
> Assignee: wujinhu
> Fix For: 2.9.1, 3.0.1
>
> Attachments: HADOOP-15158.001.patch
>
>
> Currently, AliyunCredentialsProvider supports credentials by
> configuration (core-site.xml). Sometimes, an admin wants to create different
> temporary credentials (key/secret/token) for different roles so that one role
> cannot read data that belongs to another role.
> So, our code should support passing in the URI when creating an
> XXXCredentialsProvider so that we can get user info (role) from the URI.
[jira] [Updated] (HADOOP-15158) AliyunOSS: AliyunCredentialsProvider supports role based credential
[ https://issues.apache.org/jira/browse/HADOOP-15158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
wujinhu updated HADOOP-15158:
-
    Description:
Currently, AliyunCredentialsProvider supports credentials by configuration (core-site.xml). Sometimes, an admin wants to create different temporary credentials (key/secret/token) for different roles so that one role cannot read data that belongs to another role.
So, our code should support passing in the URI when creating an XXXCredentialsProvider so that we can get user info from the URI.
  was:
Currently, AliyunCredentialsProvider supports credentials by configuration (core-site.xml). Sometimes, an admin wants to create different temporary credentials (key/secret/token) for different roles so that one role cannot read data that belongs to another role.
So, our code should support passing in the URI when creating an XXXCredentialsProvider.
> AliyunOSS: AliyunCredentialsProvider supports role based credential
>
>
> Key: HADOOP-15158
> URL: https://issues.apache.org/jira/browse/HADOOP-15158
> Project: Hadoop Common
> Issue Type: Improvement
> Components: fs/oss
> Affects Versions: 3.0.0
> Reporter: wujinhu
> Fix For: 2.9.1, 3.0.1
>
>
> Currently, AliyunCredentialsProvider supports credentials by
> configuration (core-site.xml). Sometimes, an admin wants to create different
> temporary credentials (key/secret/token) for different roles so that one role
> cannot read data that belongs to another role.
> So, our code should support passing in the URI when creating an
> XXXCredentialsProvider so that we can get user info from the URI.
[jira] [Assigned] (HADOOP-15158) AliyunOSS: AliyunCredentialsProvider supports role based credential
[ https://issues.apache.org/jira/browse/HADOOP-15158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
wujinhu reassigned HADOOP-15158:
    Assignee: wujinhu
> AliyunOSS: AliyunCredentialsProvider supports role based credential
>
>
> Key: HADOOP-15158
> URL: https://issues.apache.org/jira/browse/HADOOP-15158
> Project: Hadoop Common
> Issue Type: Improvement
> Components: fs/oss
> Affects Versions: 3.0.0
> Reporter: wujinhu
> Assignee: wujinhu
> Fix For: 2.9.1, 3.0.1
>
> Attachments: HADOOP-15158.001.patch
>
>
> Currently, AliyunCredentialsProvider supports credentials by
> configuration (core-site.xml). Sometimes, an admin wants to create different
> temporary credentials (key/secret/token) for different roles so that one role
> cannot read data that belongs to another role.
> So, our code should support passing in the URI when creating an
> XXXCredentialsProvider so that we can get user info (role) from the URI.
[jira] [Updated] (HADOOP-15158) AliyunOSS: AliyunCredentialsProvider supports role based credential
[ https://issues.apache.org/jira/browse/HADOOP-15158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
wujinhu updated HADOOP-15158:
-
    Attachment: HADOOP-15158.001.patch
> AliyunOSS: AliyunCredentialsProvider supports role based credential
>
>
> Key: HADOOP-15158
> URL: https://issues.apache.org/jira/browse/HADOOP-15158
> Project: Hadoop Common
> Issue Type: Improvement
> Components: fs/oss
> Affects Versions: 3.0.0
> Reporter: wujinhu
> Fix For: 2.9.1, 3.0.1
>
> Attachments: HADOOP-15158.001.patch
>
>
> Currently, AliyunCredentialsProvider supports credentials by
> configuration (core-site.xml). Sometimes, an admin wants to create different
> temporary credentials (key/secret/token) for different roles so that one role
> cannot read data that belongs to another role.
> So, our code should support passing in the URI when creating an
> XXXCredentialsProvider so that we can get user info (role) from the URI.
[jira] [Commented] (HADOOP-15027) AliyunOSS: Support multi-thread pre-read to improve read from Hadoop to Aliyun OSS performance
[ https://issues.apache.org/jira/browse/HADOOP-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16311600#comment-16311600 ]
wujinhu commented on HADOOP-15027:
--
Updated the patch! Thanks [~Sammi] for the review. Here are my updates for your comments:
1 & 2. Fixed.
3. store.close() will not throw any exception; OSSClient.shutdown() catches Exception.
4. Yes, fs.oss.max.total.tasks is the max queue length used for read-ahead.
5. fsDataInputStream.seek() is not missing; you can see it in 007.patch.
6. I have added another test case for AliyunOSSFileReaderTask and will continue to add more test cases.
Please review again. Thanks.
> AliyunOSS: Support multi-thread pre-read to improve read from Hadoop to
> Aliyun OSS performance
> --
>
> Key: HADOOP-15027
> URL: https://issues.apache.org/jira/browse/HADOOP-15027
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/oss
> Affects Versions: 3.0.0
> Reporter: wujinhu
> Assignee: wujinhu
> Attachments: HADOOP-15027.001.patch, HADOOP-15027.002.patch,
> HADOOP-15027.003.patch, HADOOP-15027.004.patch, HADOOP-15027.005.patch,
> HADOOP-15027.006.patch, HADOOP-15027.007.patch, HADOOP-15027.008.patch,
> HADOOP-15027.009.patch
>
>
> Currently, AliyunOSSInputStream uses a single thread to read data from
> AliyunOSS, so we can do some refactoring by using multi-thread pre-read to
> improve read performance.
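The multi-thread pre-read idea discussed in this thread (a shared pool with a bounded task queue, and a limited number of parts fetched ahead of the reader) can be sketched with the standard `java.util.concurrent` classes. This is not the code from the patch: `ReadAheadSketch`, `fetchRange`, and the in-memory byte array standing in for an OSS object are all assumptions made for illustration, and the `CallerRunsPolicy` overflow behaviour is a choice of this sketch rather than something the thread specifies.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/** Sketch of sequential reading with multi-threaded read-ahead over an in-memory "object". */
final class ReadAheadSketch {

    /** Stand-in for a ranged GET: copies bytes [start, end) of the source. */
    static byte[] fetchRange(byte[] source, int start, int end) {
        return Arrays.copyOfRange(source, start, end);
    }

    /** Fixed-size pool with a bounded queue, similar in spirit to a max-total-tasks limit. */
    static ExecutorService boundedPool(int threads, int maxQueuedTasks) {
        return new ThreadPoolExecutor(threads, threads, 60L, TimeUnit.SECONDS,
            new ArrayBlockingQueue<>(maxQueuedTasks),
            new ThreadPoolExecutor.CallerRunsPolicy()); // overflow runs in the caller (sketch choice)
    }

    /** Reads the whole source in order while keeping up to aheadParts ranged reads in flight. */
    static byte[] readAll(byte[] source, int partSize, int aheadParts, ExecutorService pool) {
        if (partSize <= 0 || aheadParts <= 0) {
            throw new IllegalArgumentException("partSize and aheadParts must be positive");
        }
        byte[] out = new byte[source.length];
        Deque<Future<byte[]>> inFlight = new ArrayDeque<>();
        int nextStart = 0; // offset of the next part to schedule
        int written = 0;   // bytes already delivered to the caller
        while (written < source.length) {
            // Top up the read-ahead window.
            while (inFlight.size() < aheadParts && nextStart < source.length) {
                final int start = nextStart;
                final int end = Math.min(start + partSize, source.length);
                inFlight.add(pool.submit(() -> fetchRange(source, start, end)));
                nextStart = end;
            }
            // Consume parts strictly in submission order so the output stays sequential.
            byte[] part;
            try {
                part = inFlight.remove().get();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for a part", e);
            } catch (ExecutionException e) {
                throw new IllegalStateException("ranged read failed", e.getCause());
            }
            System.arraycopy(part, 0, out, written, part.length);
            written += part.length;
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] data = new byte[10_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) (i * 31);
        }
        ExecutorService pool = boundedPool(4, 16);
        try {
            byte[] copy = readAll(data, 1024, 4, pool);
            System.out.println("identical: " + Arrays.equals(data, copy));
        } finally {
            pool.shutdown();
        }
    }
}
```

Sharing one pool across streams, as the thread describes when moving the pool from the input stream to the filesystem, bounds the total number of download threads regardless of how many files are open.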
[jira] [Updated] (HADOOP-15027) AliyunOSS: Support multi-thread pre-read to improve read from Hadoop to Aliyun OSS performance
[ https://issues.apache.org/jira/browse/HADOOP-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
wujinhu updated HADOOP-15027:
-
    Attachment: HADOOP-15027.009.patch
> AliyunOSS: Support multi-thread pre-read to improve read from Hadoop to
> Aliyun OSS performance
> --
>
> Key: HADOOP-15027
> URL: https://issues.apache.org/jira/browse/HADOOP-15027
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/oss
> Affects Versions: 3.0.0
> Reporter: wujinhu
> Assignee: wujinhu
> Attachments: HADOOP-15027.001.patch, HADOOP-15027.002.patch,
> HADOOP-15027.003.patch, HADOOP-15027.004.patch, HADOOP-15027.005.patch,
> HADOOP-15027.006.patch, HADOOP-15027.007.patch, HADOOP-15027.008.patch,
> HADOOP-15027.009.patch
>
>
> Currently, AliyunOSSInputStream uses a single thread to read data from
> AliyunOSS, so we can do some refactoring by using multi-thread pre-read to
> improve read performance.
[jira] [Commented] (HADOOP-15027) AliyunOSS: Support multi-thread pre-read to improve read from Hadoop to Aliyun OSS performance
[ https://issues.apache.org/jira/browse/HADOOP-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16313216#comment-16313216 ]
wujinhu commented on HADOOP-15027:
--
Hi [~Sammi], thanks for your review. I have attached some performance data; you can view it in the comments above.
> AliyunOSS: Support multi-thread pre-read to improve read from Hadoop to
> Aliyun OSS performance
> --
>
> Key: HADOOP-15027
> URL: https://issues.apache.org/jira/browse/HADOOP-15027
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/oss
> Affects Versions: 3.0.0
> Reporter: wujinhu
> Assignee: wujinhu
> Attachments: HADOOP-15027.001.patch, HADOOP-15027.002.patch,
> HADOOP-15027.003.patch, HADOOP-15027.004.patch, HADOOP-15027.005.patch,
> HADOOP-15027.006.patch, HADOOP-15027.007.patch, HADOOP-15027.008.patch,
> HADOOP-15027.009.patch
>
>
> Currently, AliyunOSSInputStream uses a single thread to read data from
> AliyunOSS, so we can do some refactoring by using multi-thread pre-read to
> improve read performance.
[jira] [Comment Edited] (HADOOP-15027) AliyunOSS: Support multi-thread pre-read to improve read from Hadoop to Aliyun OSS performance
[ https://issues.apache.org/jira/browse/HADOOP-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16320103#comment-16320103 ]
wujinhu edited comment on HADOOP-15027 at 1/10/18 11:28 AM:
Change some default configuration.
{code:java}
-  public static final long MULTIPART_DOWNLOAD_SIZE_DEFAULT = 1024 * 1024;
+  public static final long MULTIPART_DOWNLOAD_SIZE_DEFAULT = 512 * 1024;
   public static final String MULTIPART_DOWNLOAD_THREAD_NUMBER_KEY = "fs.oss.multipart.download.threads";
-  public static final int MULTIPART_DOWNLOAD_THREAD_NUMBER_DEFAULT = 16;
+  public static final int MULTIPART_DOWNLOAD_THREAD_NUMBER_DEFAULT = 10;
   public static final String MAX_TOTAL_TASKS_KEY = "fs.oss.max.total.tasks";
   public static final int MAX_TOTAL_TASKS_DEFAULT = 128;
   public static final String MULTIPART_DOWNLOAD_AHEAD_PART_MAX_NUM_KEY = "fs.oss.multipart.download.ahead.part.max.number";
-  public static final int MULTIPART_DOWNLOAD_AHEAD_PART_MAX_NUM_DEFAULT = 8;
+  public static final int MULTIPART_DOWNLOAD_AHEAD_PART_MAX_NUM_DEFAULT = 4;
{code}
was (Author: wujinhu):
Change some default configuration.
{code:java}
474c474
< index dd71842fb87..dedc038f3f7 100644
---
> index dd71842fb87..a1070277d33 100644
482c482
< +  public static final long MULTIPART_DOWNLOAD_SIZE_DEFAULT = 512 * 1024;
---
> +  public static final long MULTIPART_DOWNLOAD_SIZE_DEFAULT = 1024 * 1024;
486c486
< +  public static final int MULTIPART_DOWNLOAD_THREAD_NUMBER_DEFAULT = 10;
---
> +  public static final int MULTIPART_DOWNLOAD_THREAD_NUMBER_DEFAULT = 16;
493c493
< +  public static final int MULTIPART_DOWNLOAD_AHEAD_PART_MAX_NUM_DEFAULT = 4;
---
> +  public static final int MULTIPART_DOWNLOAD_AHEAD_PART_MAX_NUM_DEFAULT = 8;
{code}
> AliyunOSS: Support multi-thread pre-read to improve read from Hadoop to
> Aliyun OSS performance
> --
>
> Key: HADOOP-15027
> URL: https://issues.apache.org/jira/browse/HADOOP-15027
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/oss
> Affects Versions: 3.0.0
> Reporter: wujinhu
> Assignee: wujinhu
> Attachments: HADOOP-15027.001.patch, HADOOP-15027.002.patch,
> HADOOP-15027.003.patch, HADOOP-15027.004.patch, HADOOP-15027.005.patch,
> HADOOP-15027.006.patch, HADOOP-15027.007.patch, HADOOP-15027.008.patch,
> HADOOP-15027.009.patch, HADOOP-15027.010.patch, HADOOP-15027.011.patch,
> HADOOP-15027.012.patch
>
>
> Currently, AliyunOSSInputStream uses a single thread to read data from
> AliyunOSS, so we can do some refactoring by using multi-thread pre-read to
> improve read performance.
[jira] [Comment Edited] (HADOOP-15027) AliyunOSS: Support multi-thread pre-read to improve read from Hadoop to Aliyun OSS performance
[ https://issues.apache.org/jira/browse/HADOOP-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16320103#comment-16320103 ]
wujinhu edited comment on HADOOP-15027 at 1/10/18 11:32 AM:
Change some default configurations.
{code:java}
-  public static final long MULTIPART_DOWNLOAD_SIZE_DEFAULT = 1024 * 1024;
+  public static final long MULTIPART_DOWNLOAD_SIZE_DEFAULT = 512 * 1024;
   public static final String MULTIPART_DOWNLOAD_THREAD_NUMBER_KEY = "fs.oss.multipart.download.threads";
-  public static final int MULTIPART_DOWNLOAD_THREAD_NUMBER_DEFAULT = 16;
+  public static final int MULTIPART_DOWNLOAD_THREAD_NUMBER_DEFAULT = 10;
   public static final String MAX_TOTAL_TASKS_KEY = "fs.oss.max.total.tasks";
   public static final int MAX_TOTAL_TASKS_DEFAULT = 128;
   public static final String MULTIPART_DOWNLOAD_AHEAD_PART_MAX_NUM_KEY = "fs.oss.multipart.download.ahead.part.max.number";
-  public static final int MULTIPART_DOWNLOAD_AHEAD_PART_MAX_NUM_DEFAULT = 8;
+  public static final int MULTIPART_DOWNLOAD_AHEAD_PART_MAX_NUM_DEFAULT = 4;
{code}
was (Author: wujinhu):
Change some default configuration.
{code:java}
-  public static final long MULTIPART_DOWNLOAD_SIZE_DEFAULT = 1024 * 1024;
+  public static final long MULTIPART_DOWNLOAD_SIZE_DEFAULT = 512 * 1024;
   public static final String MULTIPART_DOWNLOAD_THREAD_NUMBER_KEY = "fs.oss.multipart.download.threads";
-  public static final int MULTIPART_DOWNLOAD_THREAD_NUMBER_DEFAULT = 16;
+  public static final int MULTIPART_DOWNLOAD_THREAD_NUMBER_DEFAULT = 10;
   public static final String MAX_TOTAL_TASKS_KEY = "fs.oss.max.total.tasks";
   public static final int MAX_TOTAL_TASKS_DEFAULT = 128;
   public static final String MULTIPART_DOWNLOAD_AHEAD_PART_MAX_NUM_KEY = "fs.oss.multipart.download.ahead.part.max.number";
-  public static final int MULTIPART_DOWNLOAD_AHEAD_PART_MAX_NUM_DEFAULT = 8;
+  public static final int MULTIPART_DOWNLOAD_AHEAD_PART_MAX_NUM_DEFAULT = 4;
{code}
> AliyunOSS: Support multi-thread pre-read to improve read from Hadoop to
> Aliyun OSS performance
> --
>
> Key: HADOOP-15027
> URL: https://issues.apache.org/jira/browse/HADOOP-15027
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/oss
> Affects Versions: 3.0.0
> Reporter: wujinhu
> Assignee: wujinhu
> Attachments: HADOOP-15027.001.patch, HADOOP-15027.002.patch,
> HADOOP-15027.003.patch, HADOOP-15027.004.patch, HADOOP-15027.005.patch,
> HADOOP-15027.006.patch, HADOOP-15027.007.patch, HADOOP-15027.008.patch,
> HADOOP-15027.009.patch, HADOOP-15027.010.patch, HADOOP-15027.011.patch,
> HADOOP-15027.012.patch
>
>
> Currently, AliyunOSSInputStream uses a single thread to read data from
> AliyunOSS, so we can do some refactoring by using multi-thread pre-read to
> improve read performance.
[jira] [Updated] (HADOOP-15027) AliyunOSS: Support multi-thread pre-read to improve read from Hadoop to Aliyun OSS performance
[ https://issues.apache.org/jira/browse/HADOOP-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
wujinhu updated HADOOP-15027:
-
    Attachment: HADOOP-15027.012.patch
> AliyunOSS: Support multi-thread pre-read to improve read from Hadoop to
> Aliyun OSS performance
> --
>
> Key: HADOOP-15027
> URL: https://issues.apache.org/jira/browse/HADOOP-15027
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/oss
> Affects Versions: 3.0.0
> Reporter: wujinhu
> Assignee: wujinhu
> Attachments: HADOOP-15027.001.patch, HADOOP-15027.002.patch,
> HADOOP-15027.003.patch, HADOOP-15027.004.patch, HADOOP-15027.005.patch,
> HADOOP-15027.006.patch, HADOOP-15027.007.patch, HADOOP-15027.008.patch,
> HADOOP-15027.009.patch, HADOOP-15027.010.patch, HADOOP-15027.011.patch,
> HADOOP-15027.012.patch
>
>
> Currently, AliyunOSSInputStream uses a single thread to read data from
> AliyunOSS, so we can do some refactoring by using multi-thread pre-read to
> improve read performance.
[jira] [Commented] (HADOOP-15027) AliyunOSS: Support multi-thread pre-read to improve read from Hadoop to Aliyun OSS performance
[ https://issues.apache.org/jira/browse/HADOOP-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16320103#comment-16320103 ]
wujinhu commented on HADOOP-15027:
--
Change some default configuration.
{code:java}
474c474
< index dd71842fb87..dedc038f3f7 100644
---
> index dd71842fb87..a1070277d33 100644
482c482
< +  public static final long MULTIPART_DOWNLOAD_SIZE_DEFAULT = 512 * 1024;
---
> +  public static final long MULTIPART_DOWNLOAD_SIZE_DEFAULT = 1024 * 1024;
486c486
< +  public static final int MULTIPART_DOWNLOAD_THREAD_NUMBER_DEFAULT = 10;
---
> +  public static final int MULTIPART_DOWNLOAD_THREAD_NUMBER_DEFAULT = 16;
493c493
< +  public static final int MULTIPART_DOWNLOAD_AHEAD_PART_MAX_NUM_DEFAULT = 4;
---
> +  public static final int MULTIPART_DOWNLOAD_AHEAD_PART_MAX_NUM_DEFAULT = 8;
{code}
> AliyunOSS: Support multi-thread pre-read to improve read from Hadoop to
> Aliyun OSS performance
> --
>
> Key: HADOOP-15027
> URL: https://issues.apache.org/jira/browse/HADOOP-15027
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/oss
> Affects Versions: 3.0.0
> Reporter: wujinhu
> Assignee: wujinhu
> Attachments: HADOOP-15027.001.patch, HADOOP-15027.002.patch,
> HADOOP-15027.003.patch, HADOOP-15027.004.patch, HADOOP-15027.005.patch,
> HADOOP-15027.006.patch, HADOOP-15027.007.patch, HADOOP-15027.008.patch,
> HADOOP-15027.009.patch, HADOOP-15027.010.patch, HADOOP-15027.011.patch,
> HADOOP-15027.012.patch
>
>
> Currently, AliyunOSSInputStream uses a single thread to read data from
> AliyunOSS, so we can do some refactoring by using multi-thread pre-read to
> improve read performance.
[jira] [Updated] (HADOOP-15027) AliyunOSS: Support multi-thread pre-read to improve read from Hadoop to Aliyun OSS performance
[ https://issues.apache.org/jira/browse/HADOOP-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
wujinhu updated HADOOP-15027:
-
    Attachment: HADOOP-15027.011.patch
> AliyunOSS: Support multi-thread pre-read to improve read from Hadoop to
> Aliyun OSS performance
> --
>
> Key: HADOOP-15027
> URL: https://issues.apache.org/jira/browse/HADOOP-15027
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/oss
> Affects Versions: 3.0.0
> Reporter: wujinhu
> Assignee: wujinhu
> Attachments: HADOOP-15027.001.patch, HADOOP-15027.002.patch,
> HADOOP-15027.003.patch, HADOOP-15027.004.patch, HADOOP-15027.005.patch,
> HADOOP-15027.006.patch, HADOOP-15027.007.patch, HADOOP-15027.008.patch,
> HADOOP-15027.009.patch, HADOOP-15027.010.patch, HADOOP-15027.011.patch
>
>
> Currently, AliyunOSSInputStream uses a single thread to read data from
> AliyunOSS, so we can do some refactoring by using multi-thread pre-read to
> improve read performance.
[jira] [Updated] (HADOOP-15158) AliyunOSS: Supports role based credential
[ https://issues.apache.org/jira/browse/HADOOP-15158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
wujinhu updated HADOOP-15158:
-
    Attachment: HADOOP-15158.004.patch
> AliyunOSS: Supports role based credential
> -
>
> Key: HADOOP-15158
> URL: https://issues.apache.org/jira/browse/HADOOP-15158
> Project: Hadoop Common
> Issue Type: Improvement
> Components: fs/oss
> Affects Versions: 3.0.0
> Reporter: wujinhu
> Assignee: wujinhu
> Priority: Major
> Fix For: 2.9.1, 3.0.1
>
> Attachments: HADOOP-15158.001.patch, HADOOP-15158.002.patch,
> HADOOP-15158.003.patch, HADOOP-15158.004.patch
>
>
> Currently, AliyunCredentialsProvider supports credentials by
> configuration (core-site.xml). Sometimes, an admin wants to create different
> temporary credentials (key/secret/token) for different roles so that one role
> cannot read data that belongs to another role.
> So, our code should support passing in the URI when creating an
> XXXCredentialsProvider so that we can get user info (role) from the URI.
--
This message was sent by Atlassian JIRA (v7.6.3#76005)
-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15027) AliyunOSS: Support multi-thread pre-read to improve read from Hadoop to Aliyun OSS performance
[ https://issues.apache.org/jira/browse/HADOOP-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16326830#comment-16326830 ]
wujinhu commented on HADOOP-15027:
--
Thanks [~Sammi] for the review. I have updated the patch. :)
> AliyunOSS: Support multi-thread pre-read to improve read from Hadoop to
> Aliyun OSS performance
> --
>
> Key: HADOOP-15027
> URL: https://issues.apache.org/jira/browse/HADOOP-15027
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/oss
> Affects Versions: 3.0.0
> Reporter: wujinhu
> Assignee: wujinhu
> Priority: Major
> Attachments: HADOOP-15027.001.patch, HADOOP-15027.002.patch,
> HADOOP-15027.003.patch, HADOOP-15027.004.patch, HADOOP-15027.005.patch,
> HADOOP-15027.006.patch, HADOOP-15027.007.patch, HADOOP-15027.008.patch,
> HADOOP-15027.009.patch, HADOOP-15027.010.patch, HADOOP-15027.011.patch,
> HADOOP-15027.012.patch, HADOOP-15027.013.patch
>
>
> Currently, AliyunOSSInputStream uses a single thread to read data from
> AliyunOSS, so we can do some refactoring by using multi-thread pre-read to
> improve read performance.
[jira] [Updated] (HADOOP-15027) AliyunOSS: Support multi-thread pre-read to improve read from Hadoop to Aliyun OSS performance
[ https://issues.apache.org/jira/browse/HADOOP-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

wujinhu updated HADOOP-15027:
-----------------------------
    Attachment: HADOOP-15027.013.patch

> AliyunOSS: Support multi-thread pre-read to improve read from Hadoop to Aliyun OSS performance
> ----------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-15027
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15027
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/oss
>    Affects Versions: 3.0.0
>            Reporter: wujinhu
>            Assignee: wujinhu
>            Priority: Major
>         Attachments: HADOOP-15027.001.patch, HADOOP-15027.002.patch,
> HADOOP-15027.003.patch, HADOOP-15027.004.patch, HADOOP-15027.005.patch,
> HADOOP-15027.006.patch, HADOOP-15027.007.patch, HADOOP-15027.008.patch,
> HADOOP-15027.009.patch, HADOOP-15027.010.patch, HADOOP-15027.011.patch,
> HADOOP-15027.012.patch, HADOOP-15027.013.patch
>
>
> Currently, AliyunOSSInputStream uses a single thread to read data from
> AliyunOSS, so we can refactor it to use multi-threaded pre-read to
> improve read performance.
[jira] [Updated] (HADOOP-15027) AliyunOSS: Support multi-thread pre-read to improve read from Hadoop to Aliyun OSS performance
[ https://issues.apache.org/jira/browse/HADOOP-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

wujinhu updated HADOOP-15027:
-----------------------------
    Attachment: HADOOP-15027.014.patch

> AliyunOSS: Support multi-thread pre-read to improve read from Hadoop to Aliyun OSS performance
> ----------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-15027
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15027
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/oss
>    Affects Versions: 3.0.0
>            Reporter: wujinhu
>            Assignee: wujinhu
>            Priority: Major
>         Attachments: HADOOP-15027.001.patch, HADOOP-15027.002.patch,
> HADOOP-15027.003.patch, HADOOP-15027.004.patch, HADOOP-15027.005.patch,
> HADOOP-15027.006.patch, HADOOP-15027.007.patch, HADOOP-15027.008.patch,
> HADOOP-15027.009.patch, HADOOP-15027.010.patch, HADOOP-15027.011.patch,
> HADOOP-15027.012.patch, HADOOP-15027.013.patch, HADOOP-15027.014.patch
>
>
> Currently, AliyunOSSInputStream uses a single thread to read data from
> AliyunOSS, so we can refactor it to use multi-threaded pre-read to
> improve read performance.
[jira] [Commented] (HADOOP-15158) AliyunOSS: Supports role based credential
[ https://issues.apache.org/jira/browse/HADOOP-15158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16319790#comment-16319790 ]

wujinhu commented on HADOOP-15158:
----------------------------------

Attached a patch! I added a test to cover the changed code lines.

> AliyunOSS: Supports role based credential
> -----------------------------------------
>
>                 Key: HADOOP-15158
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15158
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs/oss
>    Affects Versions: 3.0.0
>            Reporter: wujinhu
>            Assignee: wujinhu
>             Fix For: 3.0.1
>
>         Attachments: HADOOP-15158.001.patch, HADOOP-15158.002.patch,
> HADOOP-15158.003.patch
>
>
> Currently, AliyunCredentialsProvider supports credentials via
> configuration (core-site.xml). Sometimes an admin wants to create different
> temporary credentials (key/secret/token) for different roles, so that one
> role cannot read data that belongs to another role.
> So our code should support passing in the URI when it creates an
> XXXCredentialsProvider, so that we can get user info (role) from the URI.
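To make the request in the issue description concrete, here is a minimal sketch of a provider that is handed the URI and takes the role from its user-info part. Everything in it is an assumption for illustration: `RoleCredentialsSketch`, the nested `Credentials` holder, the `oss://role@bucket/path` URI shape, and the in-memory role map are all hypothetical, and the attached patches may resolve the role and its temporary credentials quite differently. Only `java.net.URI.getUserInfo()` is a real JDK call.

```java
import java.net.URI;
import java.util.Map;

final class RoleCredentialsSketch {
  /** Hypothetical holder for a temporary key/secret/token triple. */
  static final class Credentials {
    final String key;
    final String secret;
    final String token;

    Credentials(String key, String secret, String token) {
      this.key = key;
      this.secret = secret;
      this.token = token;
    }
  }

  private final Credentials credentials;

  /**
   * Resolves credentials for the role named in the URI's user-info part.
   * Assumption: the role-to-credentials mapping is supplied by the caller;
   * a real provider would obtain temporary credentials some other way.
   */
  RoleCredentialsSketch(URI uri, Map<String, Credentials> byRole) {
    String role = uri.getUserInfo();
    if (role == null) {
      throw new IllegalArgumentException("URI carries no role (user info)");
    }
    Credentials found = byRole.get(role);
    if (found == null) {
      throw new IllegalArgumentException("No credentials for role: " + role);
    }
    this.credentials = found;
  }

  Credentials getCredentials() {
    return credentials;
  }
}
```

With this shape, a provider built from `oss://analyst@my-bucket/data` only ever sees the credentials registered for `analyst`, which is the isolation between roles that the description asks for.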
[jira] [Updated] (HADOOP-15158) AliyunOSS: Supports role based credential
[ https://issues.apache.org/jira/browse/HADOOP-15158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

wujinhu updated HADOOP-15158:
-----------------------------
    Fix Version/s: 3.0.1
           Status: Patch Available  (was: In Progress)

> AliyunOSS: Supports role based credential
> -----------------------------------------
>
>                 Key: HADOOP-15158
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15158
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs/oss
>    Affects Versions: 3.0.0
>            Reporter: wujinhu
>            Assignee: wujinhu
>             Fix For: 3.0.1
>
>         Attachments: HADOOP-15158.001.patch, HADOOP-15158.002.patch
>
>
> Currently, AliyunCredentialsProvider supports credentials via
> configuration (core-site.xml). Sometimes an admin wants to create different
> temporary credentials (key/secret/token) for different roles, so that one
> role cannot read data that belongs to another role.
> So our code should support passing in the URI when it creates an
> XXXCredentialsProvider, so that we can get user info (role) from the URI.
[jira] [Updated] (HADOOP-15158) AliyunOSS: Supports role based credential
[ https://issues.apache.org/jira/browse/HADOOP-15158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

wujinhu updated HADOOP-15158:
-----------------------------
    Attachment: HADOOP-15158.003.patch

> AliyunOSS: Supports role based credential
> -----------------------------------------
>
>                 Key: HADOOP-15158
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15158
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs/oss
>    Affects Versions: 3.0.0
>            Reporter: wujinhu
>            Assignee: wujinhu
>             Fix For: 3.0.1
>
>         Attachments: HADOOP-15158.001.patch, HADOOP-15158.002.patch,
> HADOOP-15158.003.patch
>
>
> Currently, AliyunCredentialsProvider supports credentials via
> configuration (core-site.xml). Sometimes an admin wants to create different
> temporary credentials (key/secret/token) for different roles, so that one
> role cannot read data that belongs to another role.
> So our code should support passing in the URI when it creates an
> XXXCredentialsProvider, so that we can get user info (role) from the URI.
[jira] [Updated] (HADOOP-15158) AliyunOSS: Supports role based credential
[ https://issues.apache.org/jira/browse/HADOOP-15158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

wujinhu updated HADOOP-15158:
-----------------------------
    Fix Version/s: 2.9.1

> AliyunOSS: Supports role based credential
> -----------------------------------------
>
>                 Key: HADOOP-15158
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15158
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs/oss
>    Affects Versions: 3.0.0
>            Reporter: wujinhu
>            Assignee: wujinhu
>             Fix For: 2.9.1, 3.0.1
>
>         Attachments: HADOOP-15158.001.patch, HADOOP-15158.002.patch,
> HADOOP-15158.003.patch
>
>
> Currently, AliyunCredentialsProvider supports credentials via
> configuration (core-site.xml). Sometimes an admin wants to create different
> temporary credentials (key/secret/token) for different roles, so that one
> role cannot read data that belongs to another role.
> So our code should support passing in the URI when it creates an
> XXXCredentialsProvider, so that we can get user info (role) from the URI.
[jira] [Updated] (HADOOP-15158) AliyunOSS: Supports role based credential
[ https://issues.apache.org/jira/browse/HADOOP-15158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

wujinhu updated HADOOP-15158:
-----------------------------
    Attachment: HADOOP-15158.002.patch

> AliyunOSS: Supports role based credential
> -----------------------------------------
>
>                 Key: HADOOP-15158
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15158
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs/oss
>    Affects Versions: 3.0.0
>            Reporter: wujinhu
>            Assignee: wujinhu
>             Fix For: 3.0.1
>
>         Attachments: HADOOP-15158.001.patch, HADOOP-15158.002.patch
>
>
> Currently, AliyunCredentialsProvider supports credentials via
> configuration (core-site.xml). Sometimes an admin wants to create different
> temporary credentials (key/secret/token) for different roles, so that one
> role cannot read data that belongs to another role.
> So our code should support passing in the URI when it creates an
> XXXCredentialsProvider, so that we can get user info (role) from the URI.