[jira] [Comment Edited] (HDFS-8033) Erasure coding: stateful (non-positional) read from files in striped layout

2015-04-24 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14511973#comment-14511973
 ] 

Jing Zhao edited comment on HDFS-8033 at 4/24/15 11:11 PM:
---

Thanks Zhe! The 006 patch looks pretty good to me. +1

For the testing, the stateful read test should also cover reading with 
ByteBuffer case. Please see if you want to do it here or in a separate jira.


was (Author: jingzhao):
Thanks Zhe! The 006 patch looks pretty good to me. +1

 Erasure coding: stateful (non-positional) read from files in striped layout
 ---

 Key: HDFS-8033
 URL: https://issues.apache.org/jira/browse/HDFS-8033
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: HDFS-7285
Reporter: Zhe Zhang
Assignee: Zhe Zhang
 Attachments: ByteBufferStrategy_support_len.patch, 
 HDFS-8033.000.patch, HDFS-8033.001.patch, HDFS-8033.002.patch, 
 HDFS-8033.003.patch, HDFS-8033.004.patch, HDFS-8033.005.patch, 
 HDFS-8033.006.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HDFS-8033) Erasure coding: stateful (non-positional) read from files in striped layout

2015-04-21 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14504428#comment-14504428
 ] 

Yi Liu edited comment on HDFS-8033 at 4/21/15 6:25 AM:
---

Thanks [~zhz] for working on this.  The patch is good, my comments:
*1.*  In DFSInputStream, the stateful read is not to read fully for the output 
*buf*,  {{readWithStrategy}} will call {{readBuffer}} and return on success.  
In {{DFSStripedInputStream}} we override {{readBuffer}}, but we only read in 
one striped block, so the returned result should be something like (cell_0, 
cell_3, ).  
This is not incorrect,  in the test, you have tested stateful read, but you do 
fully read and the data size is *BLOCK_GROUP_SIZE*, so the result 
coincidentally is correct. 
I suggest we try to do fully read in {{readBuffer}} of 
{{DFSStripedInputStream}} unless we find the end of file, of course, the final 
read length could be less than the input buf length if we get eof.

*2.* In {{blockSeekTo}}, we need to handle refetchToken and 
refetchEncryptionKey. And for other IOException, we can throw it.

*3.*  For the test, do stateful read: read once and fully read (please make the 
data size large than groupSize * cellSize), as I said in #1,

*4.*  {{connectFailedOnce}} in {{blockSeekTo}} is not necessary.

*5.*  Why you modify {{SimulatedFSDataset}}?


was (Author: hitliuyi):
Thanks [~zhz] for working on this.  The patch is good, my comments:
*1.* 
In DFSInputStream, the stateful read is not to read fully for the output *buf*, 
 {{readWithStrategy}} will call {{readBuffer}} and return on success.  In 
{{DFSStripedInputStream}} we override {{readBuffer}}, but we only read in one 
striped block, so the returned result should be something like (cell_0, cell_3, 
).  
This is not incorrect,  in the test, you have tested stateful read, but you do 
fully read and the data size is *BLOCK_GROUP_SIZE*, so the result 
coincidentally is correct. 
I suggest we try to do fully read in {{readBuffer}} of 
{{DFSStripedInputStream}} unless we find the end of file, of course, the final 
read length could be less than the input buf length if we get eof.

*2.* In {{blockSeekTo}}, we need to handle refetchToken and 
refetchEncryptionKey. And for other IOException, we can throw it.

*3.* 
For the test, do stateful read: read once and fully read (please make the data 
size large than groupSize * cellSize), as I said in #1,

*4.* 
{{connectFailedOnce}} in {{blockSeekTo}} is not necessary.

*5.* 
Why you modify {{SimulatedFSDataset}}?

 Erasure coding: stateful (non-positional) read from files in striped layout
 ---

 Key: HDFS-8033
 URL: https://issues.apache.org/jira/browse/HDFS-8033
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Zhe Zhang
Assignee: Zhe Zhang
 Attachments: HDFS-8033.000.patch, HDFS-8033.001.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HDFS-8033) Erasure coding: stateful (non-positional) read from files in striped layout

2015-04-21 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14504428#comment-14504428
 ] 

Yi Liu edited comment on HDFS-8033 at 4/21/15 6:33 AM:
---

Thanks [~zhz] for working on this.  The patch is good, my comments:
*1.*  In DFSInputStream, the stateful read is not to read fully for the output 
*buf*,  {{readWithStrategy}} will call {{readBuffer}} and return on success.  
In {{DFSStripedInputStream}} we override {{readBuffer}}, but we only read in 
one striped block, so the returned result should be something like (cell_0, 
cell_3, ) and it only contains part of the expected data. 
This is not incorrect,  in the test, you have tested stateful read, but you do 
fully read and the data size is *BLOCK_GROUP_SIZE*, so the result 
coincidentally is correct. 
I suggest we try to do fully read in {{readBuffer}} of 
{{DFSStripedInputStream}} unless we find the end of file, of course, the final 
read length could be less than the input buf length if we get eof.

*2.* In {{blockSeekTo}}, we need to handle refetchToken and 
refetchEncryptionKey. And for other IOException, we can throw it.

*3.*  For the test, do stateful read: read once and fully read (please make the 
data size large than groupSize * cellSize), as I said in #1,

*4.*  {{connectFailedOnce}} in {{blockSeekTo}} is not necessary.

*5.*  Why you modify {{SimulatedFSDataset}}?


was (Author: hitliuyi):
Thanks [~zhz] for working on this.  The patch is good, my comments:
*1.*  In DFSInputStream, the stateful read is not to read fully for the output 
*buf*,  {{readWithStrategy}} will call {{readBuffer}} and return on success.  
In {{DFSStripedInputStream}} we override {{readBuffer}}, but we only read in 
one striped block, so the returned result should be something like (cell_0, 
cell_3, ).  
This is not incorrect,  in the test, you have tested stateful read, but you do 
fully read and the data size is *BLOCK_GROUP_SIZE*, so the result 
coincidentally is correct. 
I suggest we try to do fully read in {{readBuffer}} of 
{{DFSStripedInputStream}} unless we find the end of file, of course, the final 
read length could be less than the input buf length if we get eof.

*2.* In {{blockSeekTo}}, we need to handle refetchToken and 
refetchEncryptionKey. And for other IOException, we can throw it.

*3.*  For the test, do stateful read: read once and fully read (please make the 
data size large than groupSize * cellSize), as I said in #1,

*4.*  {{connectFailedOnce}} in {{blockSeekTo}} is not necessary.

*5.*  Why you modify {{SimulatedFSDataset}}?

 Erasure coding: stateful (non-positional) read from files in striped layout
 ---

 Key: HDFS-8033
 URL: https://issues.apache.org/jira/browse/HDFS-8033
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Zhe Zhang
Assignee: Zhe Zhang
 Attachments: HDFS-8033.000.patch, HDFS-8033.001.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HDFS-8033) Erasure coding: stateful (non-positional) read from files in striped layout

2015-04-21 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14506131#comment-14506131
 ] 

Yi Liu edited comment on HDFS-8033 at 4/22/15 12:42 AM:


Thanks Zhe for the response, and also thanks Jing for the review and good 
comments!   
Yes, it's a good idea to maintain a list of current DataNodes.

{quote}
DFSStripedInputStream#readBuffer does switch the blockReader. So after reading 
cell_0, we'll switch to the next blockReader and read cell_1.
{quote}
{{DFSStripedInputStream#readBuffer}} can switch the blockReader, that's true, 
but it depends on the caller to call, currently for one stateful read, 
{{DFSInputStream#readWithStrategy}} only call it one time, so the result only 
contains part of the expected data.
I didn't see you do loop (parallelly) all blockReaders for the striped block in 
{{DFSStripedInputStream#readBuffer}}.

I have another comments, 
I think we need to change {{closeCurrentBlockReader}} to 
{{closeCurrentBlockReaders}} in {{DFSInputStream}} and override it in 
{{DFSStripedInputStream}}, otherwise there is leak for {{blockReaders}}.


was (Author: hitliuyi):
Thanks Zhe for the response, and also thanks Jing for the review and good 
comments!   
Yes, it's a good idea to maintain a list of current DataNodes.

{quote}
DFSStripedInputStream#readBuffer does switch the blockReader. So after reading 
cell_0, we'll switch to the next blockReader and read cell_1.
{quote}
{{DFSStripedInputStream#readBuffer}} can switch the blockReader, that's true, 
but it depends on the caller to call, currently for one stateful read, 
{{DFSInputStream#readWithStrategy}} only call it one time, so the result only 
contains part of the expected data.
I didn't see you do loop (parallelly) all blockReaders for the striped block in 
{{DFSStripedInputStream#readBuffer}}.

I have another comments, 
I think we need to change {{closeCurrentBlockReader}} to 
{{closeCurrentBlockReaders}} in {{DFSInputStream}} and overwrite it in 
{{DFSStripedInputStream}}, otherwise there is leak for {{blockReaders}}.

 Erasure coding: stateful (non-positional) read from files in striped layout
 ---

 Key: HDFS-8033
 URL: https://issues.apache.org/jira/browse/HDFS-8033
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Zhe Zhang
Assignee: Zhe Zhang
 Attachments: HDFS-8033.000.patch, HDFS-8033.001.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)