[jira] [Comment Edited] (HDFS-8033) Erasure coding: stateful (non-positional) read from files in striped layout
[ https://issues.apache.org/jira/browse/HDFS-8033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14511973#comment-14511973 ] Jing Zhao edited comment on HDFS-8033 at 4/24/15 11:11 PM: --- Thanks Zhe! The 006 patch looks pretty good to me. +1 For the testing, the stateful read test should also cover reading with ByteBuffer case. Please see if you want to do it here or in a separate jira. was (Author: jingzhao): Thanks Zhe! The 006 patch looks pretty good to me. +1 Erasure coding: stateful (non-positional) read from files in striped layout --- Key: HDFS-8033 URL: https://issues.apache.org/jira/browse/HDFS-8033 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: HDFS-7285 Reporter: Zhe Zhang Assignee: Zhe Zhang Attachments: ByteBufferStrategy_support_len.patch, HDFS-8033.000.patch, HDFS-8033.001.patch, HDFS-8033.002.patch, HDFS-8033.003.patch, HDFS-8033.004.patch, HDFS-8033.005.patch, HDFS-8033.006.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HDFS-8033) Erasure coding: stateful (non-positional) read from files in striped layout
[ https://issues.apache.org/jira/browse/HDFS-8033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14504428#comment-14504428 ] Yi Liu edited comment on HDFS-8033 at 4/21/15 6:25 AM: --- Thanks [~zhz] for working on this. The patch is good, my comments: *1.* In DFSInputStream, the stateful read is not to read fully for the output *buf*, {{readWithStrategy}} will call {{readBuffer}} and return on success. In {{DFSStripedInputStream}} we override {{readBuffer}}, but we only read in one striped block, so the returned result should be something like (cell_0, cell_3, ). This is not incorrect, in the test, you have tested stateful read, but you do fully read and the data size is *BLOCK_GROUP_SIZE*, so the result coincidentally is correct. I suggest we try to do fully read in {{readBuffer}} of {{DFSStripedInputStream}} unless we find the end of file, of course, the final read length could be less than the input buf length if we get eof. *2.* In {{blockSeekTo}}, we need to handle refetchToken and refetchEncryptionKey. And for other IOException, we can throw it. *3.* For the test, do stateful read: read once and fully read (please make the data size large than groupSize * cellSize), as I said in #1, *4.* {{connectFailedOnce}} in {{blockSeekTo}} is not necessary. *5.* Why you modify {{SimulatedFSDataset}}? was (Author: hitliuyi): Thanks [~zhz] for working on this. The patch is good, my comments: *1.* In DFSInputStream, the stateful read is not to read fully for the output *buf*, {{readWithStrategy}} will call {{readBuffer}} and return on success. In {{DFSStripedInputStream}} we override {{readBuffer}}, but we only read in one striped block, so the returned result should be something like (cell_0, cell_3, ). This is not incorrect, in the test, you have tested stateful read, but you do fully read and the data size is *BLOCK_GROUP_SIZE*, so the result coincidentally is correct. I suggest we try to do fully read in {{readBuffer}} of {{DFSStripedInputStream}} unless we find the end of file, of course, the final read length could be less than the input buf length if we get eof. *2.* In {{blockSeekTo}}, we need to handle refetchToken and refetchEncryptionKey. And for other IOException, we can throw it. *3.* For the test, do stateful read: read once and fully read (please make the data size large than groupSize * cellSize), as I said in #1, *4.* {{connectFailedOnce}} in {{blockSeekTo}} is not necessary. *5.* Why you modify {{SimulatedFSDataset}}? Erasure coding: stateful (non-positional) read from files in striped layout --- Key: HDFS-8033 URL: https://issues.apache.org/jira/browse/HDFS-8033 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Zhe Zhang Assignee: Zhe Zhang Attachments: HDFS-8033.000.patch, HDFS-8033.001.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HDFS-8033) Erasure coding: stateful (non-positional) read from files in striped layout
[ https://issues.apache.org/jira/browse/HDFS-8033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14504428#comment-14504428 ] Yi Liu edited comment on HDFS-8033 at 4/21/15 6:33 AM: --- Thanks [~zhz] for working on this. The patch is good, my comments: *1.* In DFSInputStream, the stateful read is not to read fully for the output *buf*, {{readWithStrategy}} will call {{readBuffer}} and return on success. In {{DFSStripedInputStream}} we override {{readBuffer}}, but we only read in one striped block, so the returned result should be something like (cell_0, cell_3, ) and it only contains part of the expected data. This is not incorrect, in the test, you have tested stateful read, but you do fully read and the data size is *BLOCK_GROUP_SIZE*, so the result coincidentally is correct. I suggest we try to do fully read in {{readBuffer}} of {{DFSStripedInputStream}} unless we find the end of file, of course, the final read length could be less than the input buf length if we get eof. *2.* In {{blockSeekTo}}, we need to handle refetchToken and refetchEncryptionKey. And for other IOException, we can throw it. *3.* For the test, do stateful read: read once and fully read (please make the data size large than groupSize * cellSize), as I said in #1, *4.* {{connectFailedOnce}} in {{blockSeekTo}} is not necessary. *5.* Why you modify {{SimulatedFSDataset}}? was (Author: hitliuyi): Thanks [~zhz] for working on this. The patch is good, my comments: *1.* In DFSInputStream, the stateful read is not to read fully for the output *buf*, {{readWithStrategy}} will call {{readBuffer}} and return on success. In {{DFSStripedInputStream}} we override {{readBuffer}}, but we only read in one striped block, so the returned result should be something like (cell_0, cell_3, ). This is not incorrect, in the test, you have tested stateful read, but you do fully read and the data size is *BLOCK_GROUP_SIZE*, so the result coincidentally is correct. I suggest we try to do fully read in {{readBuffer}} of {{DFSStripedInputStream}} unless we find the end of file, of course, the final read length could be less than the input buf length if we get eof. *2.* In {{blockSeekTo}}, we need to handle refetchToken and refetchEncryptionKey. And for other IOException, we can throw it. *3.* For the test, do stateful read: read once and fully read (please make the data size large than groupSize * cellSize), as I said in #1, *4.* {{connectFailedOnce}} in {{blockSeekTo}} is not necessary. *5.* Why you modify {{SimulatedFSDataset}}? Erasure coding: stateful (non-positional) read from files in striped layout --- Key: HDFS-8033 URL: https://issues.apache.org/jira/browse/HDFS-8033 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Zhe Zhang Assignee: Zhe Zhang Attachments: HDFS-8033.000.patch, HDFS-8033.001.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HDFS-8033) Erasure coding: stateful (non-positional) read from files in striped layout
[ https://issues.apache.org/jira/browse/HDFS-8033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14506131#comment-14506131 ] Yi Liu edited comment on HDFS-8033 at 4/22/15 12:42 AM: Thanks Zhe for the response, and also thanks Jing for the review and good comments! Yes, it's a good idea to maintain a list of current DataNodes. {quote} DFSStripedInputStream#readBuffer does switch the blockReader. So after reading cell_0, we'll switch to the next blockReader and read cell_1. {quote} {{DFSStripedInputStream#readBuffer}} can switch the blockReader, that's true, but it depends on the caller to call, currently for one stateful read, {{DFSInputStream#readWithStrategy}} only call it one time, so the result only contains part of the expected data. I didn't see you do loop (parallelly) all blockReaders for the striped block in {{DFSStripedInputStream#readBuffer}}. I have another comments, I think we need to change {{closeCurrentBlockReader}} to {{closeCurrentBlockReaders}} in {{DFSInputStream}} and override it in {{DFSStripedInputStream}}, otherwise there is leak for {{blockReaders}}. was (Author: hitliuyi): Thanks Zhe for the response, and also thanks Jing for the review and good comments! Yes, it's a good idea to maintain a list of current DataNodes. {quote} DFSStripedInputStream#readBuffer does switch the blockReader. So after reading cell_0, we'll switch to the next blockReader and read cell_1. {quote} {{DFSStripedInputStream#readBuffer}} can switch the blockReader, that's true, but it depends on the caller to call, currently for one stateful read, {{DFSInputStream#readWithStrategy}} only call it one time, so the result only contains part of the expected data. I didn't see you do loop (parallelly) all blockReaders for the striped block in {{DFSStripedInputStream#readBuffer}}. I have another comments, I think we need to change {{closeCurrentBlockReader}} to {{closeCurrentBlockReaders}} in {{DFSInputStream}} and overwrite it in {{DFSStripedInputStream}}, otherwise there is leak for {{blockReaders}}. Erasure coding: stateful (non-positional) read from files in striped layout --- Key: HDFS-8033 URL: https://issues.apache.org/jira/browse/HDFS-8033 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Zhe Zhang Assignee: Zhe Zhang Attachments: HDFS-8033.000.patch, HDFS-8033.001.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)