[jira] [Updated] (HDFS-2034) length in getBlockRange becomes -ve when reading only from currently being written blk
[ https://issues.apache.org/jira/browse/HDFS-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsz Wo (Nicholas), SZE updated HDFS-2034:
    Resolution: Fixed
    Fix Version/s: 0.23.0
    Hadoop Flags: [Reviewed]
    Status: Resolved (was: Patch Available)

I have committed this. Thanks, John. Also thanks to Daryn for reviewing it.

--
length in getBlockRange becomes -ve when reading only from currently being written blk

    Key: HDFS-2034
    URL: https://issues.apache.org/jira/browse/HDFS-2034
    Project: Hadoop HDFS
    Issue Type: Bug
    Reporter: John George
    Assignee: John George
    Priority: Minor
    Fix For: 0.23.0
    Attachments: HDFS-2034-1.patch, HDFS-2034-1.patch, HDFS-2034-2.patch, HDFS-2034-3.patch, HDFS-2034-4.patch, HDFS-2034-5.patch, HDFS-2034.patch

This came up during HDFS-1907. Posting an example that Todd posted in HDFS-1907 that brought out this issue.

{quote}
Here's an example sequence to describe what I mean:
1. open file, write one and a half blocks
2. call hflush
3. another reader asks for the first byte of the second block
{quote}

In this case, since the offset is greater than the completed block length, the math in getBlockRange() of DFSInputStream.java will set length to a negative value.

--
This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
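The negative-length arithmetic described in the report can be sketched as follows. This is a simplified, hypothetical illustration of the failure mode, not the actual DFSInputStream.getBlockRange() code; the method and variable names are made up for the example.

```java
// Simplified sketch of the HDFS-2034 failure mode: when a read starts past
// the end of the completed blocks (i.e. inside the block currently being
// written), clamping the length to the completed-block region goes negative.
// Hypothetical illustration only; not the real DFSInputStream code.
public class NegativeLengthSketch {
    static long clampToCompletedBlocks(long completedLength, long offset, long reqLen) {
        if (offset + reqLen > completedLength) {
            // Intended to shrink the request to the completed region, but when
            // offset itself is past completedLength the result is negative.
            return completedLength - offset;
        }
        return reqLen;
    }

    public static void main(String[] args) {
        long blockSize = 64L * 1024 * 1024;   // one completed 64 MB block
        long offset = blockSize + 1024;       // read inside the half-written second block
        System.out.println(clampToCompletedBlocks(blockSize, offset, 1)); // prints -1024
    }
}
```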
[jira] [Updated] (HDFS-2034) length in getBlockRange becomes -ve when reading only from currently being written blk
Tsz Wo (Nicholas), SZE updated HDFS-2034:
    Component/s: hdfs client
[jira] [Updated] (HDFS-2034) length in getBlockRange becomes -ve when reading only from currently being written blk
John George updated HDFS-2034:
    Status: Patch Available (was: Open)
[jira] [Updated] (HDFS-2034) length in getBlockRange becomes -ve when reading only from currently being written blk
John George updated HDFS-2034:
    Attachment: HDFS-2034-4.patch

New patch with Todd's and Daryn's comments included.
[jira] [Updated] (HDFS-2034) length in getBlockRange becomes -ve when reading only from currently being written blk
Todd Lipcon updated HDFS-2034:
    Status: Open (was: Patch Available)
[jira] [Updated] (HDFS-2034) length in getBlockRange becomes -ve when reading only from currently being written blk
John George updated HDFS-2034:
    Attachment: HDFS-2034-1.patch

Daryn Sharp commented on HDFS-2034:

* I'd suggest changing {{lastblkcomplete but read past that}} to something more grammatically correct (I'm not the police!) like {{can't read past last completed block}}.

Good point.

* There's an assertion for reading past the last *completed* block, but not for reading past the last *located* block?

The reason I added an assert for *completed* blocks is that I was changing a previous assumption in that code, so I wanted to make sure I was not breaking the logic behind it. I should probably remove this assert. I am hesitant to add an assert for reading past the last block because the function would have to go out of its way to calculate the length of all the blocks (which is already done by the caller of this function), and that does not seem to be the job of this function.

* I found the names of the booleans not to be clear at face value, i.e. I couldn't tell what they meant without seeing the assignment.

On reading the booleans today, I have to say I agree with you on two of the three boolean names. Changed readOnlyIncompleteBlk to readOffsetWithinCompleteBlk (it now means the opposite of what it meant before, but it seems to help the code logic). Changed readPastCompletedBlk to readLengthPastCompletedBlock. I would like to continue to use lengthOfCompleteBlk instead of locatedBlocks.getFileLength() because (as Todd suggested in HDFS-1907) it makes sense to name it after complete blocks as opposed to file length.
For your consideration, maybe something like this with a sprinkling of comments:

{code}
if (locatedBlocks.isLastBlockComplete()) {
  assert !readOffsetPastCompletedBlock : "can't read past last completed block";
}
if (!readOffsetPastCompletedBlock) {
  if (readLengthPastCompletedBlock) {
    // clamp the read length to the completed-block region
    length = locatedBlocks.getFileLength() - offset;
  }
  blocks = getFinalizedBlockRange(offset, length);
}
if (readOffsetPastCompletedBlock) {
  LocatedBlock lastBlock = locatedBlocks.getLastLocatedBlock();
  long lastBlockEndOffset = lastBlock.getStartOffset() + lastBlock.getBlockSize();
  assert offset < lastBlockEndOffset : "can't read past last located block";
  blocks.add(lastBlock);
}
{code}

Thanks for your suggestions.
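The "can't read past last located block" bound in the suggestion above boils down to simple offset arithmetic, which can be exercised standalone. The values below are hypothetical, chosen to match Todd's one-and-a-half-block scenario:

```java
// Standalone check mirroring the last-located-block bound: the readable end
// of the file is the start offset of the last located block plus the bytes
// written to it so far. Hypothetical values, not real HDFS API calls.
public class LastBlockBoundSketch {
    public static void main(String[] args) {
        long blockSize = 64L * 1024 * 1024;     // full block size
        long lastBlockStart = blockSize;        // second block starts after one full block
        long lastBlockSize = 32L * 1024 * 1024; // half-written second block
        long lastBlockEndOffset = lastBlockStart + lastBlockSize;

        long offset = blockSize;                // first byte of the second block
        System.out.println(offset < lastBlockEndOffset); // prints true: within range
    }
}
```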
[jira] [Updated] (HDFS-2034) length in getBlockRange becomes -ve when reading only from currently being written blk
John George updated HDFS-2034:
    Attachment: HDFS-2034-2.patch

Good point about the incomplete block being added at an offset larger than the file size. I completely agree that the function should ensure that what it sends back is what it was asked for. In that case, we should probably just check the requested offset. My concern with that specific assert is that it makes this function control the behavior in cases where the user requested an offset past EOF, while the upper layer might specifically want a different behavior (like sending back an EOF exception or just returning 0). So my take is that as long as the function behaves as it is asked to, we should probably not assert. Adding a new patch based on your suggestion about getFileLength(). Thanks again, Daryn.
[jira] [Updated] (HDFS-2034) length in getBlockRange becomes -ve when reading only from currently being written blk
John George updated HDFS-2034:
    Attachment: HDFS-2034-3.patch

Thanks Todd. Attaching a patch with the following changes:
- added an assert to check that offset is within the file length
- moved blocks = new ArrayList... into an else
[jira] [Updated] (HDFS-2034) length in getBlockRange becomes -ve when reading only from currently being written blk
John George updated HDFS-2034:
    Priority: Minor (was: Major)
    Summary: length in getBlockRange becomes -ve when reading only from currently being written blk (was: reading from currently being written block does not work)