[jira] [Updated] (HDFS-2034) length in getBlockRange becomes -ve when reading only from currently being written blk

2011-07-08 Thread Tsz Wo (Nicholas), SZE (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE updated HDFS-2034:
-

   Resolution: Fixed
Fix Version/s: 0.23.0
 Hadoop Flags: [Reviewed]
   Status: Resolved  (was: Patch Available)

I have committed this.  Thanks, John.

Also thanks Daryn for reviewing it.

 length in getBlockRange becomes -ve when reading only from currently being 
 written blk
 --

 Key: HDFS-2034
 URL: https://issues.apache.org/jira/browse/HDFS-2034
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: John George
Assignee: John George
Priority: Minor
 Fix For: 0.23.0

 Attachments: HDFS-2034-1.patch, HDFS-2034-1.patch, HDFS-2034-2.patch, 
 HDFS-2034-3.patch, HDFS-2034-4.patch, HDFS-2034-5.patch, HDFS-2034.patch


 This came up during HDFS-1907. Posting an example that Todd posted in 
 HDFS-1907 that brought out this issue.
 {quote}
 Here's an example sequence to describe what I mean:
 1. open file, write one and a half blocks
 2. call hflush
 3. another reader asks for the first byte of the second block
 {quote}
 In this case since offset is greater than the completed block length, the 
 math in getBlockRange() of DFSInputStreamer.java will set length to 
 negative.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-2034) length in getBlockRange becomes -ve when reading only from currently being written blk

2011-07-08 Thread Tsz Wo (Nicholas), SZE (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE updated HDFS-2034:
-

Component/s: hdfs client

 length in getBlockRange becomes -ve when reading only from currently being 
 written blk
 --

 Key: HDFS-2034
 URL: https://issues.apache.org/jira/browse/HDFS-2034
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs client
Reporter: John George
Assignee: John George
Priority: Minor
 Fix For: 0.23.0

 Attachments: HDFS-2034-1.patch, HDFS-2034-1.patch, HDFS-2034-2.patch, 
 HDFS-2034-3.patch, HDFS-2034-4.patch, HDFS-2034-5.patch, HDFS-2034.patch


 This came up during HDFS-1907. Posting an example that Todd posted in 
 HDFS-1907 that brought out this issue.
 {quote}
 Here's an example sequence to describe what I mean:
 1. open file, write one and a half blocks
 2. call hflush
 3. another reader asks for the first byte of the second block
 {quote}
 In this case since offset is greater than the completed block length, the 
 math in getBlockRange() of DFSInputStreamer.java will set length to 
 negative.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-2034) length in getBlockRange becomes -ve when reading only from currently being written blk

2011-06-20 Thread John George (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John George updated HDFS-2034:
--

Status: Patch Available  (was: Open)

 length in getBlockRange becomes -ve when reading only from currently being 
 written blk
 --

 Key: HDFS-2034
 URL: https://issues.apache.org/jira/browse/HDFS-2034
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: John George
Assignee: John George
Priority: Minor
 Attachments: HDFS-2034-1.patch, HDFS-2034-1.patch, HDFS-2034-2.patch, 
 HDFS-2034-3.patch, HDFS-2034-4.patch, HDFS-2034.patch


 This came up during HDFS-1907. Posting an example that Todd posted in 
 HDFS-1907 that brought out this issue.
 {quote}
 Here's an example sequence to describe what I mean:
 1. open file, write one and a half blocks
 2. call hflush
 3. another reader asks for the first byte of the second block
 {quote}
 In this case since offset is greater than the completed block length, the 
 math in getBlockRange() of DFSInputStreamer.java will set length to 
 negative.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-2034) length in getBlockRange becomes -ve when reading only from currently being written blk

2011-06-18 Thread John George (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John George updated HDFS-2034:
--

Attachment: HDFS-2034-4.patch

New patch with Tod's and Daryn's comments included.

 length in getBlockRange becomes -ve when reading only from currently being 
 written blk
 --

 Key: HDFS-2034
 URL: https://issues.apache.org/jira/browse/HDFS-2034
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: John George
Assignee: John George
Priority: Minor
 Attachments: HDFS-2034-1.patch, HDFS-2034-1.patch, HDFS-2034-2.patch, 
 HDFS-2034-3.patch, HDFS-2034-4.patch, HDFS-2034.patch


 This came up during HDFS-1907. Posting an example that Todd posted in 
 HDFS-1907 that brought out this issue.
 {quote}
 Here's an example sequence to describe what I mean:
 1. open file, write one and a half blocks
 2. call hflush
 3. another reader asks for the first byte of the second block
 {quote}
 In this case since offset is greater than the completed block length, the 
 math in getBlockRange() of DFSInputStreamer.java will set length to 
 negative.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-2034) length in getBlockRange becomes -ve when reading only from currently being written blk

2011-06-15 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HDFS-2034:
--

Status: Open  (was: Patch Available)

 length in getBlockRange becomes -ve when reading only from currently being 
 written blk
 --

 Key: HDFS-2034
 URL: https://issues.apache.org/jira/browse/HDFS-2034
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: John George
Assignee: John George
Priority: Minor
 Attachments: HDFS-2034-1.patch, HDFS-2034-1.patch, HDFS-2034-2.patch, 
 HDFS-2034-3.patch, HDFS-2034.patch


 This came up during HDFS-1907. Posting an example that Todd posted in 
 HDFS-1907 that brought out this issue.
 {quote}
 Here's an example sequence to describe what I mean:
 1. open file, write one and a half blocks
 2. call hflush
 3. another reader asks for the first byte of the second block
 {quote}
 In this case since offset is greater than the completed block length, the 
 math in getBlockRange() of DFSInputStreamer.java will set length to 
 negative.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-2034) length in getBlockRange becomes -ve when reading only from currently being written blk

2011-06-06 Thread John George (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John George updated HDFS-2034:
--

Attachment: HDFS-2034-1.patch


 Daryn Sharp commented on HDFS-2034:
 ---
 
 * I'd suggest changing {{lastblkcomplete but read past that}} to something 
 more grammatically correct (I'm not the police!) like {{can't read past last 
 completed block}}.

Good point.

 
 * There's an assertion for reading past the last *completed* block, but not 
 for reading past the last *located* block?

The reason that I added assert for *completed* blocks is because I was changing 
a previous assumption in that code. So, I wanted to make sure that I was not 
breaking the logic behind it. I should probably remove this assert.

The reason I am hesitant about adding an assert for reading past the last block 
is because it has to go out of its way to calculate the length of all the 
blocks (which is already being done by the caller of this function) and that 
does not seem to be the job of this function.

 
 * I found the names of the booleans to not be clear at face value.  Ie. I 
 couldn't tell what they meant if I didn't see the assignment.

On reading the booleans today, I have to say I agree with you on two of the 
three boolean names.

Changed readOnlyIncompleteBlk to readOffsetWithinCompleteBlk (now it means the 
opposite of what it meant before, but it seems to help the code logic).

Changed readPastCompletedBlk to readLengthPastCompletedBlock.
I would like to continue to use lengthOfCompleteBlk instead of 
locatedBlocks.getFileLength() because (and as Todd suggested in HDFS-1907), it 
make sense to call it something related to completeBlocks as opposed to 
FileLength.

 
 For your consideration, maybe something like this with a sprinkling of 
 comments:
 {code}
 if (locatedBlocks.isLastBlockComplete())
   assert(!readOffsetPastCompletedBlock) : can't read past last completed 
 block;
 
 if (!readOffsetPastCompletedBlock) {
   if (readLengthPastCompletedBlock) {
 length = locatedBlocks.getFileLength() - offset;
   }
   blocks = getFinalizedBlockRange(offset, length);
 }
 if (readOffsetPastCompletedBlock) {
   Block lastBlock = locatedBlocks.getLastLocatedBlock();
   long lastBlockEndOffset = lastBlock.getStartOffset() + 
 lastBlock.getBlockSize();
   assert(offset  lastBlockEndOffset) : can't read past last located block;
   blocks.add(lastLocatedBlock);
 }
 {code}
 

Thanks for your suggestions.

 length in getBlockRange becomes -ve when reading only from currently being 
 written blk
 --

 Key: HDFS-2034
 URL: https://issues.apache.org/jira/browse/HDFS-2034
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: John George
Assignee: John George
Priority: Minor
 Attachments: HDFS-2034-1.patch, HDFS-2034-1.patch, HDFS-2034.patch


 This came up during HDFS-1907. Posting an example that Todd posted in 
 HDFS-1907 that brought out this issue.
 {quote}
 Here's an example sequence to describe what I mean:
 1. open file, write one and a half blocks
 2. call hflush
 3. another reader asks for the first byte of the second block
 {quote}
 In this case since offset is greater than the completed block length, the 
 math in getBlockRange() of DFSInputStreamer.java will set length to 
 negative.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-2034) length in getBlockRange becomes -ve when reading only from currently being written blk

2011-06-06 Thread John George (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John George updated HDFS-2034:
--

Attachment: HDFS-2034-1.patch


 Daryn Sharp commented on HDFS-2034:
 ---
 
 * I'd suggest changing {{lastblkcomplete but read past that}} to something 
 more grammatically correct (I'm not the police!) like {{can't read past last 
 completed block}}.

Good point.

 
 * There's an assertion for reading past the last *completed* block, but not 
 for reading past the last *located* block?

The reason that I added assert for *completed* blocks is because I was changing 
a previous assumption in that code. So, I wanted to make sure that I was not 
breaking the logic behind it. I should probably remove this assert.

The reason I am hesitant about adding an assert for reading past the last block 
is because it has to go out of its way to calculate the length of all the 
blocks (which is already being done by the caller of this function) and that 
does not seem to be the job of this function.

 
 * I found the names of the booleans to not be clear at face value.  Ie. I 
 couldn't tell what they meant if I didn't see the assignment.

On reading the booleans today, I have to say I agree with you on two of the 
three boolean names.

Changed readOnlyIncompleteBlk to readOffsetWithinCompleteBlk (now it means the 
opposite of what it meant before, but it seems to help the code logic).

Changed readPastCompletedBlk to readLengthPastCompletedBlock.
I would like to continue to use lengthOfCompleteBlk instead of 
locatedBlocks.getFileLength() because (and as Todd suggested in HDFS-1907), it 
make sense to call it something related to completeBlocks as opposed to 
FileLength.

 
 For your consideration, maybe something like this with a sprinkling of 
 comments:
 {code}
 if (locatedBlocks.isLastBlockComplete())
   assert(!readOffsetPastCompletedBlock) : can't read past last completed 
 block;
 
 if (!readOffsetPastCompletedBlock) {
   if (readLengthPastCompletedBlock) {
 length = locatedBlocks.getFileLength() - offset;
   }
   blocks = getFinalizedBlockRange(offset, length);
 }
 if (readOffsetPastCompletedBlock) {
   Block lastBlock = locatedBlocks.getLastLocatedBlock();
   long lastBlockEndOffset = lastBlock.getStartOffset() + 
 lastBlock.getBlockSize();
   assert(offset  lastBlockEndOffset) : can't read past last located block;
   blocks.add(lastLocatedBlock);
 }
 {code}
 

Thanks for your suggestions.

 length in getBlockRange becomes -ve when reading only from currently being 
 written blk
 --

 Key: HDFS-2034
 URL: https://issues.apache.org/jira/browse/HDFS-2034
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: John George
Assignee: John George
Priority: Minor
 Attachments: HDFS-2034-1.patch, HDFS-2034-1.patch, HDFS-2034.patch


 This came up during HDFS-1907. Posting an example that Todd posted in 
 HDFS-1907 that brought out this issue.
 {quote}
 Here's an example sequence to describe what I mean:
 1. open file, write one and a half blocks
 2. call hflush
 3. another reader asks for the first byte of the second block
 {quote}
 In this case since offset is greater than the completed block length, the 
 math in getBlockRange() of DFSInputStreamer.java will set length to 
 negative.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-2034) length in getBlockRange becomes -ve when reading only from currently being written blk

2011-06-06 Thread John George (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John George updated HDFS-2034:
--

Attachment: HDFS-2034-2.patch

Good point about the incomplete block being added to an offset larger than 
file size. I completely agree with you that the function should ensure what it 
sends back is what it was asked for. In that case, we should probably just 
check for the requested offset. My concern with that specific assert is that, 
with that assert, this function controls the behavior in cases where the user 
requested for an offset  EOF, while the upper layer might specifically want a 
different behavior (like sending back an EOF exception or just returning 0). 
So, my take on this is that as long as the function behaves like it is asked 
to, we should probably not assert. 

Adding a new patch based on your suggestion about getFileLength().

Thanks again Daryn.

 length in getBlockRange becomes -ve when reading only from currently being 
 written blk
 --

 Key: HDFS-2034
 URL: https://issues.apache.org/jira/browse/HDFS-2034
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: John George
Assignee: John George
Priority: Minor
 Attachments: HDFS-2034-1.patch, HDFS-2034-1.patch, HDFS-2034-2.patch, 
 HDFS-2034.patch


 This came up during HDFS-1907. Posting an example that Todd posted in 
 HDFS-1907 that brought out this issue.
 {quote}
 Here's an example sequence to describe what I mean:
 1. open file, write one and a half blocks
 2. call hflush
 3. another reader asks for the first byte of the second block
 {quote}
 In this case since offset is greater than the completed block length, the 
 math in getBlockRange() of DFSInputStreamer.java will set length to 
 negative.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-2034) length in getBlockRange becomes -ve when reading only from currently being written blk

2011-06-06 Thread John George (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John George updated HDFS-2034:
--

Attachment: HDFS-2034-3.patch

Thanks Todd. Attaching a patch with the following changes.

- added assert to check if offset  file length

- moved blocks = new ArrayList... to an else


 length in getBlockRange becomes -ve when reading only from currently being 
 written blk
 --

 Key: HDFS-2034
 URL: https://issues.apache.org/jira/browse/HDFS-2034
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: John George
Assignee: John George
Priority: Minor
 Attachments: HDFS-2034-1.patch, HDFS-2034-1.patch, HDFS-2034-2.patch, 
 HDFS-2034-3.patch, HDFS-2034.patch


 This came up during HDFS-1907. Posting an example that Todd posted in 
 HDFS-1907 that brought out this issue.
 {quote}
 Here's an example sequence to describe what I mean:
 1. open file, write one and a half blocks
 2. call hflush
 3. another reader asks for the first byte of the second block
 {quote}
 In this case since offset is greater than the completed block length, the 
 math in getBlockRange() of DFSInputStreamer.java will set length to 
 negative.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-2034) length in getBlockRange becomes -ve when reading only from currently being written blk

2011-06-04 Thread John George (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John George updated HDFS-2034:
--

Priority: Minor  (was: Major)
 Summary: length in getBlockRange becomes -ve when reading only from 
currently being written blk  (was: reading from currently being written block 
does not work )

 length in getBlockRange becomes -ve when reading only from currently being 
 written blk
 --

 Key: HDFS-2034
 URL: https://issues.apache.org/jira/browse/HDFS-2034
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: John George
Assignee: John George
Priority: Minor
 Attachments: HDFS-2034.patch


 This came up during HDFS-1907. Posting an example that Todd posted in 
 HDFS-1907 that brought out this issue.
 {quote}
 Here's an example sequence to describe what I mean:
 1. open file, write one and a half blocks
 2. call hflush
 3. another reader asks for the first byte of the second block
 {quote}
 In this case since offset is greater than the completed block length, the 
 math in getBlockRange() of DFSInputStreamer.java will set length to 
 negative.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira