[jira] [Updated] (HDFS-8233) Fix DFSStripedOutputStream#getCurrentBlockGroupBytes when the last stripe is at the block group boundary

2015-04-23 Tsz Wo Nicholas Sze (JIRA)

 [ https://issues.apache.org/jira/browse/HDFS-8233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsz Wo Nicholas Sze updated HDFS-8233:
--
Hadoop Flags: Reviewed

+1 patch looks good.

> Fix DFSStripedOutputStream#getCurrentBlockGroupBytes when the last stripe is 
> at the block group boundary
> 
>
> Key: HDFS-8233
> URL: https://issues.apache.org/jira/browse/HDFS-8233
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Jing Zhao
> Assignee: Jing Zhao
> Attachments: hdfs8233-HDFS-7285.000.patch
>
>
> Currently {{DFSStripedOutputStream#getCurrentBlockGroupBytes}} simply uses
> {{getBytesCurBlock}} of each streamer to calculate the block group size. This
> is wrong when the last stripe is at the block group boundary, since
> {{bytesCurBlock}} is reset to 0 once an internal block is finished.
> For example, when the data written to the block group totals
> {{blockGroupSize - cellSize * (numDataBlocks - 1)}} bytes, i.e., the first
> internal block is full while the others are not, the full block contributes 0
> to the sum, so {{getCurrentBlockGroupBytes}} returns a wrong result and causes
> the write to fail.
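A toy model of the accounting described above may help; it is not HDFS code, and the class and method names are hypothetical. It assumes cells are written round-robin across the data blocks, that each internal block holds blockSize = blockGroupSize / numDataBlocks bytes, and that a streamer's bytesCurBlock drops to 0 when its internal block fills. With cellSize = 4, numDataBlocks = 3 and two cells per internal block, writing blockGroupSize - cellSize * (numDataBlocks - 1) = 16 bytes fills internal block 0, so the naive per-streamer sum undercounts.

```java
import java.util.Arrays;

/**
 * Toy model (hypothetical, not HDFS code) of the HDFS-8233 accounting problem.
 * Data is written round-robin in cells of cellSize bytes across numDataBlocks
 * internal blocks, each holding at most blockSize bytes.
 */
final class BlockGroupBytesDemo {

  /** Bytes that land in internal block i after n bytes of the group are written. */
  static long internalBlockLength(long n, int cellSize, int numDataBlocks, int i) {
    long stripeSize = (long) cellSize * numDataBlocks;
    long fullStripes = n / stripeSize;
    long lastStripeBytes = n % stripeSize;
    // Portion of the partial last stripe that falls into block i (0..cellSize).
    long tail = Math.min(Math.max(lastStripeBytes - (long) i * cellSize, 0), cellSize);
    return fullStripes * cellSize + tail;
  }

  /** Assumed stand-in for bytesCurBlock: reset to 0 once the internal block is full. */
  static long bytesCurBlock(long n, int cellSize, int numDataBlocks, long blockSize, int i) {
    long len = internalBlockLength(n, cellSize, numDataBlocks, i);
    return len == blockSize ? 0 : len;
  }

  /** The naive approach: sum bytesCurBlock over all data streamers. */
  static long naiveGroupBytes(long n, int cellSize, int numDataBlocks, long blockSize) {
    long sum = 0;
    for (int i = 0; i < numDataBlocks; i++) {
      sum += bytesCurBlock(n, cellSize, numDataBlocks, blockSize, i);
    }
    return sum;
  }

  public static void main(String[] args) {
    int cellSize = 4, numDataBlocks = 3;
    long blockSize = 2L * cellSize;                                   // 2 cells per internal block
    long blockGroupSize = blockSize * numDataBlocks;                  // 24 bytes
    long n = blockGroupSize - (long) cellSize * (numDataBlocks - 1);  // 16 bytes
    long[] lengths = new long[numDataBlocks];
    for (int i = 0; i < numDataBlocks; i++) {
      lengths[i] = internalBlockLength(n, cellSize, numDataBlocks, i);
    }
    System.out.println("internal block lengths: " + Arrays.toString(lengths));
    System.out.println("bytes written: " + n
        + ", naive sum: " + naiveGroupBytes(n, cellSize, numDataBlocks, blockSize));
  }
}
```

Running main prints internal block lengths [8, 4, 4] and a naive sum of 8 for 16 bytes written: the full first block contributes nothing once its counter is reset.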



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8233) Fix DFSStripedOutputStream#getCurrentBlockGroupBytes when the last stripe is at the block group boundary

2015-04-23 Jing Zhao (JIRA)

 [ https://issues.apache.org/jira/browse/HDFS-8233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jing Zhao updated HDFS-8233:

Description: 
Currently {{DFSStripedOutputStream#getCurrentBlockGroupBytes}} simply uses
{{getBytesCurBlock}} of each streamer to calculate the block group size. This
is wrong when the last stripe is at the block group boundary, since
{{bytesCurBlock}} is reset to 0 once an internal block is finished.

For example, when the data written to the block group totals
{{blockGroupSize - cellSize * (numDataBlocks - 1)}} bytes, i.e., the first
internal block is full while the others are not, the full block contributes 0
to the sum, so {{getCurrentBlockGroupBytes}} returns a wrong result and causes
the write to fail.

  was:Currently {{DFSStripedOutputStream#getCurrentBlockGroupBytes}} simply 
uses {{getBytesCurBlock}} of each streamer to calculate the block group size. 
This is wrong when the last stripe is at the block group boundary, since the 
{{bytesCurBlock}} is set to 0 if an internal block is finished. 







[jira] [Updated] (HDFS-8233) Fix DFSStripedOutputStream#getCurrentBlockGroupBytes when the last stripe is at the block group boundary

2015-04-23 Jing Zhao (JIRA)

 [ https://issues.apache.org/jira/browse/HDFS-8233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jing Zhao updated HDFS-8233:

Attachment: hdfs8233-HDFS-7285.000.patch

When we call {{getCurrentBlockGroupBytes}}, the block object in each streamer
may not reflect the real size, since we cannot guarantee that all packets have
been sent out and acknowledged. In addition, the {{bytesCurBlock}} field is
reset to 0 when an internal block is full. Thus it is hard to compute the
accurate block group size at this point. However,
{{writeParityCellsForLastStripe}} only needs the parity cell size, which can be
computed from {{bytesCurBlock}}.
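A minimal sketch of the idea above, under stated assumptions: it is not the code in hdfs8233-HDFS-7285.000.patch, every name in it is hypothetical, and it assumes the parity cells of a partial last stripe are as long as that stripe's first (largest) data cell. Given only the leading data streamer's bytesCurBlock, which may already have been reset to 0, the parity cell size is taken as the counter modulo cellSize, or a full cellSize when that remainder is 0. The loop compares this against the directly computed value for every write size in one block group that ends with a partial stripe.

```java
/**
 * Hypothetical sketch (not the HDFS-8233 patch): derive the parity cell size of
 * a partial last stripe from one bytesCurBlock-style counter that may have
 * been reset to 0 after its internal block filled up.
 */
final class ParityCellSizeDemo {

  /** Ground truth in this model: parity cells match the stripe's first data cell. */
  static long expectedParityCellSize(long n, int cellSize, int numDataBlocks) {
    long lastStripeBytes = n % ((long) cellSize * numDataBlocks);
    return Math.min(lastStripeBytes, cellSize);
  }

  /** Assumes the last stripe is partial; the argument is the leading streamer's counter. */
  static long parityCellSize(long leadingBytesCurBlock, int cellSize) {
    long partial = leadingBytesCurBlock % cellSize;
    // A multiple of cellSize (including a counter reset to 0) means the first cell is full.
    return partial == 0 ? cellSize : partial;
  }

  /** Checks every write size n in one block group whose last stripe is partial. */
  static int countMismatches(int cellSize, int numDataBlocks, int cellsPerBlock) {
    long blockSize = (long) cellSize * cellsPerBlock;
    long stripeSize = (long) cellSize * numDataBlocks;
    long groupSize = blockSize * numDataBlocks;
    int mismatches = 0;
    for (long n = 1; n <= groupSize; n++) {
      long lastStripeBytes = n % stripeSize;
      if (lastStripeBytes == 0) {
        continue; // complete last stripe: outside the case this helper covers
      }
      long leadLen = (n / stripeSize) * cellSize + Math.min(lastStripeBytes, cellSize);
      long leadingCounter = leadLen == blockSize ? 0 : leadLen; // reset once full
      if (parityCellSize(leadingCounter, cellSize)
          != expectedParityCellSize(n, cellSize, numDataBlocks)) {
        mismatches++;
      }
    }
    return mismatches;
  }

  public static void main(String[] args) {
    System.out.println("mismatches (cell=4, k=3): " + countMismatches(4, 3, 2));
    System.out.println("mismatches (cell=64, k=6): " + countMismatches(64, 6, 4));
  }
}
```

Both lines of main report 0 mismatches. In this model the reset is harmless because a full internal block always ends on a cell boundary, which is exactly the case where the answer is a whole cell.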

The 000 patch fixes the issue based on this reasoning. It also adds a new unit
test case that fails with the original code.



