[jira] [Updated] (HDFS-8233) Fix DFSStripedOutputStream#getCurrentBlockGroupBytes when the last stripe is at the block group boundary
[ https://issues.apache.org/jira/browse/HDFS-8233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsz Wo Nicholas Sze updated HDFS-8233:
    Hadoop Flags: Reviewed

+1 patch looks good.

> Fix DFSStripedOutputStream#getCurrentBlockGroupBytes when the last stripe is
> at the block group boundary
>
> Key: HDFS-8233
> URL: https://issues.apache.org/jira/browse/HDFS-8233
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Jing Zhao
> Assignee: Jing Zhao
> Attachments: hdfs8233-HDFS-7285.000.patch
>
> Currently {{DFSStripedOutputStream#getCurrentBlockGroupBytes}} simply uses
> {{getBytesCurBlock}} of each streamer to calculate the block group size. This
> is wrong when the last stripe is at the block group boundary, since
> {{bytesCurBlock}} is set to 0 once an internal block is finished.
> For example, when the real block size is {{blockGroupSize - cellSize *
> (numDataBlocks - 1)}}, i.e., the first internal block is full while the
> others are not, {{getCurrentBlockGroupBytes}} returns a wrong result and
> causes the write to fail.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (HDFS-8233) Fix DFSStripedOutputStream#getCurrentBlockGroupBytes when the last stripe is at the block group boundary
[ https://issues.apache.org/jira/browse/HDFS-8233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jing Zhao updated HDFS-8233:
    Description:
Currently {{DFSStripedOutputStream#getCurrentBlockGroupBytes}} simply uses {{getBytesCurBlock}} of each streamer to calculate the block group size. This is wrong when the last stripe is at the block group boundary, since {{bytesCurBlock}} is set to 0 once an internal block is finished.

For example, when the real block size is {{blockGroupSize - cellSize * (numDataBlocks - 1)}}, i.e., the first internal block is full while the others are not, {{getCurrentBlockGroupBytes}} returns a wrong result and causes the write to fail.

    was:
Currently {{DFSStripedOutputStream#getCurrentBlockGroupBytes}} simply uses {{getBytesCurBlock}} of each streamer to calculate the block group size. This is wrong when the last stripe is at the block group boundary, since the {{bytesCurBlock}} is set to 0 if an internal block is finished.
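The boundary case in the description can be reproduced with a small toy model. The sketch below is not Hadoop code: the class and method names are hypothetical, the sizes are tiny illustrative values, and it assumes that the internal block size is a multiple of the cell size and that each streamer's counter is reset to 0 the moment its internal block fills up.

```java
// Toy model only: the class and method names are hypothetical and do not
// come from the Hadoop code base.
class BlockGroupBytesDemo {

    /**
     * Distributes totalBytes round-robin across the data streamers, one cell
     * at a time, and returns each streamer's bytesCurBlock counter. A counter
     * is reset to 0 as soon as its internal block is full, mirroring the
     * behaviour described in the issue.
     */
    static long[] simulate(long totalBytes, int numDataBlocks,
                           long cellSize, long blockSize) {
        long[] bytesCurBlock = new long[numDataBlocks];
        long remaining = totalBytes;
        int streamer = 0;
        while (remaining > 0) {
            long n = Math.min(cellSize, remaining);
            bytesCurBlock[streamer] += n;
            if (bytesCurBlock[streamer] == blockSize) {
                bytesCurBlock[streamer] = 0; // internal block finished
            }
            remaining -= n;
            streamer = (streamer + 1) % numDataBlocks;
        }
        return bytesCurBlock;
    }

    /** The naive calculation: sum the per-streamer counters. */
    static long naiveGroupBytes(long[] bytesCurBlock) {
        long sum = 0;
        for (long b : bytesCurBlock) {
            sum += b;
        }
        return sum;
    }

    public static void main(String[] args) {
        int numDataBlocks = 3;   // illustrative values, not a real EC schema
        long cellSize = 4;
        long blockSize = 16;     // size of one internal block
        long blockGroupSize = blockSize * numDataBlocks;
        // The size from the description: first internal block full, others not.
        long written = blockGroupSize - cellSize * (numDataBlocks - 1);
        long[] counters = simulate(written, numDataBlocks, cellSize, blockSize);
        System.out.println("bytes written = " + written);
        System.out.println("naive sum     = " + naiveGroupBytes(counters));
    }
}
```

With these values the counters end up as [0, 12, 12], so the naive sum is 24 although 40 bytes were written. That gap is the miscalculation the issue describes.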
[jira] [Updated] (HDFS-8233) Fix DFSStripedOutputStream#getCurrentBlockGroupBytes when the last stripe is at the block group boundary
[ https://issues.apache.org/jira/browse/HDFS-8233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jing Zhao updated HDFS-8233:
    Attachment: hdfs8233-HDFS-7285.000.patch

When we call {{getCurrentBlockGroupBytes}}, the block object in each streamer cannot reflect the real size, since we cannot guarantee that all the packets have been sent out and that their acks have been received. The {{bytesCurBlock}} field can also be set to 0 when an internal block is full. It is therefore hard to compute the accurate block size at this point. However, for {{writeParityCellsForLastStripe}} all we need is the parity cell size, which can be computed from {{bytesCurBlock}}.

The 000 patch fixes the issue based on this reasoning. It also adds a new unit test case, which fails with the original code.
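One way to read the point about the parity cell size is sketched below. This is an illustration under stated assumptions, not the code from the 000 patch: the class and method names are hypothetical, the internal block size is assumed to be a multiple of the cell size, and the method is assumed to be called only when the last stripe is non-empty and partial. Because data fills the cells of a stripe in order, the first cell of the last stripe is its largest, so the leading streamer's counter alone is enough.

```java
// Illustration only: hypothetical names, not taken from the attached patch.
class ParityCellSizeSketch {

    /**
     * Returns the parity cell size for a partial last stripe, given the
     * bytesCurBlock counter of the first (leading) data streamer.
     *
     * A remainder of 0 means the leading cell of the last stripe is full.
     * This also covers the case where the counter was reset to 0 because the
     * internal block is full, since the block size is assumed to be a
     * multiple of the cell size.
     */
    static long parityCellSize(long leadingBytesCurBlock, long cellSize) {
        long partial = leadingBytesCurBlock % cellSize;
        return partial == 0 ? cellSize : partial;
    }

    public static void main(String[] args) {
        long cellSize = 4;
        // Leading block full and counter reset to 0: full parity cell.
        System.out.println(parityCellSize(0, cellSize));
        // Leading cell holds only 2 bytes: parity cell is 2 bytes.
        System.out.println(parityCellSize(14, cellSize));
    }
}
```

The reset to 0 does not disturb this calculation, because only the remainder modulo the cell size is used, whereas summing the counters loses a whole internal block.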