[ https://issues.apache.org/jira/browse/HDFS-11542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17802507#comment-17802507 ]

Shilun Fan commented on HDFS-11542:
-----------------------------------

Bulk update: moved all 3.4.0 non-blocker issues; please move this back if it is a 
blocker. Retargeting to 3.5.0.

> Fix RawErasureCoderBenchmark decoding operation
> -----------------------------------------------
>
>                 Key: HDFS-11542
>                 URL: https://issues.apache.org/jira/browse/HDFS-11542
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: erasure-coding
>    Affects Versions: 3.0.0-alpha2
>            Reporter: László Bence Nagy
>            Priority: Minor
>              Labels: test
>
> There are some issues with the decode operation in 
> *RawErasureCoderBenchmark.java*. The decoding method is called like 
> this: *decoder.decode(decodeInputs, ERASED_INDEXES, outputs);*. 
> With an RS 6+3 configuration, a correct call would look like this: 
> *decode([ d0, NULL, d2, d3, NULL, d5, p0, NULL, p2 ], [ 1, 4, 7 ], 
> [ -, -, - ])*. The indexes 1, 4 and 7 appear in the *ERASED_INDEXES* array, so 
> in the *decodeInputs* array the values at those indexes are set to NULL, while 
> all other data and parity packets are present. The *outputs* array has 
> length 3; this is where the d1, d4 and p1 packets should be reconstructed.
> Right now, however, this example is called like this: *decode([ d0, d1, d2, d3, 
> d4, d5, -, -, - ], [ 1, 4, 7 ], [ -, -, - ])*, which has two main problems 
> with the *decodeInputs* array. First, the packets are not set to NULL at the 
> positions indicated by the *ERASED_INDEXES* array. Second, the array does not 
> contain any parity packets for decoding.
> The first problem is easy to solve: the values at the proper indexes need to 
> be set to NULL. The second is harder, because right now multiple rounds of 
> encode operations are run one after another, and similarly multiple decode 
> operations are called one by one. Instead, encode and decode pairs should be 
> run together, so that the encoded parity packets can be passed to decode in 
> the *decodeInputs* array. (Of course, encode and decode performance should 
> still be measured separately.)
> Moreover, there is one more problem in this file. Right now it works with RS 
> 6+3 and the *ERASED_INDEXES* array is fixed to *[ 6, 7, 8 ]*, so only the 
> three parity packets need to be reconstructed. This means no real decode 
> performance is measured, because no data packet has to be reconstructed 
> (even if the decode itself works properly); in effect, only new parity 
> packets are encoded. The exact behaviour depends on the underlying erasure 
> coding plugin, but the point is that data packets should also be erased to 
> measure real decode performance.
> In addition, more RS configurations (not just 6+3) could be benchmarked so 
> that they can be compared.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
