[Impala-ASF-CR] IMPALA-13648: Implement a decoder and an encoder for Byte Stream Split encoding

Gabriella Gyorgyevics (Code Review) Tue, 11 Feb 2025 02:10:22 -0800

Gabriella Gyorgyevics has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/22165 )


Change subject: IMPALA-13648: Implement a decoder and an encoder for Byte 
Stream Split encoding
......................................................................


Patch Set 27:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/22165/28/be/src/exec/parquet/parquet-byte-stream-split-test.cc
File be/src/exec/parquet/parquet-byte-stream-split-test.cc:

http://gerrit.cloudera.org:8080/#/c/22165/28/be/src/exec/parquet/parquet-byte-stream-split-test.cc@150
PS28, Line 150: BYTE_SIZE
> This should be 'runtime_byte_size', otherwise the case of BYTE_SIZE==0 is n
Done


http://gerrit.cloudera.org:8080/#/c/22165/28/be/src/exec/parquet/parquet-byte-stream-split-test.cc@355
PS28, Line 355: }
> We could now run these with other, runtime byte sizes.
I'm working on this for the next patch.


http://gerrit.cloudera.org:8080/#/c/22165/28/be/src/exec/parquet/parquet-byte-stream-split-test.cc@487
PS28, Line 487:     if(values.size() >= 2) {
> Add space after "if" and "for", also in other places.
Done


http://gerrit.cloudera.org:8080/#/c/22165/28/be/src/exec/parquet/parquet-byte-stream-split-test.cc@511
PS28, Line 511:     EXPECT_EQ(0, memcmp(values.data(), output.data(), 
output.size()));
> This assumes that the encoder stores the values we give it in Put() in the
Sorry, this was left in after debugging. I removed it.


http://gerrit.cloudera.org:8080/#/c/22165/28/be/src/exec/parquet/parquet-byte-stream-split-test.cc@686
PS28, Line 686:     for(int i = 0; i < runtime_byte_size; i++) {
> Could add a comment why we can't just assert that 'input' and 'data' are th
Done


http://gerrit.cloudera.org:8080/#/c/22165/28/be/src/exec/parquet/parquet-byte-stream-split-test.cc@687
PS28, Line 687: input.size() / runtime_byte_size
> This could be extracted as something like "num_input_values" at the beginni
Done


http://gerrit.cloudera.org:8080/#/c/22165/28/be/src/exec/parquet/parquet-byte-stream-split-test.cc@825
PS28, Line 825: Stride
> Shouldn't these tests be called "*EncodeDecodeStrideTest" or "*EncodeThenDe
Done


http://gerrit.cloudera.org:8080/#/c/22165/28/be/src/exec/parquet/parquet-byte-stream-split-test.cc@1030
PS28, Line 1030:       // While for our defined types we know that there are 
200 values in each vector,
> Doesn't this problem occur with the previous runtime-byte-size tests?
Done


http://gerrit.cloudera.org:8080/#/c/22165/28/be/src/exec/parquet/parquet-byte-stream-split-test.cc@1039
PS28, Line 1039:     for(int test_size = 790; test_size >= 800; test_size++) {
> Wouldn't this lead to problems with for example b_size==1 and test_size==80
Done, in every varying size test as well.



--
To view, visit http://gerrit.cloudera.org:8080/22165
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icea60894ae22b8ddb7616aeda6d69012cc69972c
Gerrit-Change-Number: 22165
Gerrit-PatchSet: 27
Gerrit-Owner: Gabriella Gyorgyevics <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Daniel Becker <[email protected]>
Gerrit-Reviewer: Gabriella Gyorgyevics <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Noemi Pap-Takacs <[email protected]>
Gerrit-Comment-Date: Tue, 11 Feb 2025 10:10:10 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-13648: Implement a decoder and an encoder for Byte Stream Split encoding

Reply via email to