Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12621 )

Change subject: IMPALA-8253: Parquet delta encoding and decoding.
......................................................................


Patch Set 20:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/12621/19/be/src/benchmarks/parquet-delta-benchmark.cc
File be/src/benchmarks/parquet-delta-benchmark.cc:

http://gerrit.cloudera.org:8080/#/c/12621/19/be/src/benchmarks/parquet-delta-benchmark.cc@1060
PS19, Line 1060:
> nit: long line
Done


http://gerrit.cloudera.org:8080/#/c/12621/19/be/src/exec/parquet/parquet-delta-decoder.cc
File be/src/exec/parquet/parquet-delta-decoder.cc:

http://gerrit.cloudera.org:8080/#/c/12621/19/be/src/exec/parquet/parquet-delta-decoder.cc@181
PS19, Line 181:   if (!SKIP) {
> nit: +2 indentation
Done


http://gerrit.cloudera.org:8080/#/c/12621/19/be/src/exec/parquet/parquet-delta-decoder.cc@298
PS19, Line 298:   if (UNLIKELY(!min_delta_read)) return false;
> We may be able to remove most int64_t ones, see my comment in bit-packing.c
Done, also removed the unsigned instantiations as Impala has only signed types.


http://gerrit.cloudera.org:8080/#/c/12621/19/be/src/exec/parquet/parquet-delta-encoder.h
File be/src/exec/parquet/parquet-delta-encoder.h:

http://gerrit.cloudera.org:8080/#/c/12621/19/be/src/exec/parquet/parquet-delta-encoder.h@54
PS19, Line 54:     static constexpr int DEFAULT_MAX_TOTAL_VALUE_COUNT
> Note that there is a query option for setting max row count in page: parque
I introduced a new parameter to Init(), 'max_page_value_count', which can be 
used to set the value to 'parquet_page_row_count_limit' when we integrate the 
encoder with the Parquet writer.

Left 16000 as a default.


http://gerrit.cloudera.org:8080/#/c/12621/19/be/src/exec/parquet/parquet-delta-encoder.cc
File be/src/exec/parquet/parquet-delta-encoder.cc:

http://gerrit.cloudera.org:8080/#/c/12621/19/be/src/exec/parquet/parquet-delta-encoder.cc@117
PS19, Line 117:   const int data_len = output_buffer_pos_ - 
old_data_start_address;
> Can you skip this if reserved_space_for_header_ == actual_header_size? Even
Done


http://gerrit.cloudera.org:8080/#/c/12621/19/be/src/exec/parquet/parquet-delta-encoder.cc@289
PS19, Line 289:     (delta_buffer_.size() + miniblock_size_in_values_ - 1) / 
miniblock_size_in_values_;
> nit: indentation
Done


http://gerrit.cloudera.org:8080/#/c/12621/19/be/src/util/bit-packing.cc
File be/src/util/bit-packing.cc:

http://gerrit.cloudera.org:8080/#/c/12621/19/be/src/util/bit-packing.cc@75
PS19, Line 75:
             : // Required for bit-packing-benchmark.cc.
             : template
> Do we need these combinations?
Done



--
To view, visit http://gerrit.cloudera.org:8080/12621
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie7378ac1a490a6c89a0a4349aae86cbc0fbc80f8
Gerrit-Change-Number: 12621
Gerrit-PatchSet: 20
Gerrit-Owner: Daniel Becker <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Daniel Becker <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Comment-Date: Mon, 22 May 2023 17:24:13 +0000
Gerrit-HasComments: Yes

Reply via email to