Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/12621 )
Change subject: IMPALA-8253: Parquet delta encoding and decoding. ...................................................................... Patch Set 20: (7 comments) http://gerrit.cloudera.org:8080/#/c/12621/19/be/src/benchmarks/parquet-delta-benchmark.cc File be/src/benchmarks/parquet-delta-benchmark.cc: http://gerrit.cloudera.org:8080/#/c/12621/19/be/src/benchmarks/parquet-delta-benchmark.cc@1060 PS19, Line 1060: > nit: long line Done http://gerrit.cloudera.org:8080/#/c/12621/19/be/src/exec/parquet/parquet-delta-decoder.cc File be/src/exec/parquet/parquet-delta-decoder.cc: http://gerrit.cloudera.org:8080/#/c/12621/19/be/src/exec/parquet/parquet-delta-decoder.cc@181 PS19, Line 181: if (!SKIP) { > nit: +2 indentation Done http://gerrit.cloudera.org:8080/#/c/12621/19/be/src/exec/parquet/parquet-delta-decoder.cc@298 PS19, Line 298: if (UNLIKELY(!min_delta_read)) return false; > We may be able to remove most int64_t ones, see my comment in bit-packing.c Done, also removed the unsigned instantiations as Impala has only signed types. http://gerrit.cloudera.org:8080/#/c/12621/19/be/src/exec/parquet/parquet-delta-encoder.h File be/src/exec/parquet/parquet-delta-encoder.h: http://gerrit.cloudera.org:8080/#/c/12621/19/be/src/exec/parquet/parquet-delta-encoder.h@54 PS19, Line 54: static constexpr int DEFAULT_MAX_TOTAL_VALUE_COUNT > Note that there is a query option for setting max row count in page: parque I introduced a new parameter to Init(), 'max_page_value_count', which can be used to set the value to 'parquet_page_row_count_limit' when we integrate the encoder with the Parquet writer. Left 16000 as a default. http://gerrit.cloudera.org:8080/#/c/12621/19/be/src/exec/parquet/parquet-delta-encoder.cc File be/src/exec/parquet/parquet-delta-encoder.cc: http://gerrit.cloudera.org:8080/#/c/12621/19/be/src/exec/parquet/parquet-delta-encoder.cc@117 PS19, Line 117: const int data_len = output_buffer_pos_ - old_data_start_address; > Can you skip this if reserved_space_for_header_ == actual_header_size? Even Done http://gerrit.cloudera.org:8080/#/c/12621/19/be/src/exec/parquet/parquet-delta-encoder.cc@289 PS19, Line 289: (delta_buffer_.size() + miniblock_size_in_values_ - 1) / miniblock_size_in_values_; > nit: indentation Done http://gerrit.cloudera.org:8080/#/c/12621/19/be/src/util/bit-packing.cc File be/src/util/bit-packing.cc: http://gerrit.cloudera.org:8080/#/c/12621/19/be/src/util/bit-packing.cc@75 PS19, Line 75: : // Required for bit-packing-benchmark.cc. : template > Do we need these combinations? Done -- To view, visit http://gerrit.cloudera.org:8080/12621 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie7378ac1a490a6c89a0a4349aae86cbc0fbc80f8 Gerrit-Change-Number: 12621 Gerrit-PatchSet: 20 Gerrit-Owner: Daniel Becker <[email protected]> Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Daniel Becker <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-Comment-Date: Mon, 22 May 2023 17:24:13 +0000 Gerrit-HasComments: Yes
