Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/19898 )
Change subject: IMPALA-10186: Fix writing empty parquet page
......................................................................

IMPALA-10186: Fix writing empty parquet page

Fixes writing an empty parquet page when a page fills (or reaches
parquet_page_row_count_limit) at the same time that its dictionary
fills.

When a page filled (or reached parquet_page_row_count_limit) at the same
time that the dictionary filled, Impala would first detect the page was
full and create a new page. It would then detect the dictionary is full
and create another page, resulting in an empty page.

Parquet readers like Hive error if they encounter an empty page. This
patch attempts to make it impossible to generate an empty page by
reworking AppendRow and adding DCHECKs for empty pages. Dictionary size
is checked on FinalizeCurrentPage so whenever a page is written, we also
flush the dictionary if full.

Addresses clang-tidy by adding override in source files.

Testing:
- new test for full page size reached with full dictionary
- new test for parquet_page_row_count_limit with full dictionary
- new test for parquet_page_row_count_limit followed by large value.
  This seems useful as a theoretical corner-case; it currently writes
  the too-large value to the page anyway, but if we ever start checking
  whether the first value will fit the page this could become an issue.

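The double page creation described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not the code in hdfs-parquet-table-writer.cc: ToyColumnWriter, AppendRowBuggy, AppendRowFixed, ClosePageUnchecked, and CountEmptyPages are hypothetical names, the dictionary is modeled as a set of ints with a fixed entry limit, and "flushing" the dictionary is reduced to clearing it. Only AppendRow, FinalizeCurrentPage, and the idea of a DCHECK against empty pages come from the commit message.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical toy model of a column writer; not Impala's actual classes.
struct ToyColumnWriter {
  std::size_t page_row_limit;  // stands in for parquet_page_row_count_limit
  std::size_t dict_limit;      // distinct values allowed before the dictionary is "full"
  std::set<int> dict;          // toy dictionary
  std::size_t rows_in_page;    // rows buffered in the current page
  std::vector<std::size_t> pages;  // row count of every page written so far
  std::size_t dict_flushes;    // how many times the dictionary was flushed

  ToyColumnWriter(std::size_t page_limit, std::size_t dictionary_limit)
      : page_row_limit(page_limit), dict_limit(dictionary_limit),
        rows_in_page(0), dict_flushes(0) {}

  bool PageFull() const { return rows_in_page >= page_row_limit; }
  bool DictFull() const { return dict.size() >= dict_limit; }

  // Writes the current page without checking whether it holds any rows.
  void ClosePageUnchecked() {
    pages.push_back(rows_in_page);
    rows_in_page = 0;
  }

  // Assumed shape of the old behavior: the "page full" and "dictionary full"
  // conditions are handled independently, and each one closes a page.
  void AppendRowBuggy(int value) {
    dict.insert(value);
    ++rows_in_page;
    if (PageFull()) ClosePageUnchecked();
    if (DictFull()) {
      ClosePageUnchecked();  // writes an empty page if the page was just closed
      dict.clear();
      ++dict_flushes;
    }
  }

  // Assumed shape of the fix: one place finalizes the page, and the
  // dictionary is checked (and flushed if full) as part of that step.
  void FinalizeCurrentPage() {
    assert(rows_in_page > 0);  // plays the role of the DCHECK for empty pages
    pages.push_back(rows_in_page);
    rows_in_page = 0;
    if (DictFull()) {
      dict.clear();
      ++dict_flushes;
    }
  }

  void AppendRowFixed(int value) {
    dict.insert(value);
    ++rows_in_page;
    if (PageFull() || DictFull()) FinalizeCurrentPage();
  }
};

// Counts pages that were written with zero rows.
std::size_t CountEmptyPages(const ToyColumnWriter& writer) {
  std::size_t empty = 0;
  for (std::size_t rows : writer.pages) {
    if (rows == 0) ++empty;
  }
  return empty;
}
```

With a page limit of 2 rows and a dictionary limit of 2 entries, appending two distinct values hits both limits on the same row. In this model the buggy path then writes a 2-row page followed by a 0-row page, while the fixed path writes a single 2-row page and flushes the dictionary once.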
Change-Id: I90d30d958f07c6289a1beba1b5df1ab3d7213799
Reviewed-on: http://gerrit.cloudera.org:8080/19898
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>
---
M be/src/exec/parquet/hdfs-parquet-table-writer.cc
M be/src/util/dict-encoding.h
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
A testdata/empty_parquet_page_source_impala10186/data.csv
M tests/query_test/test_parquet_page_index.py
6 files changed, 102,267 insertions(+), 70 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/19898
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I90d30d958f07c6289a1beba1b5df1ab3d7213799
Gerrit-Change-Number: 19898
Gerrit-PatchSet: 9
Gerrit-Owner: Michael Smith <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Gabor Kaszab <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Michael Smith <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
