Hello Thomas Marshall, Dan Hecht,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/10483
to look at the new patch set (#5).
Change subject: IMPALA-7044: Prevent overflow when computing Parquet block size
......................................................................
IMPALA-7044: Prevent overflow when computing Parquet block size
When writing Parquet files we compute a minimum block size based on the
number of columns in the target table:
3 * page_size * num_cols
For tables with a large number of columns (> ~10k), this value will get
larger than 2GB. When we pass it to hdfsOpenFile() in
HdfsTableSink::CreateNewTmpFile() it gets cast to a signed int32 and can
overflow.
To fix this we return an error if we detect that the minimum block size
exceed 2GB.
This change adds a test using CTAS into a table with 12k columns, making
sure that Impala returns the correct error.
Change-Id: I6e63420e5a093c0bbc789201771708865b16e138
---
M be/src/exec/hdfs-parquet-table-writer.cc
M be/src/exec/hdfs-parquet-table-writer.h
M be/src/exec/hdfs-table-sink.cc
M tests/query_test/test_insert_parquet.py
4 files changed, 41 insertions(+), 12 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/10483/5
--
To view, visit http://gerrit.cloudera.org:8080/10483
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6e63420e5a093c0bbc789201771708865b16e138
Gerrit-Change-Number: 10483
Gerrit-PatchSet: 5
Gerrit-Owner: Lars Volker <[email protected]>
Gerrit-Reviewer: Dan Hecht <[email protected]>
Gerrit-Reviewer: Lars Volker <[email protected]>
Gerrit-Reviewer: Thomas Marshall <[email protected]>