Zoltan Borok-Nagy has uploaded this change for review. ( http://gerrit.cloudera.org:8080/12636 )
Change subject: IMPALA-8257: Parquet writer sometimes hits DCHECK when handling empty string
......................................................................

IMPALA-8257: Parquet writer sometimes hits DCHECK when handling empty string

We had an overly strict DCHECK in ColumnStats<StringValue>::Merge().
The DCHECK makes sure that we copy the StringValues from the RowBatch
memory into their own buffer. Otherwise their values can be overwritten
by subsequent row batches.

The internal pointer of an empty StringValue is NULL, so there is no
need to copy it to another buffer. The DCHECKs are therefore unnecessary
for empty strings and, moreover, they can result in crashes.

Now we only evaluate the DCHECKs when the corresponding StringValues
are not empty strings.

Testing:
I added an e2e test that inserts a lot of empty strings into a table.

Change-Id: I934b53c17720e41231e4d614fbc70f1937e19289
---
M be/src/exec/parquet/parquet-column-stats.inline.h
M testdata/workloads/tpch/queries/insert_parquet.test
2 files changed, 19 insertions(+), 5 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/36/12636/1
--
To view, visit http://gerrit.cloudera.org:8080/12636

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I934b53c17720e41231e4d614fbc70f1937e19289
Gerrit-Change-Number: 12636
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy <[email protected]>
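
For readers without the diff at hand, the idea behind the fix can be sketched roughly as follows. This is not the code from parquet-column-stats.inline.h: StringValueSketch, OwnedStat, CopyToBuffer(), IsSafelyOwned() and CheckInvariant() are hypothetical names invented for illustration, and assert() stands in for DCHECK. The sketch only shows the shape of the guarded check: the "pointer must point into our own buffer" invariant is evaluated only for non-empty strings, because an empty string has a NULL pointer and nothing to copy.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a non-owning string view (pointer + length).
// An empty string is represented with ptr == nullptr and len == 0.
struct StringValueSketch {
  const char* ptr;
  std::size_t len;
};

// Hypothetical holder for one statistic (e.g. a min or max value) that is
// supposed to own the bytes its value points to.
struct OwnedStat {
  StringValueSketch value{nullptr, 0};
  std::vector<char> buffer;  // owned copy of the bytes, if any

  // Copy the bytes of 'v' into the owned buffer. Empty strings need no copy:
  // the value simply keeps a null pointer.
  void CopyToBuffer(const StringValueSketch& v) {
    if (v.len == 0) {
      buffer.clear();
      value = StringValueSketch{nullptr, 0};
      return;
    }
    buffer.assign(v.ptr, v.ptr + v.len);
    value.ptr = buffer.data();
    value.len = v.len;
  }

  // The relaxed invariant: only non-empty values must point into 'buffer'.
  bool IsSafelyOwned() const {
    return value.len == 0 || value.ptr == buffer.data();
  }

  // Debug-time check, analogous in spirit to a guarded DCHECK.
  void CheckInvariant() const {
    if (value.len > 0) assert(value.ptr == buffer.data());
  }
};
```

With this shape, a stat holding an empty string passes the check without any buffer copy, while a non-empty value that still points at external memory (for example, row batch memory) is still flagged.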
