Zoltan Borok-Nagy has uploaded this change for review. ( http://gerrit.cloudera.org:8080/20295 )

Change subject: IMPALA-12327: Iceberg V2 operator wrong results in PARTITIONED mode
......................................................................

IMPALA-12327: Iceberg V2 operator wrong results in PARTITIONED mode

The Iceberg delete node tries to do mini merge-joins between data
records and delete records. This works in DISTRIBUTED mode, and most of
the time in PARTITIONED mode as well. However, the Iceberg delete node
wrongly assumed that if the rows in a row batch belong to the same file
and arrive in ascending order, we don't need to update the
IcebergDeleteState, which tracks the state of the merge join.

When PARTITIONED mode is used, we cannot rely on ascending row order,
not even inside row batches, and not even when the previous file path is
the same as the current one. This is because files with multiple blocks
can be processed by multiple hosts in parallel, and the rows are then
hash-exchanged based on their file paths. The exchange receiver at the
LHS coalesces the row batches from multiple senders, so the row IDs can
arrive out of order.

This patch adds a fix that quickly resets the state of the merge join
when we are in PARTITIONED mode and the position-based difference
between the current row and the previous row is not one.
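
For illustration only, the reset described above can be sketched as
follows. This is a minimal sketch under stated assumptions, not the code
in be/src/exec/iceberg-delete-node.cc: MergeState, IsDeleted and
FilterRows are hypothetical names, the delete positions of a single data
file are assumed to be available as a sorted vector, and a binary search
is assumed as the way to reposition the join state.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-in for the merge-join state of one data file.
// It is NOT Impala's IcebergDeleteState.
struct MergeState {
  std::size_t next_delete_idx = 0;  // index of the first delete position not yet passed
  int64_t prev_pos = 0;             // position of the previously probed row
  bool has_prev = false;            // false until the first row has been probed
};

// Returns true if the row at 'pos' is deleted.
// 'deletes' holds the delete positions of the file, sorted ascending.
bool IsDeleted(const std::vector<int64_t>& deletes, int64_t pos,
               bool partitioned_mode, MergeState* state) {
  const bool consecutive = state->has_prev && pos == state->prev_pos + 1;
  if (partitioned_mode && !consecutive) {
    // The position difference is not one: the cached cursor may be stale,
    // so reposition it with a binary search (assumed reset strategy).
    auto it = std::lower_bound(deletes.begin(), deletes.end(), pos);
    state->next_delete_idx = static_cast<std::size_t>(it - deletes.begin());
  } else {
    // Ordinary merge-join step: only ever move the cursor forward.
    while (state->next_delete_idx < deletes.size() &&
           deletes[state->next_delete_idx] < pos) {
      ++state->next_delete_idx;
    }
  }
  state->prev_pos = pos;
  state->has_prev = true;
  return state->next_delete_idx < deletes.size() &&
         deletes[state->next_delete_idx] == pos;
}

// Returns the row positions that survive the delete filter, in input order.
std::vector<int64_t> FilterRows(const std::vector<int64_t>& rows,
                                const std::vector<int64_t>& deletes,
                                bool partitioned_mode) {
  MergeState state;
  std::vector<int64_t> kept;
  for (int64_t pos : rows) {
    if (!IsDeleted(deletes, pos, partitioned_mode, &state)) kept.push_back(pos);
  }
  return kept;
}
```

With rows arriving as 5, 6, 7, 0, 1, 2 and delete positions 1 and 6,
calling FilterRows with partitioned_mode set to true drops both 1 and 6,
while the forward-only path (partitioned_mode set to false) has already
moved past position 1 when the order wraps around and wrongly keeps it.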

Tests:
 * added e2e tests

Change-Id: Ib89a53e812af8c3b8ec5bc27bca0a50dcac5d924
---
M be/src/exec/iceberg-delete-node.cc
M testdata/bin/create-load-data.sh
M testdata/datasets/functional/functional_schema_template.sql
M testdata/workloads/functional-query/queries/QueryTest/iceberg-v2-read-position-deletes.test
4 files changed, 64 insertions(+), 5 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/20295/1
--
To view, visit http://gerrit.cloudera.org:8080/20295
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ib89a53e812af8c3b8ec5bc27bca0a50dcac5d924
Gerrit-Change-Number: 20295
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy <[email protected]>
