Noemi Pap-Takacs has uploaded this change for review. ( http://gerrit.cloudera.org:8080/22407 )
Change subject: IMPALA-12588: Don't UPDATE rows that already have the desired
value
......................................................................
IMPALA-12588: Don't UPDATE rows that already have the desired value
When UPDATEing an Iceberg or Kudu table, we should change as few rows
as possible. In case of Iceberg tables it means writing as few new
data records and delete records as possible.
Therefore, if rows already have the new values we should just ignore them.
One way to achieve this is to add extra predicates, e.g.:
UPDATE tbl SET k = 3 WHERE i > 4;
==>
UPDATE tbl SET k = 3 WHERE i > 4 AND k IS DISTINCT FROM 3;
So we won't write new data/delete records for the rows that already have
the desired value.
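As a rough illustration of the saving, here is a minimal Python sketch over made-up in-memory rows. The cost model (one delete record plus one data record per updated row) is a simplifying assumption for illustration, not a description of Impala's writer, and `records_written` is a hypothetical helper.

```python
# Toy rows standing in for `tbl`; the data is made up and contains no NULLs.
rows = [
    {"i": 5, "k": 3},  # matches i > 4, but k is already 3
    {"i": 6, "k": 7},  # matches i > 4 and k must change
    {"i": 9, "k": 3},  # matches i > 4, but k is already 3
    {"i": 2, "k": 7},  # does not match i > 4
]


def records_written(matched):
    """Assumed cost model: one delete record plus one data record per row."""
    return 2 * len(matched)


# Rows selected by the original statement: WHERE i > 4
original = [r for r in rows if r["i"] > 4]
# Rows selected after adding the extra predicate on k.
# With no NULLs present, plain != behaves like IS DISTINCT FROM here.
rewritten = [r for r in original if r["k"] != 3]

print(records_written(original))   # 6
print(records_written(rewritten))  # 2
```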
Explanation on how to create extra predicates to filter out these rows:
If there are multiple assignments in the SET list, we can only skip
updating a row if all the mentioned values are already equal.
If any of the values needs to be updated, the entire row does.
Therefore we can think of the SET list as predicates connected with AND.
To negate this SET list, we have to negate the individual SET
assignments and connect them with OR.
Then add this new compound predicate to the original where predicates
with an AND (if there were none, just create a where predicate from it).
                 AND
                /   \
        original     OR
 WHERE predicate    /  \
                  !a    OR
                       /  \
                     !b    !c
This simple graph illustrates how the where predicate is rewritten.
(Considering an UPDATE statement that sets 3 columns.)
'!a', '!b' and '!c' are the negations of the individual assignments in
the SET list. So the extended WHERE predicate is:
(original WHERE predicate) AND (!a OR !b OR !c)
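The rewrite rule can be sketched on plain SQL text as follows. This is only an illustration: the actual change operates on Impala's analyzed expressions in ModifyStmt.java and UpdateStmt.java, and the function name `extend_where` and the (column, value) pair representation are assumptions made for this example.

```python
def extend_where(set_list, where=None):
    """Return WHERE text that skips rows already holding the new values.

    set_list: list of (column, value_sql) pairs (hypothetical representation).
    where:    original WHERE predicate text, or None if there was none.
    """
    if not set_list:
        raise ValueError("SET list must not be empty")
    # Negate each assignment, then connect the negations with OR.
    negated = [f"{col} IS DISTINCT FROM {val}" for col, val in set_list]
    disjunction = " OR ".join(negated)
    if where is None:
        # No original predicate: the disjunction becomes the WHERE predicate.
        return disjunction
    return f"({where}) AND ({disjunction})"


print(extend_where([("k", "3")], "i > 4"))
# (i > 4) AND (k IS DISTINCT FROM 3)
```

Parenthesizing the original predicate keeps its meaning intact even if it contains a top-level OR.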
To handle NULL values correctly, we use IS DISTINCT FROM instead of
simply negating the assignment with operator '!='.
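The difference matters when the current value is NULL. The following Python sketch models SQL three-valued logic with None standing in for NULL; the helper names are hypothetical and this is not Impala code.

```python
def sql_not_equal(a, b):
    """SQL `a != b`: the result is NULL (None) if either operand is NULL."""
    if a is None or b is None:
        return None
    return a != b


def sql_is_distinct_from(a, b):
    """SQL `a IS DISTINCT FROM b`: always TRUE or FALSE; two NULLs are not distinct."""
    if a is None or b is None:
        return (a is None) != (b is None)
    return a != b


def passes_where(result):
    """A row passes WHERE only if the predicate is TRUE; NULL and FALSE filter it out."""
    return result is True


# Current value of k is NULL and the statement says SET k = 3.
current, new = None, 3
print(passes_where(sql_not_equal(current, new)))         # False
print(passes_where(sql_is_distinct_from(current, new)))  # True
```

With '!=', the row whose k is NULL would be filtered out and never updated, even though it does not hold the desired value.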
Some cases are trickier (e.g. UPDATE FROM); those could be handled
more easily by the MERGE statement.
Testing:
- Analysis
- Planner
- E2E
- Kudu
- Iceberg
Change-Id: I926c80e8110de5a4615a3624a81a330f54317c8b
---
M fe/src/main/java/org/apache/impala/analysis/ModifyStmt.java
M fe/src/main/java/org/apache/impala/analysis/UpdateStmt.java
M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-update.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu-dml-with-utc-conversion.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu-update.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-update-basic.test
M testdata/workloads/functional-query/queries/QueryTest/kudu_update.test
M tests/query_test/test_iceberg.py
8 files changed, 274 insertions(+), 48 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/07/22407/2
--
To view, visit http://gerrit.cloudera.org:8080/22407
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I926c80e8110de5a4615a3624a81a330f54317c8b
Gerrit-Change-Number: 22407
Gerrit-PatchSet: 2
Gerrit-Owner: Noemi Pap-Takacs <[email protected]>
Gerrit-Reviewer: Daniel Becker <[email protected]>
Gerrit-Reviewer: Noemi Pap-Takacs <[email protected]>
Gerrit-Reviewer: Peter Rozsa <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>