Zoltan Borok-Nagy has uploaded this change for review. ( http://gerrit.cloudera.org:8080/20460 )


Change subject: IMPALA-12371: Add better cardinality estimation for Iceberg V2 tables with deletes
......................................................................

IMPALA-12371: Add better cardinality estimation for Iceberg V2 tables with deletes

Currently IcebergDeleteNode's cardinality is the same as the LHS's
cardinality, i.e. we don't take the RHS into account. The RHS contains
the position delete records, so it is a fair assumption that every record
on the RHS removes a record from the LHS (duplicated delete records should
be extremely rare).

If there are conjuncts on the Iceberg table, we can assume that they have
the same selectivity on the data records and on the delete records.

With the above assumptions, this change updates the cardinality of the
IcebergDeleteNode with basically the following formula:

 Card(IcebergDeleteNode) = Card(LHS) - Selectivity(LHS) * Card(RHS);
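A minimal Java sketch of the basic formula, for illustration only. The
class and method names (NaiveDeleteCardinality, estimate) are
hypothetical and not taken from IcebergDeleteNode.java, and rounding the
expected number of deleted rows with Math.round is an assumption of this
sketch. It also shows why the formula alone is not enough: with many
delete records the estimate goes negative.

```java
/**
 * Hypothetical illustration of the basic estimate
 *   Card(LHS) - Selectivity(LHS) * Card(RHS)
 * This is a sketch, not Impala's actual IcebergDeleteNode code.
 */
final class NaiveDeleteCardinality {
  private NaiveDeleteCardinality() {}

  /**
   * @param lhsCard        estimated number of data rows (Card(LHS))
   * @param lhsSelectivity combined selectivity of the table's conjuncts
   * @param rhsCard        number of position delete records (Card(RHS))
   */
  static long estimate(long lhsCard, double lhsSelectivity, long rhsCard) {
    // Assume the conjuncts filter the delete records at the same rate as
    // the data records, and that each surviving delete removes one row.
    long expectedDeletes = Math.round(lhsSelectivity * rhsCard);
    // No clamping here: the result can be zero or negative.
    return lhsCard - expectedDeletes;
  }

  public static void main(String[] args) {
    System.out.println(estimate(1000, 0.5, 500));   // 750
    System.out.println(estimate(1000, 0.5, 4000));  // -1000
  }
}
```

With Card(LHS) = 1000 and Selectivity(LHS) = 0.5, a delete file set of
4000 records yields an estimate of -1000, which is not a valid
cardinality.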

To deal with edge cases where there are lots of duplicated delete
records (which would drive the above estimate to zero or below), we
actually use a slightly more complex formula:

 Card(IcebergDeleteNode) =
   Max(
     Min(1, Card(LHS)),
     Card(LHS) - Selectivity(LHS) * Card(RHS)
   );
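A sketch of the clamped formula, again with hypothetical names and
assuming Math.round for the expected number of deleted rows. Rejecting
negative or out-of-range inputs is a choice of this sketch, not
something the change describes. The Min(1, Card(LHS)) term keeps the
estimate at 0 for an empty LHS, while a non-empty LHS never drops below
1 no matter how many delete records there are.

```java
/**
 * Hypothetical illustration of the clamped estimate
 *   Max(Min(1, Card(LHS)), Card(LHS) - Selectivity(LHS) * Card(RHS))
 * This is a sketch, not Impala's actual IcebergDeleteNode code.
 */
final class ClampedDeleteCardinality {
  private ClampedDeleteCardinality() {}

  static long estimate(long lhsCard, double lhsSelectivity, long rhsCard) {
    // Sketch-only input validation: known, non-negative cardinalities and
    // a selectivity in [0, 1].
    if (lhsCard < 0 || rhsCard < 0 || lhsSelectivity < 0.0 || lhsSelectivity > 1.0) {
      throw new IllegalArgumentException("expected non-negative inputs and selectivity in [0, 1]");
    }
    // Same assumption as the basic formula: conjuncts filter delete records
    // at the same rate as data records.
    long expectedDeletes = Math.round(lhsSelectivity * rhsCard);
    // Lower bound is 1 if the LHS has any rows, 0 if it is empty.
    long lowerBound = Math.min(1L, lhsCard);
    return Math.max(lowerBound, lhsCard - expectedDeletes);
  }

  public static void main(String[] args) {
    System.out.println(estimate(1000, 0.5, 500));   // 750
    System.out.println(estimate(1000, 0.5, 4000));  // 1
    System.out.println(estimate(0, 0.5, 4000));     // 0
  }
}
```

Compared with the unclamped version, the only difference is the final
Max: ordinary inputs give the same result, and the heavily-deleted case
(1000 rows, 4000 delete records at selectivity 0.5) now yields 1 instead
of -1000.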

Testing:
 * updated the planner tests

Change-Id: I988dc8d7e1074932c460b3702d3381341e5b23c5
---
M fe/src/main/java/org/apache/impala/planner/IcebergDeleteNode.java
M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-delete.test
M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test
3 files changed, 94 insertions(+), 79 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/60/20460/1
--
To view, visit http://gerrit.cloudera.org:8080/20460

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I988dc8d7e1074932c460b3702d3381341e5b23c5
Gerrit-Change-Number: 20460
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy <[email protected]>
