Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/19850 )

Change subject: WIP IMPALA-11619: Improve Iceberg V2 reads with a custom 
Iceberg Position Delete operator
......................................................................


Patch Set 10:

(15 comments)

http://gerrit.cloudera.org:8080/#/c/19850/10/common/thrift/ImpalaService.thrift
File common/thrift/ImpalaService.thrift:

http://gerrit.cloudera.org:8080/#/c/19850/10/common/thrift/ImpalaService.thrift@795
PS10, Line 795: of
nit: off


http://gerrit.cloudera.org:8080/#/c/19850/10/common/thrift/Query.thrift
File common/thrift/Query.thrift:

http://gerrit.cloudera.org:8080/#/c/19850/10/common/thrift/Query.thrift@638
PS10, Line 638: 159: optional TJoinDistributionMode 
optimized_iceberg_v2_read_distribution_mode_override
This is good for perf testing for now, but we should limit the number of query 
options, so it would be good if we could remove this from the final PS.


http://gerrit.cloudera.org:8080/#/c/19850/10/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
File fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java:

http://gerrit.cloudera.org:8080/#/c/19850/10/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java@606
PS10, Line 606:   private PlanFragment 
createPartitionedIcebergDeleteFragment(IcebergDeleteNode node,
There's a lot of code shared between this and 
createPartitionedHashJoinFragment(). Can we refactor the common parts to new 
private methods?
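A rough sketch of the shape I have in mind (the names below are illustrative, not from the patch): the operator-specific methods keep only their dispatch, and the duplicated fragment-building steps move into one private helper.

```java
public class PartitionedFragmentSketch {
    // Minimal stand-in for the planner's PlanFragment, only for illustration.
    static class PlanFragment {
        final String description;
        PlanFragment(String description) { this.description = description; }
    }

    // Hypothetical private helper holding the logic that is currently
    // duplicated: deriving the lhs/rhs join exprs, creating the exchange
    // nodes and wiring up the partitioned child fragments.
    private static PlanFragment createPartitionedJoinLikeFragment(String operator) {
        // ... common body shared by hash join and Iceberg delete ...
        return new PlanFragment(operator + " partitioned fragment");
    }

    static PlanFragment createPartitionedHashJoinFragment() {
        return createPartitionedJoinLikeFragment("HASH JOIN");
    }

    static PlanFragment createPartitionedIcebergDeleteFragment() {
        return createPartitionedJoinLikeFragment("ICEBERG DELETE");
    }
}
```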


http://gerrit.cloudera.org:8080/#/c/19850/10/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java@621
PS10, Line 621:     if (lhsHasCompatPartition && rhsHasCompatPartition
              :         && 
isCompatPartition(leftChildFragment.getDataPartition(),
              :                rightChildFragment.getDataPartition(), 
lhsJoinExprs, rhsJoinExprs,
              :                analyzer))
This should always be true for this operator, right? Either we can remove this check and its now-unnecessary parameters, or turn it into a precondition.
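For the precondition variant, a minimal self-contained sketch (checkState() below mimics the Guava Preconditions.checkState the planner already uses; the boolean parameters are stand-ins for the flags computed in this method):

```java
public class CompatPartitionPrecondition {
    // Mimics com.google.common.base.Preconditions.checkState, which the
    // planner code already uses elsewhere.
    static void checkState(boolean condition, String message) {
        if (!condition) throw new IllegalStateException(message);
    }

    // Hypothetical shape: instead of branching on the compatibility flags,
    // assert them, since an IcebergDeleteNode always joins both sides on
    // the same position-delete exprs.
    static void validatePartitions(boolean lhsHasCompatPartition,
            boolean rhsHasCompatPartition, boolean partitionsCompatible) {
        checkState(lhsHasCompatPartition && rhsHasCompatPartition
                && partitionsCompatible,
            "IcebergDeleteNode children must be compatibly partitioned");
    }
}
```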


http://gerrit.cloudera.org:8080/#/c/19850/10/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java@645
PS10, Line 645: lhsHasCompatPartition
Always true?


http://gerrit.cloudera.org:8080/#/c/19850/10/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java@659
PS10, Line 659: rhsHasCompatPartition
Always true?


http://gerrit.cloudera.org:8080/#/c/19850/10/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java@694
PS10, Line 694:     switch (node.getJoinOp()) {
Do we need this switch-case?


http://gerrit.cloudera.org:8080/#/c/19850/10/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java@725
PS10, Line 725:   private PlanFragment 
createIcebergDeleteFragment(IcebergDeleteNode node,
It shares a lot of code with createHashJoinFragment(). Can we refactor the 
common parts?


http://gerrit.cloudera.org:8080/#/c/19850/10/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java@767
PS10, Line 767: LOG.trace(
Why is this emitted on a new log line?


http://gerrit.cloudera.org:8080/#/c/19850/10/fe/src/main/java/org/apache/impala/planner/IcebergDeleteNode.java
File fe/src/main/java/org/apache/impala/planner/IcebergDeleteNode.java:

http://gerrit.cloudera.org:8080/#/c/19850/10/fe/src/main/java/org/apache/impala/planner/IcebergDeleteNode.java@48
PS10, Line 48:  * Hash join between left child (outer) and right child (inner). 
One child must be the
             :  * plan generated for a table ref. Typically, that is the right 
child, but due to join
             :  * inversion (for outer/semi/cross joins) it could also be the 
left child.
Copy-pasted comment, irrelevant to this operator.


http://gerrit.cloudera.org:8080/#/c/19850/10/fe/src/main/java/org/apache/impala/planner/IcebergDeleteNode.java@83
PS10, Line 83:       if (!t0.matchesType(t1)) {
             :         // With decimal and char types, the child types do not 
have to match because
             :         // the equality builtin handles it. However, they will 
not hash correctly so
             :         // insert a cast.
             :         boolean bothDecimal = t0.isDecimal() && t1.isDecimal();
             :         boolean bothString = t0.isStringType() && 
t1.isStringType();
             :         if (!bothDecimal && !bothString) {
             :           throw new InternalException("Cannot compare " + 
t0.toSql() + " to " + t1.toSql()
             :               + " in join predicate.");
             :         }
             :         Type compatibleType =
             :             Type.getAssignmentCompatibleType(t0, t1, false, 
analyzer.isDecimalV2());
             :         if (compatibleType.isInvalid()) {
             :           throw new InternalException(
             :               String.format("Unable create a hash join with 
equi-join predicate %s "
             :                       + "because the operands cannot be cast 
without loss of precision. "
             :                       + "Operand types: %s = %s.",
             :                   eqPred.toSql(), t0.toSql(), t1.toSql()));
             :         }
             :         Preconditions.checkState(
             :             compatibleType.isDecimal() || 
compatibleType.isStringType());
             :         try {
             :           if (!t0.equals(compatibleType)) {
             :             eqPred.setChild(0, 
eqPred.getChild(0).castTo(compatibleType));
             :           }
             :           if (!t1.equals(compatibleType)) {
             :             eqPred.setChild(1, 
eqPred.getChild(1).castTo(compatibleType));
             :           }
             :         } catch (AnalysisException e) {
             :           throw new InternalException("Should not happen", e);
             :         }
             :       }
Unneeded. I think we just need a couple of preconditions checking that we got
the expected predicates.
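Something along these lines (a sketch with stand-in types; it assumes the delete predicates are always plain '=' comparisons whose operand types already match on both sides, so no compatibility cast is ever needed):

```java
public class DeletePredicateChecks {
    enum Operator { EQ, NOT_DISTINCT, LT }

    // Minimal stand-in for a BinaryPredicate with resolved child types.
    static class EqPred {
        final Operator op;
        final String lhsType, rhsType;
        EqPred(Operator op, String lhsType, String rhsType) {
            this.op = op; this.lhsType = lhsType; this.rhsType = rhsType;
        }
    }

    // Instead of inserting compatibility casts, just assert that the
    // planner handed us the expected predicates: an equality whose sides
    // already have identical types.
    static void checkExpectedPredicate(EqPred pred) {
        if (pred.op != Operator.EQ) {
            throw new IllegalStateException("expected '=' join predicate");
        }
        if (!pred.lhsType.equals(pred.rhsType)) {
            throw new IllegalStateException("operand types must match, got "
                + pred.lhsType + " vs " + pred.rhsType);
        }
    }
}
```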


http://gerrit.cloudera.org:8080/#/c/19850/10/fe/src/main/java/org/apache/impala/planner/IcebergDeleteNode.java@151
PS10, Line 151:   /**
              :    * Helper to get the equi-join conjuncts in a thrift 
representation.
              :    */
              :   public List<TEqJoinCondition> getThriftEquiJoinConjuncts() {
              :     List<TEqJoinCondition> equiJoinConjuncts = new 
ArrayList<>(eqJoinConjuncts_.size());
              :     for (Expr entry : eqJoinConjuncts_) {
              :       BinaryPredicate bp = (BinaryPredicate) entry;
              :       TEqJoinCondition eqJoinCondition = new TEqJoinCondition(
              :           bp.getChild(0).treeToThrift(), 
bp.getChild(1).treeToThrift(),
              :           bp.getOp() == BinaryPredicate.Operator.NOT_DISTINCT);
              :
              :       equiJoinConjuncts.add(eqJoinCondition);
              :     }
              :     return equiJoinConjuncts;
              :   }
Could be moved to JoinNode?


http://gerrit.cloudera.org:8080/#/c/19850/10/fe/src/main/java/org/apache/impala/planner/IcebergDeleteNode.java@216
PS10, Line 216:       // The memory of the data stored in hash table and
              :       // the memory of the hash table‘s structure
              :       perBuildInstanceDataBytes = (long) Math.ceil(rhsCard * 
getChild(1).getAvgRowSize()
              :           + BitUtil.roundUpToPowerOf2((long) Math.ceil(3 * 
rhsCard / 2))
              :               * PlannerContext.SIZE_OF_BUCKET);
              :       if (rhsNdv > 1 && rhsNdv < rhsCard) {
              :         perBuildInstanceDataBytes +=
              :             (rhsCard - rhsNdv) * 
PlannerContext.SIZE_OF_DUPLICATENODE;
              :       }
Needs new calculation.


http://gerrit.cloudera.org:8080/#/c/19850/10/fe/src/main/java/org/apache/impala/planner/IcebergDeleteNode.java@235
PS10, Line 235: joinOp_ == JoinOperator.NULL_AWARE_LEFT_ANTI_JOIN
Always false


http://gerrit.cloudera.org:8080/#/c/19850/10/fe/src/main/java/org/apache/impala/planner/IcebergDeleteNode.java@264
PS10, Line 264:     if (joinOp_ == JoinOperator.NULL_AWARE_LEFT_ANTI_JOIN) {
              :       // Only one of the NAAJ probe streams is written at a 
time, so it needs a max-sized
              :       // buffer. If the build is integrated, we already have a 
max sized buffer accounted
              :       // for in the build reservation.
              :       perInstanceProbeMinMemReservation =
              :           hasSeparateBuild() ? maxRowBufferSize + bufferSize : 
bufferSize * 2;
              :     }
Dead code



--
To view, visit http://gerrit.cloudera.org:8080/19850
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I024a61573c83bda5584f243c879d9ff39dd2dcfa
Gerrit-Change-Number: 19850
Gerrit-PatchSet: 10
Gerrit-Owner: Gergely Fürnstáhl <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Wed, 17 May 2023 12:35:40 +0000
Gerrit-HasComments: Yes
