Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/16082 )
Change subject: IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified tables (primitive types) ...................................................................... Patch Set 2: (11 comments) Did a first pass..some comments below. http://gerrit.cloudera.org:8080/#/c/16082/2/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java: http://gerrit.cloudera.org:8080/#/c/16082/2/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java@673 PS2, Line 673: protected HdfsPartition(HdfsPartition other) { Looking at the changes in https://issues.apache.org/jira/browse/IMPALA-9778, the HdfsPartition class has been restructured quite a bit.. although it seems your changes should be ok since you are only creating a clone. http://gerrit.cloudera.org:8080/#/c/16082/2/fe/src/main/java/org/apache/impala/planner/HashJoinNode.java File fe/src/main/java/org/apache/impala/planner/HashJoinNode.java: http://gerrit.cloudera.org:8080/#/c/16082/2/fe/src/main/java/org/apache/impala/planner/HashJoinNode.java@178 PS2, Line 178: if (!isAcidJoin_ || detailLevel.ordinal() == TExplainLevel.VERBOSE.ordinal()) { For the anti join should we not show the join conditions in the Impala query profile ? (query profile output is in EXTENDED mode, not VERBOSE). http://gerrit.cloudera.org:8080/#/c/16082/2/fe/src/main/java/org/apache/impala/planner/JoinNode.java File fe/src/main/java/org/apache/impala/planner/JoinNode.java: http://gerrit.cloudera.org:8080/#/c/16082/2/fe/src/main/java/org/apache/impala/planner/JoinNode.java@83 PS2, Line 83: // True if this join is used to do the join between insert and deleta delta files. nit: spelling 'deleta' http://gerrit.cloudera.org:8080/#/c/16082/2/fe/src/main/java/org/apache/impala/planner/JoinNode.java@88 PS2, Line 88: displayName_ = "ACID " + displayName_; The 'ACID' prefix for a HashJoin could be misleading to end users since a plan operator itself is not ACID in DBMS terms. Could we use something like. 'DELTA TABLE HASH JOIN' (open to other suggestions). http://gerrit.cloudera.org:8080/#/c/16082/2/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java File fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java: http://gerrit.cloudera.org:8080/#/c/16082/2/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@1453 PS2, Line 1453: FeFsPartition fsPartition = (FeFsPartition)partition; The casting is redundant since 'partition' is already of type FeFsPartition. http://gerrit.cloudera.org:8080/#/c/16082/2/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@1454 PS2, Line 1454: for (HdfsPartition.FileDescriptor fd : fsPartition.getFileDescriptors()) { Is there any indicator in the HMS's partition object that indicates whether any delete was done ? It does seem expensive to check all the file descriptors within a partition since in vast majority of cases nothing may have been deleted or the 'delete_delta' is found only towards the end. However, if such metadata does not exist, then I suppose this is the only alternative. http://gerrit.cloudera.org:8080/#/c/16082/2/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@1480 PS2, Line 1480: String[] acidFields = {"originaltransaction", "bucket", "rowid"}; The design doc suggests the anti-join is on 4 fields..is the partition id not needed in this join ? http://gerrit.cloudera.org:8080/#/c/16082/2/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@1551 PS2, Line 1551: hdfsTblRef.getDesc(), conjuncts, insertPartitions, hdfsTblRef, aggInfo, The 'aggInfo' is meant for the simple 'select count(*) from table' type of query where the aggregation value can be found from the row count in the metadata. But in this case, since we want to compute this value only after the anti-join has been done, I think it should be null. http://gerrit.cloudera.org:8080/#/c/16082/2/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@1591 PS2, Line 1591: ret.add(new BinaryPredicate( Call analyze for the BinaryPredicate. Also, could you add a simple test with explain_level=3 to check for the join condition ? I don't think I saw one. http://gerrit.cloudera.org:8080/#/c/16082/2/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@1605 PS2, Line 1605: * @return the cloned partitoin nit: spelling 'partitoin' http://gerrit.cloudera.org:8080/#/c/16082/2/testdata/workloads/functional-planner/queries/PlannerTest/acid-scans.test File testdata/workloads/functional-planner/queries/PlannerTest/acid-scans.test: http://gerrit.cloudera.org:8080/#/c/16082/2/testdata/workloads/functional-planner/queries/PlannerTest/acid-scans.test@58 PS2, Line 58: | row-size=100B cardinality=143 The cardinality estimate for this hash join would need some adjustment in the future..right now it is showing the default behavior of a Left Outer Join, which is not accurate for the join between the delta and delete_deltas. -- To view, visit http://gerrit.cloudera.org:8080/16082 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I15c8feabf40be1658f3dd46883f5a1b2aa5d0659 Gerrit-Change-Number: 16082 Gerrit-PatchSet: 2 Gerrit-Owner: Zoltan Borok-Nagy <borokna...@cloudera.com> Gerrit-Reviewer: Aman Sinha <amsi...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Comment-Date: Thu, 18 Jun 2020 00:27:49 +0000 Gerrit-HasComments: Yes