Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/16082 )
Change subject: IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified tables (primitive types) ...................................................................... Patch Set 4: (11 comments) Thanks for the comments! http://gerrit.cloudera.org:8080/#/c/16082/2/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java: http://gerrit.cloudera.org:8080/#/c/16082/2/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java@673 PS2, Line 673: fileFormatDescriptor_ = other.fileFormatDescriptor_; > Looking at the changes in https://issues.apache.org/jira/browse/IMPALA-9778 Resolved it during the rebase (PS3). http://gerrit.cloudera.org:8080/#/c/16082/2/fe/src/main/java/org/apache/impala/planner/HashJoinNode.java File fe/src/main/java/org/apache/impala/planner/HashJoinNode.java: http://gerrit.cloudera.org:8080/#/c/16082/2/fe/src/main/java/org/apache/impala/planner/HashJoinNode.java@178 PS2, Line 178: if (!isAcidJoin_ || detailLevel.ordinal() >= TExplainLevel.EXTENDED.ordinal()) { > For the anti join should we not show the join conditions in the Impala quer Done http://gerrit.cloudera.org:8080/#/c/16082/2/fe/src/main/java/org/apache/impala/planner/JoinNode.java File fe/src/main/java/org/apache/impala/planner/JoinNode.java: http://gerrit.cloudera.org:8080/#/c/16082/2/fe/src/main/java/org/apache/impala/planner/JoinNode.java@83 PS2, Line 83: // True if this join is used to do the join between insert and delete delta files. > nit: spelling 'deleta' Done http://gerrit.cloudera.org:8080/#/c/16082/2/fe/src/main/java/org/apache/impala/planner/JoinNode.java@88 PS2, Line 88: displayName_ = "DELETE TABLE " + displayName_; > The 'ACID' prefix for a HashJoin could be misleading to end users since a Yeah, I just wanted to somehow mark this JOIN to make it obvious what's going on, but probably 'ACID' is not the best term to use. I think 'DELETE TABLE HASH JOIN' makes sense. http://gerrit.cloudera.org:8080/#/c/16082/2/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java File fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java: http://gerrit.cloudera.org:8080/#/c/16082/2/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@1453 PS2, Line 1453: for (HdfsPartition.FileDescriptor fd : partition.getFileDescriptors()) { > The casting is redundant since 'partition' is already of type FeFsPartitio Done http://gerrit.cloudera.org:8080/#/c/16082/2/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@1454 PS2, Line 1454: if (fd.getRelativePath().startsWith("delete_delta_")) { > Is there any indicator in the HMS's partition object that indicates whether Unfortunately there's no such metadata AFAICT. It could be added during data loading, though it'd complicate the code a little, especially when local catalog is being used since it reuses loaded partitions while trying to filter out file descriptors based on an ACID write id list, see IMPALA-9791 for more details. Also, I did some measurements. Created a table with 1000 partitions and 10 files per each partition. Then I've nested this for-loop into another for-loop with 100 iterations to simulate a table with 100K partitions and 1 million files. This whole check was evaluated between 100-200 milliseconds. So for now I'm leaning towards keeping it simple unless you feel strongly against it. http://gerrit.cloudera.org:8080/#/c/16082/2/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@1480 PS2, Line 1480: rawPath.add("row__id"); > The design doc suggests the anti-join is on 4 fields..is the partition id n It's needed and it's added by the above for loop at L1474-L1478. http://gerrit.cloudera.org:8080/#/c/16082/2/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@1551 PS2, Line 1551: HdfsScanNode deltaScanNode = new HdfsScanNode(ctx_.getNextNodeId(), > The 'aggInfo' is meant for the simple 'select count(*) from table' type o Yeah, this makes sense. It didn't cause a bug because the backend only optimizes count(*) if it's a "zero-slot table scan", and we've added ACID slots to materialize. http://gerrit.cloudera.org:8080/#/c/16082/2/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@1591 PS2, Line 1591: insertSlotDesc.getMaterializedPath())) { > Call analyze for the BinaryPredicate. Also, could you add a simple test wi Done. Added planner tests for explain_level 2 and 3. http://gerrit.cloudera.org:8080/#/c/16082/2/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@1605 PS2, Line 1605: /** > nit: spelling 'partitoin' Done http://gerrit.cloudera.org:8080/#/c/16082/2/testdata/workloads/functional-planner/queries/PlannerTest/acid-scans.test File testdata/workloads/functional-planner/queries/PlannerTest/acid-scans.test: http://gerrit.cloudera.org:8080/#/c/16082/2/testdata/workloads/functional-planner/queries/PlannerTest/acid-scans.test@58 PS2, Line 58: | row-size=100B cardinality=143 > The cardinality estimate for this hash join would need some adjustment in t Maybe it's not that wrong. If I use compute stats then the actual cardinalities are stored, i.e. the number of values that are not deleted. So the left side of the JOIN will have the cardinality that is expected after the JOIN. The cardinality of the delete table SCAN can be quite off currently because we don't pass the 'conjuncts' to it since there'se no data in the deleta delta files. So it's good that we don't take its cardinality into account. I think the only way to get reasonable cardinality number for the delete SCAN node is to do some sampling on the delete files. -- To view, visit http://gerrit.cloudera.org:8080/16082 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I15c8feabf40be1658f3dd46883f5a1b2aa5d0659 Gerrit-Change-Number: 16082 Gerrit-PatchSet: 4 Gerrit-Owner: Zoltan Borok-Nagy <[email protected]> Gerrit-Reviewer: Aman Sinha <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]> Gerrit-Comment-Date: Thu, 18 Jun 2020 16:40:09 +0000 Gerrit-HasComments: Yes
