Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/16082 )
Change subject: IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified tables (primitive types) ...................................................................... Patch Set 4: (6 comments) http://gerrit.cloudera.org:8080/#/c/16082/2/fe/src/main/java/org/apache/impala/planner/JoinNode.java File fe/src/main/java/org/apache/impala/planner/JoinNode.java: http://gerrit.cloudera.org:8080/#/c/16082/2/fe/src/main/java/org/apache/impala/planner/JoinNode.java@88 PS2, Line 88: displayName_ = "DELETE TABLE " + displayName_; > Yeah, I just wanted to somehow mark this JOIN to make it obvious what's goi Sounds good. http://gerrit.cloudera.org:8080/#/c/16082/2/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java File fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java: http://gerrit.cloudera.org:8080/#/c/16082/2/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@1454 PS2, Line 1454: if (fd.getRelativePath().startsWith("delete_delta_")) { > Unfortunately there's no such metadata AFAICT. It could be added during dat I think for this patch it is fine to keep it this way. I saw in the doc that there's a performance analysis phase ..perhaps it could be investigated/improved in that phase. Since this extra cost is added during planning time which typically is in hundreds of ms, It would be interesting to see for typical workloads what is the percentage change in planning time. http://gerrit.cloudera.org:8080/#/c/16082/2/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@1480 PS2, Line 1480: rawPath.add("row__id"); > It's needed and it's added by the above for loop at L1474-L1478. ah, i see..thanks. http://gerrit.cloudera.org:8080/#/c/16082/2/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@1551 PS2, Line 1551: HdfsScanNode deltaScanNode = new HdfsScanNode(ctx_.getNextNodeId(), > Yeah, this makes sense. Actually, the planner does this optimization as part of SanNode.applyCountStarOptimization() which is triggered for Parquet file formats currently..so I suppose in the ORC case it doesn't even matter (although I could foresee doing such optimization for ORC as well). http://gerrit.cloudera.org:8080/#/c/16082/2/testdata/workloads/functional-planner/queries/PlannerTest/acid-scans.test File testdata/workloads/functional-planner/queries/PlannerTest/acid-scans.test: http://gerrit.cloudera.org:8080/#/c/16082/2/testdata/workloads/functional-planner/queries/PlannerTest/acid-scans.test@58 PS2, Line 58: | row-size=100B cardinality=143 > Maybe it's not that wrong. If I use compute stats then the actual cardinali Regarding the conjuncts, wouldn't it be incorrect to even pass the conjuncts to the delete delta scan ? we don't want to prune out deleted records too early. For the cardinality, do you mean if the original cardinality was 100 and 10 rows were deleted, does COMPUTE STATS show 90 ? Even if no compaction was done ? http://gerrit.cloudera.org:8080/#/c/16082/4/testdata/workloads/functional-planner/queries/PlannerTest/acid-scans.test File testdata/workloads/functional-planner/queries/PlannerTest/acid-scans.test: http://gerrit.cloudera.org:8080/#/c/16082/4/testdata/workloads/functional-planner/queries/PlannerTest/acid-scans.test@286 PS4, Line 286: | | hash predicates: functional_orc_def.alltypes_deleted_rows.month = functional_orc_def.alltypes_deleted_rows-delete-delta.month, functional_orc_def.alltypes_deleted_rows.row__id.bucket = functional_orc_def.alltypes_deleted_rows-delete-delta.row__id.bucket, functional_orc_def.alltypes_deleted_rows.row__id.originaltransaction = functional_orc_def.alltypes_deleted_rows-delete-delta.row__id.originaltransaction, functional_orc_def.alltypes_deleted_rows.row__id.rowid = functional_orc_def.alltypes_deleted_rows-delete-delta.row__id.rowid, functional_orc_def.alltypes_deleted_rows.year = functional_orc_def.alltypes_deleted_rows-delete-delta.year This looks like it is adding all the other Slots to hash predicates, apart from the hidden slots. Isn't that wrong ? My understanding was the anti join should only join on the ACID specific columns. -- To view, visit http://gerrit.cloudera.org:8080/16082 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I15c8feabf40be1658f3dd46883f5a1b2aa5d0659 Gerrit-Change-Number: 16082 Gerrit-PatchSet: 4 Gerrit-Owner: Zoltan Borok-Nagy <borokna...@cloudera.com> Gerrit-Reviewer: Aman Sinha <amsi...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com> Gerrit-Comment-Date: Fri, 19 Jun 2020 01:27:58 +0000 Gerrit-HasComments: Yes