[Impala-ASF-CR] IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified tables (primitive types)

Aman Sinha (Code Review) Thu, 18 Jun 2020 18:28:22 -0700

Aman Sinha has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16082 )


Change subject: IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified 
tables (primitive types)
......................................................................


Patch Set 4:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/16082/2/fe/src/main/java/org/apache/impala/planner/JoinNode.java
File fe/src/main/java/org/apache/impala/planner/JoinNode.java:

http://gerrit.cloudera.org:8080/#/c/16082/2/fe/src/main/java/org/apache/impala/planner/JoinNode.java@88
PS2, Line 88:     displayName_ = "DELETE TABLE " + displayName_;
> Yeah, I just wanted to somehow mark this JOIN to make it obvious what's goi
Sounds good.


http://gerrit.cloudera.org:8080/#/c/16082/2/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
File fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java:

http://gerrit.cloudera.org:8080/#/c/16082/2/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@1454
PS2, Line 1454:         if (fd.getRelativePath().startsWith("delete_delta_")) {
> Unfortunately there's no such metadata AFAICT. It could be added during dat
I think for this patch it is fine to keep it this way.  I saw in the doc that 
there's a performance analysis phase ..perhaps it could be 
investigated/improved in that phase.  Since this extra cost is added during 
planning time which typically is in hundreds of ms, It would be interesting to 
see for typical workloads what is the percentage change in planning time.


http://gerrit.cloudera.org:8080/#/c/16082/2/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@1480
PS2, Line 1480:     rawPath.add("row__id");
> It's needed and it's added by the above for loop at L1474-L1478.
ah, i see..thanks.


http://gerrit.cloudera.org:8080/#/c/16082/2/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@1551
PS2, Line 1551:     HdfsScanNode deltaScanNode = new 
HdfsScanNode(ctx_.getNextNodeId(),
> Yeah, this makes sense.
Actually, the planner does this optimization as part of 
SanNode.applyCountStarOptimization() which is triggered for Parquet file 
formats currently..so I suppose in the ORC case it doesn't even matter 
(although I could foresee doing such optimization for ORC as well).


http://gerrit.cloudera.org:8080/#/c/16082/2/testdata/workloads/functional-planner/queries/PlannerTest/acid-scans.test
File testdata/workloads/functional-planner/queries/PlannerTest/acid-scans.test:

http://gerrit.cloudera.org:8080/#/c/16082/2/testdata/workloads/functional-planner/queries/PlannerTest/acid-scans.test@58
PS2, Line 58: |  row-size=100B cardinality=143
> Maybe it's not that wrong. If I use compute stats then the actual cardinali
Regarding the conjuncts, wouldn't it be incorrect to  even pass the conjuncts 
to the delete delta scan ?  we don't want to prune out deleted records too 
early.
For the cardinality, do you mean if the original cardinality was 100 and 10 
rows were deleted,  does COMPUTE STATS show  90 ?  Even if no compaction was 
done ?


http://gerrit.cloudera.org:8080/#/c/16082/4/testdata/workloads/functional-planner/queries/PlannerTest/acid-scans.test
File testdata/workloads/functional-planner/queries/PlannerTest/acid-scans.test:

http://gerrit.cloudera.org:8080/#/c/16082/4/testdata/workloads/functional-planner/queries/PlannerTest/acid-scans.test@286
PS4, Line 286: |  |  hash predicates: 
functional_orc_def.alltypes_deleted_rows.month = 
functional_orc_def.alltypes_deleted_rows-delete-delta.month, 
functional_orc_def.alltypes_deleted_rows.row__id.bucket = 
functional_orc_def.alltypes_deleted_rows-delete-delta.row__id.bucket, 
functional_orc_def.alltypes_deleted_rows.row__id.originaltransaction = 
functional_orc_def.alltypes_deleted_rows-delete-delta.row__id.originaltransaction,
 functional_orc_def.alltypes_deleted_rows.row__id.rowid = 
functional_orc_def.alltypes_deleted_rows-delete-delta.row__id.rowid, 
functional_orc_def.alltypes_deleted_rows.year = 
functional_orc_def.alltypes_deleted_rows-delete-delta.year
This looks like it is adding all the other Slots to hash predicates, apart from 
the hidden slots.  Isn't that wrong ? My understanding was the anti join should 
only join on the ACID specific columns.



--
To view, visit http://gerrit.cloudera.org:8080/16082
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I15c8feabf40be1658f3dd46883f5a1b2aa5d0659
Gerrit-Change-Number: 16082
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy <[email protected]>
Gerrit-Reviewer: Aman Sinha <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Fri, 19 Jun 2020 01:27:58 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified tables (primitive types)

Reply via email to