Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16082 )

Change subject: IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified 
tables (primitive types)
......................................................................


Patch Set 4:

(11 comments)

Thanks for the comments!

http://gerrit.cloudera.org:8080/#/c/16082/2/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java:

http://gerrit.cloudera.org:8080/#/c/16082/2/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java@673
PS2, Line 673:     fileFormatDescriptor_ = other.fileFormatDescriptor_;
> Looking at the changes in https://issues.apache.org/jira/browse/IMPALA-9778
Resolved it during the rebase (PS3).


http://gerrit.cloudera.org:8080/#/c/16082/2/fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
File fe/src/main/java/org/apache/impala/planner/HashJoinNode.java:

http://gerrit.cloudera.org:8080/#/c/16082/2/fe/src/main/java/org/apache/impala/planner/HashJoinNode.java@178
PS2, Line 178:       if (!isAcidJoin_ || detailLevel.ordinal() >= 
TExplainLevel.EXTENDED.ordinal()) {
> For the anti join should we not show the join conditions in the Impala quer
Done


http://gerrit.cloudera.org:8080/#/c/16082/2/fe/src/main/java/org/apache/impala/planner/JoinNode.java
File fe/src/main/java/org/apache/impala/planner/JoinNode.java:

http://gerrit.cloudera.org:8080/#/c/16082/2/fe/src/main/java/org/apache/impala/planner/JoinNode.java@83
PS2, Line 83:   // True if this join is used to do the join between insert and 
delete delta files.
> nit: spelling 'deleta'
Done


http://gerrit.cloudera.org:8080/#/c/16082/2/fe/src/main/java/org/apache/impala/planner/JoinNode.java@88
PS2, Line 88:     displayName_ = "DELETE TABLE " + displayName_;
> The  'ACID' prefix for a HashJoin could be misleading to end users since a
Yeah, I just wanted to somehow mark this JOIN to make it obvious what's going 
on, but probably 'ACID' is not the best term to use.

I think 'DELETE TABLE HASH JOIN' makes sense.


http://gerrit.cloudera.org:8080/#/c/16082/2/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
File fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java:

http://gerrit.cloudera.org:8080/#/c/16082/2/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@1453
PS2, Line 1453:       for (HdfsPartition.FileDescriptor fd : 
partition.getFileDescriptors()) {
> The casting  is redundant since 'partition' is already of type FeFsPartitio
Done


http://gerrit.cloudera.org:8080/#/c/16082/2/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@1454
PS2, Line 1454:         if (fd.getRelativePath().startsWith("delete_delta_")) {
> Is there any indicator in the HMS's partition object that indicates whether
Unfortunately there's no such metadata AFAICT. It could be added during data 
loading, though it'd complicate the code a little, especially when local 
catalog is being used since it reuses loaded partitions while trying to filter 
out file descriptors based on an ACID write id list, see IMPALA-9791 for more 
details.

Also, I did some measurements. Created a table with 1000 partitions and 10 
files per each partition. Then I've nested this for-loop into another for-loop 
with 100 iterations to simulate a table with 100K partitions and 1 million 
files. This whole check was evaluated between 100-200 milliseconds.

So for now I'm leaning towards keeping it simple unless you feel strongly 
against it.


http://gerrit.cloudera.org:8080/#/c/16082/2/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@1480
PS2, Line 1480:     rawPath.add("row__id");
> The design doc suggests the anti-join is on 4 fields..is the partition id n
It's needed and it's added by the above for loop at L1474-L1478.


http://gerrit.cloudera.org:8080/#/c/16082/2/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@1551
PS2, Line 1551:     HdfsScanNode deltaScanNode = new 
HdfsScanNode(ctx_.getNextNodeId(),
> The  'aggInfo' is meant for the simple 'select count(*) from table'  type o
Yeah, this makes sense.
It didn't cause a bug because the backend only optimizes count(*) if it's a 
"zero-slot table scan", and we've added ACID slots to materialize.


http://gerrit.cloudera.org:8080/#/c/16082/2/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@1591
PS2, Line 1591:             insertSlotDesc.getMaterializedPath())) {
> Call analyze for the BinaryPredicate.  Also, could you add a simple test wi
Done. Added planner tests for explain_level 2 and 3.


http://gerrit.cloudera.org:8080/#/c/16082/2/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@1605
PS2, Line 1605:   /**
> nit: spelling 'partitoin'
Done


http://gerrit.cloudera.org:8080/#/c/16082/2/testdata/workloads/functional-planner/queries/PlannerTest/acid-scans.test
File testdata/workloads/functional-planner/queries/PlannerTest/acid-scans.test:

http://gerrit.cloudera.org:8080/#/c/16082/2/testdata/workloads/functional-planner/queries/PlannerTest/acid-scans.test@58
PS2, Line 58: |  row-size=100B cardinality=143
> The cardinality estimate for this hash join would need some adjustment in t
Maybe it's not that wrong. If I use compute stats then the actual cardinalities 
are stored, i.e. the number of values that are not deleted.

So the left side of the JOIN will have the cardinality that is expected after 
the JOIN.

The cardinality of the delete table SCAN can be quite off currently because we 
don't pass the 'conjuncts' to it since there'se no data in the deleta delta 
files. So it's good that we don't take its cardinality into account. I think 
the only way to get reasonable cardinality number for the delete SCAN node is 
to do some sampling on the delete files.



--
To view, visit http://gerrit.cloudera.org:8080/16082
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I15c8feabf40be1658f3dd46883f5a1b2aa5d0659
Gerrit-Change-Number: 16082
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy <[email protected]>
Gerrit-Reviewer: Aman Sinha <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Thu, 18 Jun 2020 16:40:09 +0000
Gerrit-HasComments: Yes

Reply via email to