Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21035 )

Change subject: IMPALA-12818: Intermediate Result Caching plan node framework
......................................................................


Patch Set 3:

(12 comments)

http://gerrit.cloudera.org:8080/#/c/21035/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/21035/3//COMMIT_MSG@24
PS3, Line 24: eligible location. Eligibility is currently restricted to 
immediately
> It's a little awkward that placement is restricted in this commit since tha
I added additional tests for merging TupleCacheInfo cases in TupleCacheInfoTest.


http://gerrit.cloudera.org:8080/#/c/21035/3/be/src/exec/tuple-cache-node.cc
File be/src/exec/tuple-cache-node.cc:

http://gerrit.cloudera.org:8080/#/c/21035/3/be/src/exec/tuple-cache-node.cc@28
PS3, Line 28: allow_tuple_caching
> Agreed. In this case, I think including a DCHECK(FLAGS_allow_tuple_caching)
I added a check that the enable_tuple_caching query option is true in Open(). 
It returns a not-OK status if enable_tuple_caching is false.

The allow_tuple_caching flag is mainly for the coordinator in this patch, but 
I'm open to adding more validation when we add the tuple caching implementation.


http://gerrit.cloudera.org:8080/#/c/21035/3/be/src/exec/tuple-cache-node.cc@69
PS3, Line 69:   // Note: TupleCacheNode does not support limits, so this does 
not need logic to
> Limits can be supported for deterministic results. Maybe change to a TODO.
Changed this comment to make it clearer. I was trying to say that 
TupleCacheNode doesn't enforce its own limit as it is returning output.


http://gerrit.cloudera.org:8080/#/c/21035/3/fe/src/main/java/org/apache/impala/planner/PlanNode.java
File fe/src/main/java/org/apache/impala/planner/PlanNode.java:

http://gerrit.cloudera.org:8080/#/c/21035/3/fe/src/main/java/org/apache/impala/planner/PlanNode.java@499
PS3, Line 499:   private void initThrift(TPlanNode msg, TupleCacheInfo 
tupleCacheInfo) {
> Maybe cleaner to split this function so the subset can be called for IRC.
There will be multiple patches that revisit this function as we do more masking 
/ translation of tuple ids, etc. I would like to punt on this until those 
patches start landing.


http://gerrit.cloudera.org:8080/#/c/21035/3/fe/src/main/java/org/apache/impala/planner/PlanNode.java@530
PS3, Line 530: tupleCacheInfo == null
> Could you please add a comment to describe when or why the tupleCacheInfo i
Refactored this to be a ThriftSerializationCtx that is passed in. This code now 
checks whether isTupleCache().


http://gerrit.cloudera.org:8080/#/c/21035/3/fe/src/main/java/org/apache/impala/planner/PlanNode.java@1237
PS3, Line 1237: recurse to children
> For clarity, maybe it is better to have a full sentence like, "Recursively
Reworked this comment to describe how this is a bottom-up tree traversal.


http://gerrit.cloudera.org:8080/#/c/21035/3/fe/src/main/java/org/apache/impala/planner/TupleCacheInfo.java
File fe/src/main/java/org/apache/impala/planner/TupleCacheInfo.java:

http://gerrit.cloudera.org:8080/#/c/21035/3/fe/src/main/java/org/apache/impala/planner/TupleCacheInfo.java@59
PS3, Line 59: String
> May consider StringBuilder for hashTrace_ due to string concatenations in m
Changed this to use StringBuilder. I also added an explicit finalize() call to 
produce the final hash / hash trace values.


http://gerrit.cloudera.org:8080/#/c/21035/3/fe/src/main/java/org/apache/impala/planner/TupleCacheInfo.java@112
PS3, Line 112:       Preconditions.checkState(false);
> Could thrift throw an exception due to OutOfMemory?
Thrift has several exceptions it can throw. I will incorporate information 
about the exception into the error message for this assertion.

If I don't convert this to an assertion, then I need to label this function 
"throws ImpalaException". Any function that calls it needs to be labeled 
"throws ImpalaException".

I'm open to the idea that we may want to handle the exception. If we hit this 
exception, we can't really do tuple caching, but I guess we could keep going 
without tuple caching.


http://gerrit.cloudera.org:8080/#/c/21035/3/fe/src/main/java/org/apache/impala/planner/TupleCachePlanner.java
File fe/src/main/java/org/apache/impala/planner/TupleCachePlanner.java:

http://gerrit.cloudera.org:8080/#/c/21035/3/fe/src/main/java/org/apache/impala/planner/TupleCachePlanner.java@56
PS3, Line 56:     PlanNode newRoot = buildCachingPlan(root);
> Can we omit newRoot? Or verify that root == newRoot?
newRoot may actually be different from root, but buildCachingPlan has modified 
the PlanFragment appropriately. I'm adding an assert that the newRoot matches 
the top-most PlanFragment's root.


http://gerrit.cloudera.org:8080/#/c/21035/3/fe/src/main/java/org/apache/impala/planner/TupleCachePlanner.java@94
PS3, Line 94:       // If not eligible, just leave things unchanged.
> Could check !isEligible first and avoid a level of nesting.
Good point, done


http://gerrit.cloudera.org:8080/#/c/21035/3/fe/src/test/java/org/apache/impala/planner/TupleCacheTest.java
File fe/src/test/java/org/apache/impala/planner/TupleCacheTest.java:

http://gerrit.cloudera.org:8080/#/c/21035/3/fe/src/test/java/org/apache/impala/planner/TupleCacheTest.java@150
PS3, Line 150:     assertEquals(intersection.size(), 0);
> Consider verifying the hash trace here also.
Changed this to verify the hash trace in the same way. I also changed the code 
to print information about the cache keys and hash trace on failure.


http://gerrit.cloudera.org:8080/#/c/21035/3/testdata/workloads/functional-query/queries/QueryTest/explain-level1.test
File testdata/workloads/functional-query/queries/QueryTest/explain-level1.test:

http://gerrit.cloudera.org:8080/#/c/21035/3/testdata/workloads/functional-query/queries/QueryTest/explain-level1.test@63
PS3, Line 63: row_regex:.*\[TPlanNode\(.*\]
> Maybe better to verify the actual cache key and at least the first and last
This particular test file is checking what the explain output contains at 
different explain levels. We will have other tests for detecting fundamental 
changes in the keys, but this one is narrowly targeted.



--
To view, visit http://gerrit.cloudera.org:8080/21035
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia1f36a87dcce6efd5d1e1f0bc04009bf009b1961
Gerrit-Change-Number: 21035
Gerrit-PatchSet: 3
Gerrit-Owner: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Kurt Deschler <[email protected]>
Gerrit-Reviewer: Michael Smith <[email protected]>
Gerrit-Reviewer: Yida Wu <[email protected]>
Gerrit-Comment-Date: Thu, 07 Mar 2024 00:18:05 +0000
Gerrit-HasComments: Yes

Reply via email to