Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/21035 )
Change subject: IMPALA-12818: Intermediate Result Caching plan node framework ...................................................................... Patch Set 3: (12 comments) http://gerrit.cloudera.org:8080/#/c/21035/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/21035/3//COMMIT_MSG@24 PS3, Line 24: eligible location. Eligibility is currently restricted to immediately > It's a little awkward that placement is restricted in this commit since tha I added additional tests for merging TupleCacheInfo cases in TupleCacheInfoTest. http://gerrit.cloudera.org:8080/#/c/21035/3/be/src/exec/tuple-cache-node.cc File be/src/exec/tuple-cache-node.cc: http://gerrit.cloudera.org:8080/#/c/21035/3/be/src/exec/tuple-cache-node.cc@28 PS3, Line 28: allow_tuple_caching > Agreed. In this case, I think including a DCHECK(FLAGS_allow_tuple_caching) I added a check that the enable_tuple_caching query option is true in Open(). It returns a not-OK status if enable_tuple_caching is false. The allow_tuple_caching flag is mainly for the coordinator in this patch, but I'm open to adding more validation when we add the tuple caching implementation. http://gerrit.cloudera.org:8080/#/c/21035/3/be/src/exec/tuple-cache-node.cc@69 PS3, Line 69: // Note: TupleCacheNode does not support limits, so this does not need logic to > Limits can be supported for deterministic results. Maybe change to a TODO. Changed this comment to make it clearer. I was trying to say that TupleCacheNode doesn't enforce its own limit as it is returning output. http://gerrit.cloudera.org:8080/#/c/21035/3/fe/src/main/java/org/apache/impala/planner/PlanNode.java File fe/src/main/java/org/apache/impala/planner/PlanNode.java: http://gerrit.cloudera.org:8080/#/c/21035/3/fe/src/main/java/org/apache/impala/planner/PlanNode.java@499 PS3, Line 499: private void initThrift(TPlanNode msg, TupleCacheInfo tupleCacheInfo) { > Maybe cleaner to split this function so the subset can be called for IRC. There will be multiple patches that revisit this function as we do more masking / translation of tuple ids, etc. I would like to punt on this until those patches start landing. http://gerrit.cloudera.org:8080/#/c/21035/3/fe/src/main/java/org/apache/impala/planner/PlanNode.java@530 PS3, Line 530: tupleCacheInfo == null > Could you please add a comment to describe when or why the tupleCacheInfo i Refactored this to be a ThriftSerializationCtx that is passed in. This code now checks whether isTupleCache(). http://gerrit.cloudera.org:8080/#/c/21035/3/fe/src/main/java/org/apache/impala/planner/PlanNode.java@1237 PS3, Line 1237: recurse to children > For clarity, maybe it is better to have a full sentence like, "Recursively Reworked this comment to describe how this is a bottom-up tree traversal. http://gerrit.cloudera.org:8080/#/c/21035/3/fe/src/main/java/org/apache/impala/planner/TupleCacheInfo.java File fe/src/main/java/org/apache/impala/planner/TupleCacheInfo.java: http://gerrit.cloudera.org:8080/#/c/21035/3/fe/src/main/java/org/apache/impala/planner/TupleCacheInfo.java@59 PS3, Line 59: String > May consider StringBuilder for hashTrace_ due to string concatenations in m Changed this to use StringBuilder. I also added an explicit finalize() call to produce the final hash / hash trace values. http://gerrit.cloudera.org:8080/#/c/21035/3/fe/src/main/java/org/apache/impala/planner/TupleCacheInfo.java@112 PS3, Line 112: Preconditions.checkState(false); > Could thrift throw an exception due to OutOfMemory? Thrift has several exceptions it can throw. I will incorporate information about the exception into the error message for this assertion. If I don't convert this to an assertion, then I need to label this function "throws ImpalaException". Any function that calls it needs to be labeled "throws ImpalaException". I'm open to the idea that we may want to handle the exception. If we hit this exception, we can't really do tuple caching, but I guess we could keep going without tuple caching. http://gerrit.cloudera.org:8080/#/c/21035/3/fe/src/main/java/org/apache/impala/planner/TupleCachePlanner.java File fe/src/main/java/org/apache/impala/planner/TupleCachePlanner.java: http://gerrit.cloudera.org:8080/#/c/21035/3/fe/src/main/java/org/apache/impala/planner/TupleCachePlanner.java@56 PS3, Line 56: PlanNode newRoot = buildCachingPlan(root); > Can we omit newRoot? Or verify that root == newRoot? newRoot may actually be different from root, but buildCachingPlan has modified the PlanFragment appropriately. I'm adding an assert that the newRoot matches the top-most PlanFragment's root. http://gerrit.cloudera.org:8080/#/c/21035/3/fe/src/main/java/org/apache/impala/planner/TupleCachePlanner.java@94 PS3, Line 94: // If not eligible, just leave things unchanged. > Could check !isEligible first and avoid a level of nesting. Good point, done http://gerrit.cloudera.org:8080/#/c/21035/3/fe/src/test/java/org/apache/impala/planner/TupleCacheTest.java File fe/src/test/java/org/apache/impala/planner/TupleCacheTest.java: http://gerrit.cloudera.org:8080/#/c/21035/3/fe/src/test/java/org/apache/impala/planner/TupleCacheTest.java@150 PS3, Line 150: assertEquals(intersection.size(), 0); > Consider verifying the hash trace here also. Changed this to verify the hash trace in the same way. I also changed the code to print information about the cache keys and hash trace on failure. http://gerrit.cloudera.org:8080/#/c/21035/3/testdata/workloads/functional-query/queries/QueryTest/explain-level1.test File testdata/workloads/functional-query/queries/QueryTest/explain-level1.test: http://gerrit.cloudera.org:8080/#/c/21035/3/testdata/workloads/functional-query/queries/QueryTest/explain-level1.test@63 PS3, Line 63: row_regex:.*\[TPlanNode\(.*\] > Maybe better to verify the actual cache key and at least the first and last This particular test file is checking what the explain output contains at different explain levels. We will have other tests for detecting fundamental changes in the keys, but this one is narrowly targeted. -- To view, visit http://gerrit.cloudera.org:8080/21035 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia1f36a87dcce6efd5d1e1f0bc04009bf009b1961 Gerrit-Change-Number: 21035 Gerrit-PatchSet: 3 Gerrit-Owner: Joe McDonnell <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Joe McDonnell <[email protected]> Gerrit-Reviewer: Kurt Deschler <[email protected]> Gerrit-Reviewer: Michael Smith <[email protected]> Gerrit-Reviewer: Yida Wu <[email protected]> Gerrit-Comment-Date: Thu, 07 Mar 2024 00:18:05 +0000 Gerrit-HasComments: Yes
