Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/21398 )
Change subject: IMPALA-13057: Incorporate tuple/slot information into tuple cache key ...................................................................... Patch Set 4: (8 comments) http://gerrit.cloudera.org:8080/#/c/21398/4/fe/src/main/java/org/apache/impala/analysis/Expr.java File fe/src/main/java/org/apache/impala/analysis/Expr.java: http://gerrit.cloudera.org:8080/#/c/21398/4/fe/src/main/java/org/apache/impala/analysis/Expr.java@868 PS4, Line 868: return treeToThrift(new ThriftSerializationCtx()); > Why pass context vs null? Does this have any implications WRT being able to We are using the context for a few things: - Registering tuples - Translating tuple/slot IDs - (Future) Setting state to say that this is ineligible for caching One way of expressing that is to have null vs not-null and then conditional logic in toThrift() functions: if (tupleCacheInfo != null) { tupleCacheInfo.markIneligible(BAD_WHATEVER); } or TWhatever output = TWhatever(tupleCacheInfo == null ? id_.toInt() : tupleCacheInfo.translateTupleId(id_).toInt()); or if (tupleCacheInfo == null) { // set fields on the resulting struct } The other way is to have a layer that we call unconditionally and under the covers it does the right thing. i.e. serialCtx.markIneligibleForTupleCaching(BAD_WHATEVER); or TWhatever output = TWhatever(serialCtx.translateTupleId(id_).toInt()); or if (!serialCtx.isTupleCaching()) { // set fields on the resulting struct } I'm currently doing the second approach. Neither approach is particularly elegant. http://gerrit.cloudera.org:8080/#/c/21398/4/fe/src/main/java/org/apache/impala/analysis/Expr.java@946 PS4, Line 946: result.add(expr.treeToThrift()); > Why doesn't this pass the serialCtx? Ooops, that is a mistake. Fixed this to pass in serialCtx. http://gerrit.cloudera.org:8080/#/c/21398/4/fe/src/main/java/org/apache/impala/common/ThriftSerializationCtx.java File fe/src/main/java/org/apache/impala/common/ThriftSerializationCtx.java: http://gerrit.cloudera.org:8080/#/c/21398/4/fe/src/main/java/org/apache/impala/common/ThriftSerializationCtx.java@72 PS4, Line 72: public void registerTuple(TupleId id) { > Would it be cleaner to have this logic in Analyzer instead? There's a limit to how early we can go when doing id translation. Runtime filters add edges to the PlanNode graph and impact the assignment of local tuple ids. (The logic to reflect this is coming.) Runtime filter generation happens during distributed planning, and it is not much earlier than TupleCachePlanner. I haven't found a simpler solution yet. http://gerrit.cloudera.org:8080/#/c/21398/4/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java: http://gerrit.cloudera.org:8080/#/c/21398/4/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1945 PS4, Line 1945: serialCtx.translateTupleId(entry.getKey().getId()).asInt(), > Consider moving this inside Expr.treesToThrift() and passing in entry. We can't push this down into Expr.treesToThrift(). This is inserting into a Integer->List<TExpr> map, so it has a key and a value. Expr.treesToThrift() handles the value, but the key has to be handled here. http://gerrit.cloudera.org:8080/#/c/21398/4/fe/src/main/java/org/apache/impala/planner/PlanNode.java File fe/src/main/java/org/apache/impala/planner/PlanNode.java: http://gerrit.cloudera.org:8080/#/c/21398/4/fe/src/main/java/org/apache/impala/planner/PlanNode.java@510 PS4, Line 510: msg.node_id = serialCtx.isTupleCache()? 0 : id_.asInt(); > Move these assignments to the conditional branch below Done http://gerrit.cloudera.org:8080/#/c/21398/4/fe/src/main/java/org/apache/impala/planner/PlanNode.java@528 PS4, Line 528: msg.addToRow_tuples(serialCtx.translateTupleId(tid).asInt()); > Could local tids be employed early so that translation isn't needed here? I haven't seen a way to do this much earlier. http://gerrit.cloudera.org:8080/#/c/21398/4/fe/src/main/java/org/apache/impala/planner/TupleCacheInfo.java File fe/src/main/java/org/apache/impala/planner/TupleCacheInfo.java: http://gerrit.cloudera.org:8080/#/c/21398/4/fe/src/main/java/org/apache/impala/planner/TupleCacheInfo.java@59 PS4, Line 59: * One critical piece of making the Thrift structures more general is id translation. : * Since TupleIds and SlotIds are global to the query, the value of any id will be : * influenced by the rest of the query unless we translate it to a local id. : * Plan nodes register their tuples via registerTuple(). This allows the tuple / slot : * information to be incorporated into the hash (by accessing the DescriptorTable), : * but it also allocates a local id and adds an entry to the translation map. Exprs : * and other structures can use translateSlotId() and translateTupleId() to adjust : * global ids to local ids. When TupleCacheInfos are merged, they merge the translations : * so there are no conflicts. Added some more content here: Why not calculate it earlier? Does anything else use local ids? Key side of the map: make it clear that this is a global to local transformation and always in that direction. http://gerrit.cloudera.org:8080/#/c/21398/4/fe/src/main/java/org/apache/impala/planner/TupleCacheInfo.java@263 PS4, Line 263: public TupleId translateTupleId(TupleId id) { : // The tuple must have been registered before this reference happens : Preconditions.checkState(tupleTranslationMap_.containsKey(id)); : return tupleTranslationMap_.get(id); : } : : public SlotId translateSlotId(SlotId id) { : // The slot must have been registered before this reference happens : Preconditions.checkState(slotTranslationMap_.containsKey(id)); : return slotTranslationMap_.get(id); Renamed these to emphasize that this is global to local translation -- To view, visit http://gerrit.cloudera.org:8080/21398 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7f5278e9dbb976cbebdc6a21a6e66bc90ce06c6c Gerrit-Change-Number: 21398 Gerrit-PatchSet: 4 Gerrit-Owner: Joe McDonnell <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Joe McDonnell <[email protected]> Gerrit-Reviewer: Kurt Deschler <[email protected]> Gerrit-Reviewer: Michael Smith <[email protected]> Gerrit-Reviewer: Yida Wu <[email protected]> Gerrit-Comment-Date: Thu, 23 May 2024 00:45:12 +0000 Gerrit-HasComments: Yes
