Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21398 )

Change subject: IMPALA-13057: Incorporate tuple/slot information into tuple 
cache key
......................................................................


Patch Set 4:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/21398/4/fe/src/main/java/org/apache/impala/analysis/Expr.java
File fe/src/main/java/org/apache/impala/analysis/Expr.java:

http://gerrit.cloudera.org:8080/#/c/21398/4/fe/src/main/java/org/apache/impala/analysis/Expr.java@868
PS4, Line 868:     return treeToThrift(new ThriftSerializationCtx());
> Why pass context vs null? Does this have any implications WRT being able to
We are using the context for a few things:
 - Registering tuples
 - Translating tuple/slot IDs
 - (Future) Setting state to say that this is ineligible for caching

One way of expressing that is to have null vs not-null and then conditional 
logic in toThrift() functions:
if (tupleCacheInfo != null) {
  tupleCacheInfo.markIneligible(BAD_WHATEVER);
}
or
TWhatever output = TWhatever(tupleCacheInfo == null ? id_.toInt() : 
tupleCacheInfo.translateTupleId(id_).toInt());
or
if (tupleCacheInfo == null) {
  // set fields on the resulting struct
}

The other way is to have a layer that we call unconditionally and under the 
covers it does the right thing. i.e.
serialCtx.markIneligibleForTupleCaching(BAD_WHATEVER);
or
TWhatever output = TWhatever(serialCtx.translateTupleId(id_).toInt());
or
if (!serialCtx.isTupleCaching()) {
  // set fields on the resulting struct
}
I'm currently doing the second approach. Neither approach is particularly 
elegant.


http://gerrit.cloudera.org:8080/#/c/21398/4/fe/src/main/java/org/apache/impala/analysis/Expr.java@946
PS4, Line 946:       result.add(expr.treeToThrift());
> Why doesn't this pass the serialCtx?
Ooops, that is a mistake. Fixed this to pass in serialCtx.


http://gerrit.cloudera.org:8080/#/c/21398/4/fe/src/main/java/org/apache/impala/common/ThriftSerializationCtx.java
File fe/src/main/java/org/apache/impala/common/ThriftSerializationCtx.java:

http://gerrit.cloudera.org:8080/#/c/21398/4/fe/src/main/java/org/apache/impala/common/ThriftSerializationCtx.java@72
PS4, Line 72:   public void registerTuple(TupleId id) {
> Would it be cleaner to have this logic in Analyzer instead?
There's a limit to how early we can go when doing id translation. Runtime 
filters add edges to the PlanNode graph and impact the assignment of local 
tuple ids. (The logic to reflect this is coming.) Runtime filter generation 
happens during distributed planning, and it is not much earlier than 
TupleCachePlanner.

I haven't found a simpler solution yet.


http://gerrit.cloudera.org:8080/#/c/21398/4/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/21398/4/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1945
PS4, Line 1945:             
serialCtx.translateTupleId(entry.getKey().getId()).asInt(),
> Consider moving this inside Expr.treesToThrift() and passing in entry.
We can't push this down into Expr.treesToThrift(). This is inserting into a 
Integer->List<TExpr> map, so it has a key and a value. Expr.treesToThrift() 
handles the value, but the key has to be handled here.


http://gerrit.cloudera.org:8080/#/c/21398/4/fe/src/main/java/org/apache/impala/planner/PlanNode.java
File fe/src/main/java/org/apache/impala/planner/PlanNode.java:

http://gerrit.cloudera.org:8080/#/c/21398/4/fe/src/main/java/org/apache/impala/planner/PlanNode.java@510
PS4, Line 510:     msg.node_id = serialCtx.isTupleCache()? 0 : id_.asInt();
> Move these assignments to the conditional branch below
Done


http://gerrit.cloudera.org:8080/#/c/21398/4/fe/src/main/java/org/apache/impala/planner/PlanNode.java@528
PS4, Line 528:       
msg.addToRow_tuples(serialCtx.translateTupleId(tid).asInt());
> Could local tids be employed early so that translation isn't needed here?
I haven't seen a way to do this much earlier.


http://gerrit.cloudera.org:8080/#/c/21398/4/fe/src/main/java/org/apache/impala/planner/TupleCacheInfo.java
File fe/src/main/java/org/apache/impala/planner/TupleCacheInfo.java:

http://gerrit.cloudera.org:8080/#/c/21398/4/fe/src/main/java/org/apache/impala/planner/TupleCacheInfo.java@59
PS4, Line 59:  * One critical piece of making the Thrift structures more 
general is id translation.
            :  * Since TupleIds and SlotIds are global to the query, the value 
of any id will be
            :  * influenced by the rest of the query unless we translate it to 
a local id.
            :  * Plan nodes register their tuples via registerTuple(). This 
allows the tuple / slot
            :  * information to be incorporated into the hash (by accessing the 
DescriptorTable),
            :  * but it also allocates a local id and adds an entry to the 
translation map. Exprs
            :  * and other structures can use translateSlotId() and 
translateTupleId() to adjust
            :  * global ids to local ids. When TupleCacheInfos are merged, they 
merge the translations
            :  * so there are no conflicts.
Added some more content here: Why not calculate it earlier? Does anything else 
use local ids? Key side of the map: make it clear that this is a global to 
local transformation and always in that direction.


http://gerrit.cloudera.org:8080/#/c/21398/4/fe/src/main/java/org/apache/impala/planner/TupleCacheInfo.java@263
PS4, Line 263:   public TupleId translateTupleId(TupleId id) {
             :     // The tuple must have been registered before this reference 
happens
             :     
Preconditions.checkState(tupleTranslationMap_.containsKey(id));
             :     return tupleTranslationMap_.get(id);
             :   }
             :
             :   public SlotId translateSlotId(SlotId id) {
             :     // The slot must have been registered before this reference 
happens
             :     
Preconditions.checkState(slotTranslationMap_.containsKey(id));
             :     return slotTranslationMap_.get(id);
Renamed these to emphasize that this is global to local translation



--
To view, visit http://gerrit.cloudera.org:8080/21398
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7f5278e9dbb976cbebdc6a21a6e66bc90ce06c6c
Gerrit-Change-Number: 21398
Gerrit-PatchSet: 4
Gerrit-Owner: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Kurt Deschler <[email protected]>
Gerrit-Reviewer: Michael Smith <[email protected]>
Gerrit-Reviewer: Yida Wu <[email protected]>
Gerrit-Comment-Date: Thu, 23 May 2024 00:45:12 +0000
Gerrit-HasComments: Yes

Reply via email to