Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19181 )

Change subject: IMPALA-11470: Add Cache For Codegen Functions
......................................................................


Patch Set 9:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/19181/9//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19181/9//COMMIT_MSG@34
PS9, Line 34: As a limitation, we don't cache the udf functions, because it
This paragraph is not clear to me. Does this mean that if any UDF is used 
somewhere in a fragment, then we disable codegen cache for that fragment?

My understanding is that for most UDFs only the code that calls the UDF is 
codegened, and that should not change when reloading the UDF with the same 
signature, while changing the signature should lead to different codegened 
bytecode thus also different hash.

The exceptions are IR UDFs - in that case the llvm bytecode is stored and 
codegen can proccess the functions themselves, not just the calling code. This 
means that  that if the UDF is replaced, then the bitecode and its hash will 
also change.


http://gerrit.cloudera.org:8080/#/c/19181/9//COMMIT_MSG@42
PS9, Line 42: Tpch performance test showed no significant
            : difference after adding this logic.
Do the tpch tests use UDFs?


http://gerrit.cloudera.org:8080/#/c/19181/9/be/src/codegen/llvm-codegen-cache.cc
File be/src/codegen/llvm-codegen-cache.cc:

http://gerrit.cloudera.org:8080/#/c/19181/9/be/src/codegen/llvm-codegen-cache.cc@185
PS9, Line 185:     // Because the Cache::Insert() requires the caller to avoid 
multiple handles for
             :     // the same key to inser
A similar restriction is mentioned for lookup:
https://github.com/apache/impala/blob/c3ec9272c591624e8dfc130a85c98da318735f14/be/src/util/cache/cache.h#L159

Doesn't CodeGenCache::Lookup needs a similar logic to avoid looking up the same 
key in parallel?

I am not familiar enough with our cache implementation, but needing this 
protection in each client looks weird - couldn't the Cache implement this in 
extra functions like InsertAllowParallelEqKeys() that similarly use a mutex and 
a key set?

Also I am unsure if we actually need a complex locking scheme in this case - if 
my understanding is correct, codegen cache does not spill to disk so both 
Store() and Lookup() are expected to be quick. As cache lookups and stores are 
not too frequent, couldn't simply a lock protect both Store() and Lookup()?


http://gerrit.cloudera.org:8080/#/c/19181/9/be/src/codegen/llvm-codegen.h
File be/src/codegen/llvm-codegen.h:

http://gerrit.cloudera.org:8080/#/c/19181/9/be/src/codegen/llvm-codegen.h@933
PS9, Line 933: cdoe
typo


http://gerrit.cloudera.org:8080/#/c/19181/9/be/src/codegen/llvm-codegen.cc
File be/src/codegen/llvm-codegen.cc:

http://gerrit.cloudera.org:8080/#/c/19181/9/be/src/codegen/llvm-codegen.cc@1131
PS9, Line 1131:   LOG(INFO) << DebugCacheEntryString(cache_key, true 
/*is_lookup*/,
              :       state_->query_options().codegen_cache_debug_mode, suc);
This will run once per fragment per query on a node, right? Just a bit worried 
about making the log too verbose.


http://gerrit.cloudera.org:8080/#/c/19181/9/be/src/codegen/llvm-codegen.cc@1160
PS9, Line 1160:       void* jitted_function = reinterpret_cast<void*>(
              :           
execution_engine_cached_->getFunctionAddress(function->getName()));
Would it be possible to check the argument and return types of the function, or 
this is done elsewhere?


http://gerrit.cloudera.org:8080/#/c/19181/9/be/src/exprs/scalar-expr.cc
File be/src/exprs/scalar-expr.cc:

http://gerrit.cloudera.org:8080/#/c/19181/9/be/src/exprs/scalar-expr.cc@213
PS9, Line 213:   
(*expr)->SetAllowCodegenCache(texpr_node.fn.hdfs_location.empty());
Can you add a comment here about disabling codegen cache for UDFs?


http://gerrit.cloudera.org:8080/#/c/19181/9/be/src/service/query-options.h
File be/src/service/query-options.h:

http://gerrit.cloudera.org:8080/#/c/19181/9/be/src/service/query-options.h@279
PS9, Line 279: REGULAR
Probably ADVANCED would be more fitting?


http://gerrit.cloudera.org:8080/#/c/19181/9/be/src/service/query-options.h@281
PS9, Line 281: REGULAR
This should be DEVELOPMENT IMO



--
To view, visit http://gerrit.cloudera.org:8080/19181
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If42c78a7f51fd582e5fe331fead494dadf544eb1
Gerrit-Change-Number: 19181
Gerrit-PatchSet: 9
Gerrit-Owner: Yida Wu <[email protected]>
Gerrit-Reviewer: Abhishek Rawat <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Daniel Becker <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Michael Smith <[email protected]>
Gerrit-Reviewer: Qifan Chen <[email protected]>
Gerrit-Reviewer: Yida Wu <[email protected]>
Gerrit-Comment-Date: Mon, 21 Nov 2022 17:32:07 +0000
Gerrit-HasComments: Yes

Reply via email to