Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/19181 )
Change subject: IMPALA-11470: Add Cache For Codegen Functions ...................................................................... Patch Set 9: (9 comments) http://gerrit.cloudera.org:8080/#/c/19181/9//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/19181/9//COMMIT_MSG@34 PS9, Line 34: As a limitation, we don't cache the udf functions, because it This paragraph is not clear to me. Does this mean that if any UDF is used somewhere in a fragment, then we disable codegen cache for that fragment? My understanding is that for most UDFs only the code that calls the UDF is codegened, and that should not change when reloading the UDF with the same signature, while changing the signature should lead to different codegened bytecode thus also different hash. The exceptions are IR UDFs - in that case the llvm bytecode is stored and codegen can proccess the functions themselves, not just the calling code. This means that that if the UDF is replaced, then the bitecode and its hash will also change. http://gerrit.cloudera.org:8080/#/c/19181/9//COMMIT_MSG@42 PS9, Line 42: Tpch performance test showed no significant : difference after adding this logic. Do the tpch tests use UDFs? http://gerrit.cloudera.org:8080/#/c/19181/9/be/src/codegen/llvm-codegen-cache.cc File be/src/codegen/llvm-codegen-cache.cc: http://gerrit.cloudera.org:8080/#/c/19181/9/be/src/codegen/llvm-codegen-cache.cc@185 PS9, Line 185: // Because the Cache::Insert() requires the caller to avoid multiple handles for : // the same key to inser A similar restriction is mentioned for lookup: https://github.com/apache/impala/blob/c3ec9272c591624e8dfc130a85c98da318735f14/be/src/util/cache/cache.h#L159 Doesn't CodeGenCache::Lookup needs a similar logic to avoid looking up the same key in parallel? I am not familiar enough with our cache implementation, but needing this protection in each client looks weird - couldn't the Cache implement this in extra functions like InsertAllowParallelEqKeys() that similarly use a mutex and a key set? Also I am unsure if we actually need a complex locking scheme in this case - if my understanding is correct, codegen cache does not spill to disk so both Store() and Lookup() are expected to be quick. As cache lookups and stores are not too frequent, couldn't simply a lock protect both Store() and Lookup()? http://gerrit.cloudera.org:8080/#/c/19181/9/be/src/codegen/llvm-codegen.h File be/src/codegen/llvm-codegen.h: http://gerrit.cloudera.org:8080/#/c/19181/9/be/src/codegen/llvm-codegen.h@933 PS9, Line 933: cdoe typo http://gerrit.cloudera.org:8080/#/c/19181/9/be/src/codegen/llvm-codegen.cc File be/src/codegen/llvm-codegen.cc: http://gerrit.cloudera.org:8080/#/c/19181/9/be/src/codegen/llvm-codegen.cc@1131 PS9, Line 1131: LOG(INFO) << DebugCacheEntryString(cache_key, true /*is_lookup*/, : state_->query_options().codegen_cache_debug_mode, suc); This will run once per fragment per query on a node, right? Just a bit worried about making the log too verbose. http://gerrit.cloudera.org:8080/#/c/19181/9/be/src/codegen/llvm-codegen.cc@1160 PS9, Line 1160: void* jitted_function = reinterpret_cast<void*>( : execution_engine_cached_->getFunctionAddress(function->getName())); Would it be possible to check the argument and return types of the function, or this is done elsewhere? http://gerrit.cloudera.org:8080/#/c/19181/9/be/src/exprs/scalar-expr.cc File be/src/exprs/scalar-expr.cc: http://gerrit.cloudera.org:8080/#/c/19181/9/be/src/exprs/scalar-expr.cc@213 PS9, Line 213: (*expr)->SetAllowCodegenCache(texpr_node.fn.hdfs_location.empty()); Can you add a comment here about disabling codegen cache for UDFs? http://gerrit.cloudera.org:8080/#/c/19181/9/be/src/service/query-options.h File be/src/service/query-options.h: http://gerrit.cloudera.org:8080/#/c/19181/9/be/src/service/query-options.h@279 PS9, Line 279: REGULAR Probably ADVANCED would be more fitting? http://gerrit.cloudera.org:8080/#/c/19181/9/be/src/service/query-options.h@281 PS9, Line 281: REGULAR This should be DEVELOPMENT IMO -- To view, visit http://gerrit.cloudera.org:8080/19181 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If42c78a7f51fd582e5fe331fead494dadf544eb1 Gerrit-Change-Number: 19181 Gerrit-PatchSet: 9 Gerrit-Owner: Yida Wu <[email protected]> Gerrit-Reviewer: Abhishek Rawat <[email protected]> Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Daniel Becker <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Joe McDonnell <[email protected]> Gerrit-Reviewer: Michael Smith <[email protected]> Gerrit-Reviewer: Qifan Chen <[email protected]> Gerrit-Reviewer: Yida Wu <[email protected]> Gerrit-Comment-Date: Mon, 21 Nov 2022 17:32:07 +0000 Gerrit-HasComments: Yes
