Michael Ho has posted comments on this change. Change subject: IMPALA-4080, IMPALA-3638: Introduce ExecNode::Codegen() ......................................................................
Patch Set 1: (16 comments) http://gerrit.cloudera.org:8080/#/c/4651/1//COMMIT_MSG Commit Message: PS1, Line 7: IMPALA-3686 > wrong JIRA number: IMPALA-3638 I am not 100% sure the extent which IMPALA-3638 is trying to cover so I am fine if you think a new bug is better. Either way, this patch alleviates the issue mentioned in IMPALA-3638 as we will now codegen for ScalarFnCall even if there is no codegen enabled operator in the fragment. The caveat you mentioned is due to the fact that we don't have a way to invoke the JIT compiled function in the interpretation path of Expr except for ScalarFnCall. Once we fix that, then codegen'ing all the Exprs makes sense. That may also explain the oddness of ScalarFnCall::Prepare() (i.e. IMPALA-4233) http://gerrit.cloudera.org:8080/#/c/4651/1/be/src/exec/aggregation-node.cc File be/src/exec/aggregation-node.cc: Line 168: > Unnecessary blank line. Done Line 174: if (state->codegen_enabled()) { > It would be great if we could avoid duplicating this check in every Codegen Don't want to introduce another function per exec node just for that. I updated the patch to add the string to runtime profile in Prepare() if codegen is disabled. That may fit in well with what we want down the road if each fragment instance populates its pointers to compiled functions at Prepare(). Line 833: LlvmCodeGen* codegen = state->codegen(); > Maybe we should just pass the codegen object in as an argument? Not so sure yet as we may need to access the Expr stored in RuntimeState (eventually) so I prefer to keep it as is for now as the code is still in flux. http://gerrit.cloudera.org:8080/#/c/4651/1/be/src/exec/hdfs-scan-node-base.cc File be/src/exec/hdfs-scan-node-base.cc: Line 295: runtime_profile()->AddInfoString(HDFS_SPLIT_STATS_DESC, str.str()); > Extra blank line. Done http://gerrit.cloudera.org:8080/#/c/4651/1/be/src/exec/old-hash-table.cc File be/src/exec/old-hash-table.cc: Line 419: LlvmCodeGen* codegen = state->codegen(); > IMO we should be passing in the codegen object rather than the state in mos Not so sure about this yet as we may need RuntimeState for accessing Expr in the future. http://gerrit.cloudera.org:8080/#/c/4651/1/be/src/exprs/literal.cc File be/src/exprs/literal.cc: Line 378: Status Literal::GetCodegendComputeFn(RuntimeState* state, llvm::Function** fn) { > I feel like this should take codegen as an argument instead of state. Yes, this should take codegen as argument. I have converted all GetCodendComputeFn(). http://gerrit.cloudera.org:8080/#/c/4651/1/be/src/exprs/scalar-fn-call.cc File be/src/exprs/scalar-fn-call.cc: PS1, Line 98: UNLIKELY > UNLIKELY seems unnecessary since this isn't a hot path. It's part of the Prepare() path which may have impact to the latency of propagation of runtime filters. That said, the UNLIKELY here is gonna make negligible difference. Line 99: return Status(Substitute("Cannot handle IR UDF '$0': ", fn_.name.function_name, > This seems like something that would break queries that previously worked - As we discussed offline, this is not a well documented corner case. In some sense, we can consider this as a bug as this code path doesn't honor the query option "DISABLE_CODEGEN". The old code will fall back to codegen (even if it's disabled) if we see IR UDF or if the number of arguments is more than 8. The new patch will simply fail the query. Also added some tests for this path. Line 105: DCHECK(NumFixedArgs() <= 8); > How do we avoid hitting this DCHECK? Do we have test coverage for this? Good point. Converted this to a if-check instead. Will add a test to test_udfs.py to make sure we see appropriate error messages if the native UDF has too many arguments. I checked the FE built-in library and as far as I can tell, we don't have any built-in UDFs with more than 8 arguments. Added a test for this. Line 136: // TODO: don't do this for child exprs > This is an interesting point that we should make sure we fix when we clean Not sure what that means ? Does it mean we shouldn't codegen child exprs ? However, will it be possible to inline all children expr without codegening them too ? http://gerrit.cloudera.org:8080/#/c/4651/1/be/src/runtime/plan-fragment-executor.cc File be/src/runtime/plan-fragment-executor.cc: Line 194: plan_->Codegen(state); > Should we do this after we prepared the sink, instead of in the middle of t This is just a temporary step. Eventually, codegen will happen before any plan fragment is instantiated (i.e. Prepare() or anything is called). However, it shouldn't hurt to move it to the point after data sink is prepared. Will update the patch to do so. http://gerrit.cloudera.org:8080/#/c/4651/1/be/src/runtime/runtime-state.h File be/src/runtime/runtime-state.h: PS1, Line 257: it has already been called. > No-op if the codegen object has already been created. Done PS1, Line 257: codegen_ > maybe don't mention internal state. Instead something like "Create a codege Done PS1, Line 257: This is : /// created on first use. > Not sure if the last sentence adds much. Done http://gerrit.cloudera.org:8080/#/c/4651/1/be/src/service/fe-support.cc File be/src/service/fe-support.cc: Line 89: // Codegen is disabled unless the expression contains external UDF available only > This seems inconsistent with the behaviour in ScalarFnCall. Shouldn't we fa This is to avoid breaking compatibility. Previously, this expression evaluation with IR-only UDF would have worked. Apparently, the fe-support code just takes any failure as an exception so it may lead to analysis failure for queries which worked previously. As we discussed offline, we will honor the query option in query_ctx in this case. So, if there is any IR UDF, we will enable codegen if the query option says so. We will disable codegen for all other cases. -- To view, visit http://gerrit.cloudera.org:8080/4651 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I207566bc9f4c6a159271ecdbc4bbdba3d78c6651 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho <[email protected]> Gerrit-Reviewer: Michael Ho <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-HasComments: Yes
