Michael Ho has posted comments on this change.

Change subject: IMPALA-4080, IMPALA-3638: Introduce ExecNode::Codegen()
......................................................................


Patch Set 1:

(16 comments)

http://gerrit.cloudera.org:8080/#/c/4651/1//COMMIT_MSG
Commit Message:

PS1, Line 7:  IMPALA-3686
> wrong JIRA number: IMPALA-3638
I am not 100% sure the extent which IMPALA-3638 is trying to cover so I am fine 
if you think a new bug is better. Either way, this patch alleviates the issue 
mentioned in IMPALA-3638 as we will now codegen for ScalarFnCall even if there 
is no codegen enabled operator in the fragment.

The caveat you mentioned is due to the fact that we don't have a way to invoke 
the JIT compiled function in the interpretation path of Expr except for 
ScalarFnCall. Once we fix that, then codegen'ing all the Exprs makes sense. 
That may also explain the oddness of ScalarFnCall::Prepare() (i.e. IMPALA-4233)


http://gerrit.cloudera.org:8080/#/c/4651/1/be/src/exec/aggregation-node.cc
File be/src/exec/aggregation-node.cc:

Line 168: 
> Unnecessary blank line.
Done


Line 174:   if (state->codegen_enabled()) {
> It would be great if we could avoid duplicating this check in every Codegen
Don't want to introduce another function per exec node just for that. I updated 
the patch to add the string to runtime profile in Prepare() if codegen is 
disabled. That may fit in well with what we want down the road if each fragment 
instance populates its pointers to compiled functions at Prepare().


Line 833:   LlvmCodeGen* codegen = state->codegen();
> Maybe we should just pass the codegen object in as an argument?
Not so sure yet as we may need to access the Expr stored in RuntimeState 
(eventually) so I prefer to keep it as is for now as the code is still in flux.


http://gerrit.cloudera.org:8080/#/c/4651/1/be/src/exec/hdfs-scan-node-base.cc
File be/src/exec/hdfs-scan-node-base.cc:

Line 295:   runtime_profile()->AddInfoString(HDFS_SPLIT_STATS_DESC, str.str());
> Extra blank line.
Done


http://gerrit.cloudera.org:8080/#/c/4651/1/be/src/exec/old-hash-table.cc
File be/src/exec/old-hash-table.cc:

Line 419:   LlvmCodeGen* codegen = state->codegen();
> IMO we should be passing in the codegen object rather than the state in mos
Not so sure about this yet as we may need RuntimeState for accessing Expr in 
the future.


http://gerrit.cloudera.org:8080/#/c/4651/1/be/src/exprs/literal.cc
File be/src/exprs/literal.cc:

Line 378: Status Literal::GetCodegendComputeFn(RuntimeState* state, 
llvm::Function** fn) {
> I feel like this should take codegen as an argument instead of state.
Yes, this should take codegen as argument. I have converted all 
GetCodendComputeFn().


http://gerrit.cloudera.org:8080/#/c/4651/1/be/src/exprs/scalar-fn-call.cc
File be/src/exprs/scalar-fn-call.cc:

PS1, Line 98: UNLIKELY
> UNLIKELY seems unnecessary since this isn't a hot path.
It's part of the Prepare() path which may have impact to the latency of 
propagation of runtime filters. That said, the UNLIKELY here is gonna make 
negligible difference.


Line 99:       return Status(Substitute("Cannot handle IR UDF '$0': ", 
fn_.name.function_name,
> This seems like something that would break queries that previously worked -
As we discussed offline, this is not a well documented corner case. In some 
sense, we can consider this as a bug as this code path doesn't honor the query 
option "DISABLE_CODEGEN". The old code will fall back to codegen (even if it's 
disabled) if we see IR UDF or if the number of arguments is more than 8. The 
new patch will simply fail the query.

Also added some tests for this path.


Line 105:     DCHECK(NumFixedArgs() <= 8);
> How do we avoid hitting this DCHECK? Do we have test coverage for this?
Good point. Converted this to a if-check instead. Will add a test to 
test_udfs.py to make sure we see appropriate error messages if the native UDF 
has too many arguments.

I checked the FE built-in library and as far as I can tell, we don't have any 
built-in UDFs with more than 8 arguments.

Added a test for this.


Line 136:     // TODO: don't do this for child exprs
> This is an interesting point that we should make sure we fix when we clean 
Not sure what that means ? Does it mean we shouldn't codegen child exprs ? 
However, will it be possible to inline all children expr without codegening 
them too ?


http://gerrit.cloudera.org:8080/#/c/4651/1/be/src/runtime/plan-fragment-executor.cc
File be/src/runtime/plan-fragment-executor.cc:

Line 194:   plan_->Codegen(state);
> Should we do this after we prepared the sink, instead of in the middle of t
This is just a temporary step. Eventually, codegen will happen before any plan 
fragment is instantiated (i.e. Prepare() or anything is called).

However, it shouldn't hurt to move it to the point after data sink is prepared. 
Will update the patch to do so.


http://gerrit.cloudera.org:8080/#/c/4651/1/be/src/runtime/runtime-state.h
File be/src/runtime/runtime-state.h:

PS1, Line 257: it has already been called.
> No-op if the codegen object has already been created.
Done


PS1, Line 257: codegen_
> maybe don't mention internal state. Instead something like "Create a codege
Done


PS1, Line 257:  This is
             :   /// created on first use.
> Not sure if the last sentence adds much.
Done


http://gerrit.cloudera.org:8080/#/c/4651/1/be/src/service/fe-support.cc
File be/src/service/fe-support.cc:

Line 89:   // Codegen is disabled unless the expression contains external UDF 
available only
> This seems inconsistent with the behaviour in ScalarFnCall. Shouldn't we fa
This is to avoid breaking compatibility. Previously, this expression evaluation 
with IR-only UDF would have worked. Apparently, the fe-support code just takes 
any failure as an exception so it may lead to analysis failure for queries 
which worked previously.

As we discussed offline, we will honor the query option in query_ctx in this 
case. So, if there is any IR UDF, we will enable codegen if the query option 
says so. We will disable codegen for all other cases.


-- 
To view, visit http://gerrit.cloudera.org:8080/4651
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I207566bc9f4c6a159271ecdbc4bbdba3d78c6651
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho <[email protected]>
Gerrit-Reviewer: Michael Ho <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-HasComments: Yes

Reply via email to