Michael Ho has uploaded a new patch set (#3).

Change subject: IMPALA-4432: Handle internal codegen disabling properly
......................................................................

IMPALA-4432: Handle internal codegen disabling properly

There are some conditions in which codegen is disabled internally
even if it's enabled in the query option. For instance, the single
node optimization or the expression evaluation requests sent from
the FE to the BE. These internal disabling of codegen are advisory
as their purposes are to reduce the latency for tables with no or
very few rows. The internal disabling of codegen doesn't interact
well with UDFs which cannot be interpreted (e.g. IR UDF) as it
conflates with the 'disable_codegen' query option set by the user.
As a result, it's hard to differentiate between when codegen is
disabled explicitly by users and when it is disabled internally.

This change fixes the problem above by adding an explicit flag in
TQueryCtx to indicate that codegen is disabled internally. This flag
is only advisory. For cases in which codegen is needed to function,
this internal flag is ignored and if codegen is disabled via query
option, an error is thrown. For this new flag to work with ScalarFnCall,
codegen needs to happen after ScalarFnCall::Prepare() because it's
hard to tell if a fragment contains any UDF that cannot be interpreted
until after ScalarFnCall::Prepare() is called. However, Prepare() needs
the codegen object to codegen so it needs to be created before Prepare().
We can either always create the codegen module or defer codegen to a point
after ScalarFnCall::Prepare(). The former has the downside of introducing
unnecessary latency for say single-node optimization so the latter is
implemented. It is needed as part of IMPALA-4192 any way.

After this change, ScalarFnCall expressions which need to be codegen'd
are inserted into a vector in RuntimeState in ScalarFnCall::Prepare().
Later in the codegen phase, these expressions' GetCodegendComputeFn()
will be called after codegen for operators is done. If any of these
expressions are already codegen'd indirectly by the operators,
GetCodegendComputeFn() will be a no-op. This preserves the behavior
that ScalarFnCall will always be codegen'd even if the fragment
doesn't contain any codegen enabled operators.

Change-Id: I0b6a9ed723c64ba21b861608583cc9b6607d3397
---
M be/src/exec/aggregation-node.cc
M be/src/exec/exchange-node.cc
M be/src/exec/exec-node.cc
M be/src/exec/exec-node.h
M be/src/exec/hash-join-node.cc
M be/src/exec/hdfs-avro-scanner.cc
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-sequence-scanner.cc
M be/src/exec/hdfs-text-scanner.cc
M be/src/exec/partitioned-aggregation-node.cc
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/partitioned-hash-join-node.cc
M be/src/exec/sort-node.cc
M be/src/exec/topn-node.cc
M be/src/exprs/scalar-fn-call.cc
M be/src/exprs/scalar-fn-call.h
M be/src/runtime/plan-fragment-executor.cc
M be/src/runtime/runtime-state.cc
M be/src/runtime/runtime-state.h
M be/src/service/fe-support.cc
M common/thrift/ImpalaInternalService.thrift
M fe/src/main/java/org/apache/impala/planner/Planner.java
M tests/query_test/test_udfs.py
24 files changed, 167 insertions(+), 109 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/05/5105/3
-- 
To view, visit http://gerrit.cloudera.org:8080/5105
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0b6a9ed723c64ba21b861608583cc9b6607d3397
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>

Reply via email to