[Impala-ASF-CR] IMPALA-4080, IMPALA-3638: Introduce ExecNode::Codegen()

2016-10-19 Thread Internal Jenkins (Code Review)
Internal Jenkins has posted comments on this change.

Change subject: IMPALA-4080, IMPALA-3638: Introduce ExecNode::Codegen()
..


Patch Set 6: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/4651
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I207566bc9f4c6a159271ecdbc4bbdba3d78c6651
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4080, IMPALA-3638: Introduce ExecNode::Codegen()

2016-10-19 Thread Internal Jenkins (Code Review)
Internal Jenkins has submitted this change and it was merged.

Change subject: IMPALA-4080, IMPALA-3638: Introduce ExecNode::Codegen()
..


IMPALA-4080, IMPALA-3638: Introduce ExecNode::Codegen()

This patch is mostly mechanical move of codegen related logic
from each exec node's Prepare() to its Codegen() function.
After this change, code generation will no longer happen in
Prepare(). Instead, it will happen after Prepare() completes in
PlanFragmentExecutor. This is an intermediate step towards the
final goal of sharing compiled code among fragment instances in
multi-threading.

As part of the clean up, this change also removes the logic for
lazy codegen object creation. In other words, if codegen is enabled,
the codegen object will always be created. This simplifies some
of the logic in ScalarFnCall::Prepare() and various Codegen()
functions by reducing error checking needed. This change also
removes the logic added for tackling IMPALA-1755 as it's not
needed anymore after the clean up.

The clean up also rectifies a not so well documented situation.
Previously, even if a user explicitly sets DISABLE_CODEGEN to true,
we may still codegen a UDF if it was written in LLVM IR or if it
has more than 8 arguments. This patch enforces the query option
by failing the query in both cases. To run the query, the user
must enable codegen. This change also extends the number of
arguments supported in the interpretation path of ScalarFn to 20.

Change-Id: I207566bc9f4c6a159271ecdbc4bbdba3d78c6651
Reviewed-on: http://gerrit.cloudera.org:8080/4651
Reviewed-by: Michael Ho 
Tested-by: Internal Jenkins
---
M be/src/exec/aggregation-node.cc
M be/src/exec/aggregation-node.h
M be/src/exec/exchange-node.cc
M be/src/exec/exchange-node.h
M be/src/exec/exec-node.cc
M be/src/exec/exec-node.h
M be/src/exec/hash-join-node.cc
M be/src/exec/hash-join-node.h
M be/src/exec/hash-table.cc
M be/src/exec/hash-table.h
M be/src/exec/hdfs-avro-scanner.cc
M be/src/exec/hdfs-avro-scanner.h
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/hdfs-scanner.cc
M be/src/exec/hdfs-sequence-scanner.cc
M be/src/exec/hdfs-text-scanner.cc
M be/src/exec/old-hash-table.cc
M be/src/exec/old-hash-table.h
M be/src/exec/partitioned-aggregation-node.cc
M be/src/exec/partitioned-aggregation-node.h
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/partitioned-hash-join-builder.h
M be/src/exec/partitioned-hash-join-node.cc
M be/src/exec/partitioned-hash-join-node.h
M be/src/exec/sort-node.cc
M be/src/exec/sort-node.h
M be/src/exec/topn-node.cc
M be/src/exec/topn-node.h
M be/src/exprs/case-expr.cc
M be/src/exprs/case-expr.h
M be/src/exprs/compound-predicates.cc
M be/src/exprs/compound-predicates.h
M be/src/exprs/conditional-functions.cc
M be/src/exprs/conditional-functions.h
M be/src/exprs/expr.cc
M be/src/exprs/expr.h
M be/src/exprs/hive-udf-call.cc
M be/src/exprs/hive-udf-call.h
M be/src/exprs/is-not-empty-predicate.cc
M be/src/exprs/is-not-empty-predicate.h
M be/src/exprs/literal.cc
M be/src/exprs/literal.h
M be/src/exprs/null-literal.cc
M be/src/exprs/null-literal.h
M be/src/exprs/scalar-fn-call.cc
M be/src/exprs/scalar-fn-call.h
M be/src/exprs/slot-ref.cc
M be/src/exprs/slot-ref.h
M be/src/exprs/tuple-is-null-predicate.cc
M be/src/exprs/tuple-is-null-predicate.h
M be/src/runtime/plan-fragment-executor.cc
M be/src/runtime/runtime-state.cc
M be/src/runtime/runtime-state.h
M be/src/runtime/sorted-run-merger.h
M be/src/runtime/sorter.cc
M be/src/runtime/sorter.h
M be/src/runtime/tuple.cc
M be/src/runtime/tuple.h
M be/src/service/fe-support.cc
M be/src/service/query-options.cc
M be/src/testutil/test-udfs.cc
M be/src/util/tuple-row-compare.cc
M be/src/util/tuple-row-compare.h
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M testdata/workloads/functional-query/queries/QueryTest/udf-errors.test
M testdata/workloads/functional-query/queries/QueryTest/udf.test
M tests/query_test/test_udfs.py
70 files changed, 645 insertions(+), 691 deletions(-)

Approvals:
  Michael Ho: Looks good to me, approved
  Internal Jenkins: Verified



-- 
To view, visit http://gerrit.cloudera.org:8080/4651
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I207566bc9f4c6a159271ecdbc4bbdba3d78c6651
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-4080, IMPALA-3638: Introduce ExecNode::Codegen()

2016-10-18 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-4080, IMPALA-3638: Introduce ExecNode::Codegen()
..


Patch Set 6: Code-Review+2

Carry Dan's +2 forward.

-- 
To view, visit http://gerrit.cloudera.org:8080/4651
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I207566bc9f4c6a159271ecdbc4bbdba3d78c6651
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4080, IMPALA-3638: Introduce ExecNode::Codegen()

2016-10-18 Thread Michael Ho (Code Review)
Hello Dan Hecht, Tim Armstrong,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/4651

to look at the new patch set (#6).

Change subject: IMPALA-4080, IMPALA-3638: Introduce ExecNode::Codegen()
..

IMPALA-4080, IMPALA-3638: Introduce ExecNode::Codegen()

This patch is mostly mechanical move of codegen related logic
from each exec node's Prepare() to its Codegen() function.
After this change, code generation will no longer happen in
Prepare(). Instead, it will happen after Prepare() completes in
PlanFragmentExecutor. This is an intermediate step towards the
final goal of sharing compiled code among fragment instances in
multi-threading.

As part of the clean up, this change also removes the logic for
lazy codegen object creation. In other words, if codegen is enabled,
the codegen object will always be created. This simplifies some
of the logic in ScalarFnCall::Prepare() and various Codegen()
functions by reducing error checking needed. This change also
removes the logic added for tackling IMPALA-1755 as it's not
needed anymore after the clean up.

The clean up also rectifies a not so well documented situation.
Previously, even if a user explicitly sets DISABLE_CODEGEN to true,
we may still codegen a UDF if it was written in LLVM IR or if it
has more than 8 arguments. This patch enforces the query option
by failing the query in both cases. To run the query, the user
must enable codegen. This change also extends the number of
arguments supported in the interpretation path of ScalarFn to 20.

Change-Id: I207566bc9f4c6a159271ecdbc4bbdba3d78c6651
---
M be/src/exec/aggregation-node.cc
M be/src/exec/aggregation-node.h
M be/src/exec/exchange-node.cc
M be/src/exec/exchange-node.h
M be/src/exec/exec-node.cc
M be/src/exec/exec-node.h
M be/src/exec/hash-join-node.cc
M be/src/exec/hash-join-node.h
M be/src/exec/hash-table.cc
M be/src/exec/hash-table.h
M be/src/exec/hdfs-avro-scanner.cc
M be/src/exec/hdfs-avro-scanner.h
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/hdfs-scanner.cc
M be/src/exec/hdfs-sequence-scanner.cc
M be/src/exec/hdfs-text-scanner.cc
M be/src/exec/old-hash-table.cc
M be/src/exec/old-hash-table.h
M be/src/exec/partitioned-aggregation-node.cc
M be/src/exec/partitioned-aggregation-node.h
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/partitioned-hash-join-builder.h
M be/src/exec/partitioned-hash-join-node.cc
M be/src/exec/partitioned-hash-join-node.h
M be/src/exec/sort-node.cc
M be/src/exec/sort-node.h
M be/src/exec/topn-node.cc
M be/src/exec/topn-node.h
M be/src/exprs/case-expr.cc
M be/src/exprs/case-expr.h
M be/src/exprs/compound-predicates.cc
M be/src/exprs/compound-predicates.h
M be/src/exprs/conditional-functions.cc
M be/src/exprs/conditional-functions.h
M be/src/exprs/expr.cc
M be/src/exprs/expr.h
M be/src/exprs/hive-udf-call.cc
M be/src/exprs/hive-udf-call.h
M be/src/exprs/is-not-empty-predicate.cc
M be/src/exprs/is-not-empty-predicate.h
M be/src/exprs/literal.cc
M be/src/exprs/literal.h
M be/src/exprs/null-literal.cc
M be/src/exprs/null-literal.h
M be/src/exprs/scalar-fn-call.cc
M be/src/exprs/scalar-fn-call.h
M be/src/exprs/slot-ref.cc
M be/src/exprs/slot-ref.h
M be/src/exprs/tuple-is-null-predicate.cc
M be/src/exprs/tuple-is-null-predicate.h
M be/src/runtime/plan-fragment-executor.cc
M be/src/runtime/runtime-state.cc
M be/src/runtime/runtime-state.h
M be/src/runtime/sorted-run-merger.h
M be/src/runtime/sorter.cc
M be/src/runtime/sorter.h
M be/src/runtime/tuple.cc
M be/src/runtime/tuple.h
M be/src/service/fe-support.cc
M be/src/service/query-options.cc
M be/src/testutil/test-udfs.cc
M be/src/util/tuple-row-compare.cc
M be/src/util/tuple-row-compare.h
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M testdata/workloads/functional-query/queries/QueryTest/udf-errors.test
M testdata/workloads/functional-query/queries/QueryTest/udf.test
M tests/query_test/test_udfs.py
70 files changed, 645 insertions(+), 691 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/51/4651/6
-- 
To view, visit http://gerrit.cloudera.org:8080/4651
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I207566bc9f4c6a159271ecdbc4bbdba3d78c6651
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-4080, IMPALA-3638: Introduce ExecNode::Codegen()

2016-10-18 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-4080, IMPALA-3638: Introduce ExecNode::Codegen()
..


Patch Set 4:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/4651/4/be/src/exprs/scalar-fn-call.cc
File be/src/exprs/scalar-fn-call.cc:

Line 409:   // TODO: fix this
> Done. TODO removed.
I think what the todo is saying is the fix those things and then remove 
"broken_builtin".  Let's not do that in this change, so let's leave the TODO 
there (or clarify it).


http://gerrit.cloudera.org:8080/#/c/4651/4/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

PS4, Line 516: 
we should remove this query option when we can break compatibility. How about 
filing a jira against IMPALA 3.0 for that?


-- 
To view, visit http://gerrit.cloudera.org:8080/4651
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I207566bc9f4c6a159271ecdbc4bbdba3d78c6651
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4080, IMPALA-3638: Introduce ExecNode::Codegen()

2016-10-18 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-4080, IMPALA-3638: Introduce ExecNode::Codegen()
..


Patch Set 5: Code-Review+2

Please just put back a TODO for that timestamp codegen stuff.

-- 
To view, visit http://gerrit.cloudera.org:8080/4651
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I207566bc9f4c6a159271ecdbc4bbdba3d78c6651
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4080, IMPALA-3638: Introduce ExecNode::Codegen()

2016-10-18 Thread Michael Ho (Code Review)
Hello Tim Armstrong,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/4651

to look at the new patch set (#5).

Change subject: IMPALA-4080, IMPALA-3638: Introduce ExecNode::Codegen()
..

IMPALA-4080, IMPALA-3638: Introduce ExecNode::Codegen()

This patch is mostly mechanical move of codegen related logic
from each exec node's Prepare() to its Codegen() function.
After this change, code generation will no longer happen in
Prepare(). Instead, it will happen after Prepare() completes in
PlanFragmentExecutor. This is an intermediate step towards the
final goal of sharing compiled code among fragment instances in
multi-threading.

As part of the clean up, this change also removes the logic for
lazy codegen object creation. In other words, if codegen is enabled,
the codegen object will always be created. This simplifies some
of the logic in ScalarFnCall::Prepare() and various Codegen()
functions by reducing error checking needed. This change also
removes the logic added for tackling IMPALA-1755 as it's not
needed anymore after the clean up.

The clean up also rectifies a not so well documented situation.
Previously, even if a user explicitly sets DISABLE_CODEGEN to true,
we may still codegen a UDF if it was written in LLVM IR or if it
has more than 8 arguments. This patch enforces the query option
by failing the query in both cases. To run the query, the user
must enable codegen. This change also extends the number of
arguments supported in the interpretation path of ScalarFn to 20.

Change-Id: I207566bc9f4c6a159271ecdbc4bbdba3d78c6651
---
M be/src/exec/aggregation-node.cc
M be/src/exec/aggregation-node.h
M be/src/exec/exchange-node.cc
M be/src/exec/exchange-node.h
M be/src/exec/exec-node.cc
M be/src/exec/exec-node.h
M be/src/exec/hash-join-node.cc
M be/src/exec/hash-join-node.h
M be/src/exec/hash-table.cc
M be/src/exec/hash-table.h
M be/src/exec/hdfs-avro-scanner.cc
M be/src/exec/hdfs-avro-scanner.h
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/hdfs-scanner.cc
M be/src/exec/hdfs-sequence-scanner.cc
M be/src/exec/hdfs-text-scanner.cc
M be/src/exec/old-hash-table.cc
M be/src/exec/old-hash-table.h
M be/src/exec/partitioned-aggregation-node.cc
M be/src/exec/partitioned-aggregation-node.h
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/partitioned-hash-join-builder.h
M be/src/exec/partitioned-hash-join-node.cc
M be/src/exec/partitioned-hash-join-node.h
M be/src/exec/sort-node.cc
M be/src/exec/sort-node.h
M be/src/exec/topn-node.cc
M be/src/exec/topn-node.h
M be/src/exprs/case-expr.cc
M be/src/exprs/case-expr.h
M be/src/exprs/compound-predicates.cc
M be/src/exprs/compound-predicates.h
M be/src/exprs/conditional-functions.cc
M be/src/exprs/conditional-functions.h
M be/src/exprs/expr.cc
M be/src/exprs/expr.h
M be/src/exprs/hive-udf-call.cc
M be/src/exprs/hive-udf-call.h
M be/src/exprs/is-not-empty-predicate.cc
M be/src/exprs/is-not-empty-predicate.h
M be/src/exprs/literal.cc
M be/src/exprs/literal.h
M be/src/exprs/null-literal.cc
M be/src/exprs/null-literal.h
M be/src/exprs/scalar-fn-call.cc
M be/src/exprs/scalar-fn-call.h
M be/src/exprs/slot-ref.cc
M be/src/exprs/slot-ref.h
M be/src/exprs/tuple-is-null-predicate.cc
M be/src/exprs/tuple-is-null-predicate.h
M be/src/runtime/plan-fragment-executor.cc
M be/src/runtime/runtime-state.cc
M be/src/runtime/runtime-state.h
M be/src/runtime/sorted-run-merger.h
M be/src/runtime/sorter.cc
M be/src/runtime/sorter.h
M be/src/runtime/tuple.cc
M be/src/runtime/tuple.h
M be/src/service/fe-support.cc
M be/src/testutil/test-udfs.cc
M be/src/util/tuple-row-compare.cc
M be/src/util/tuple-row-compare.h
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M testdata/workloads/functional-query/queries/QueryTest/udf-errors.test
M testdata/workloads/functional-query/queries/QueryTest/udf.test
M tests/query_test/test_udfs.py
69 files changed, 642 insertions(+), 690 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/51/4651/5
-- 
To view, visit http://gerrit.cloudera.org:8080/4651
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I207566bc9f4c6a159271ecdbc4bbdba3d78c6651
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-4080, IMPALA-3638: Introduce ExecNode::Codegen()

2016-10-18 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-4080, IMPALA-3638: Introduce ExecNode::Codegen()
..


Patch Set 4:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/4651/4/be/src/exec/exec-node.h
File be/src/exec/exec-node.h:

PS4, Line 72:  If overridden in subclass, must also
:   /// call superclass's Codegen().
> does it matter whether this happens first or last?
Not really. Comments updated.


http://gerrit.cloudera.org:8080/#/c/4651/4/be/src/exec/hash-join-node.cc
File be/src/exec/hash-join-node.cc:

Line 148:   }
> why not do this in ExecNode::Prepare() to avoid having to do the same thing
That would have the unfortunate side effect of adding this message to operators 
which aren't codegen enabled. This may be confusing.


http://gerrit.cloudera.org:8080/#/c/4651/4/be/src/exec/partitioned-hash-join-builder.h
File be/src/exec/partitioned-hash-join-builder.h:

PS4, Line 88: whether the codegen was
:   /// enabled 
> we only call this if codegen is enabled, right?  So I'm not sure what this 
Done


http://gerrit.cloudera.org:8080/#/c/4651/4/be/src/exprs/scalar-fn-call.cc
File be/src/exprs/scalar-fn-call.cc:

Line 409:   // TODO: fix this
> not for this change, but i think this timestamp stuff may have been fixed b
Done. TODO removed.


http://gerrit.cloudera.org:8080/#/c/4651/4/be/src/runtime/runtime-state.h
File be/src/runtime/runtime-state.h:

PS4, Line 161: 'codegen_'.
> public method comments shouldn't really talk about private members. how abo
Done


http://gerrit.cloudera.org:8080/#/c/4651/4/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

Line 532
> why is it okay to always enable codegen now, whereas before we were so care
We always codegen conjunct evaluation for parquet scanner now so this code is 
mostly irrelevant for parquet scanner. For other file formats, they weren't 
really affected by this flag as they always codegen (if implemented). For 
parquet scanner, this flag used to affect whether codegen will happen in 
ScalarFnCall::Prepare() and that had performance impact on conjuncts evaluation.

All this complication stems from lazy creation of the LLVM module to avoid the 
overhead of LLVM module creation when the fragment doesn't have any codegen 
enabled operator:

https://github.com/cloudera/Impala/commit/0686cd9c3ed7ae48d5bd4fe602266034ef871ffc

We didn't have lazy IR code materialization back then so the preparation time 
of an LLVM module was higher (in the order of at least 150+ ms on my dev box). 
With lazy IR code materialization and a recent change to lazily create the 
mapping of IRFunction::Type to llvm::Function*, we pushed the creation time of 
LLVM module to about 10~11ms.

With this patch, if codegen is enabled, we will always create the codegen 
module. This avoids problem such as IMPALA-1755 and IMPALA-3638.


http://gerrit.cloudera.org:8080/#/c/4651/4/tests/query_test/test_udfs.py
File tests/query_test/test_udfs.py:

Line 80: vector.get_value('exec_option')['disable_codegen'] = 1
> this doesn't invalidate any of the preexisting test cases?
AFAIK, no. The preexisting test cases were mostly about failure to load the 
functions. Nothing tied to codegen as far as I can tell.


-- 
To view, visit http://gerrit.cloudera.org:8080/4651
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I207566bc9f4c6a159271ecdbc4bbdba3d78c6651
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4080, IMPALA-3638: Introduce ExecNode::Codegen()

2016-10-17 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-4080, IMPALA-3638: Introduce ExecNode::Codegen()
..


Patch Set 4:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/4651/4/be/src/exec/exec-node.h
File be/src/exec/exec-node.h:

PS4, Line 72:  If overridden in subclass, must also
:   /// call superclass's Codegen().
does it matter whether this happens first or last?


http://gerrit.cloudera.org:8080/#/c/4651/4/be/src/exec/hash-join-node.cc
File be/src/exec/hash-join-node.cc:

Line 148:   }
why not do this in ExecNode::Prepare() to avoid having to do the same thing in 
every derived class?


http://gerrit.cloudera.org:8080/#/c/4651/4/be/src/exec/partitioned-hash-join-builder.h
File be/src/exec/partitioned-hash-join-builder.h:

PS4, Line 88: whether the codegen was
:   /// enabled 
we only call this if codegen is enabled, right?  So I'm not sure what this part 
means (maybe just delete it).


http://gerrit.cloudera.org:8080/#/c/4651/4/be/src/exprs/scalar-fn-call.cc
File be/src/exprs/scalar-fn-call.cc:

Line 409:   // TODO: fix this
not for this change, but i think this timestamp stuff may have been fixed by 
moving things out of the IR.


http://gerrit.cloudera.org:8080/#/c/4651/4/be/src/runtime/runtime-state.h
File be/src/runtime/runtime-state.h:

PS4, Line 161: 'codegen_'.
public method comments shouldn't really talk about private members. how about 
"... for this fragment instance" or something like that.


http://gerrit.cloudera.org:8080/#/c/4651/4/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

Line 532
why is it okay to always enable codegen now, whereas before we were so careful 
to only enable it based on this heuristic?


http://gerrit.cloudera.org:8080/#/c/4651/4/tests/query_test/test_udfs.py
File tests/query_test/test_udfs.py:

Line 80: vector.get_value('exec_option')['disable_codegen'] = 1
this doesn't invalidate any of the preexisting test cases?


-- 
To view, visit http://gerrit.cloudera.org:8080/4651
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I207566bc9f4c6a159271ecdbc4bbdba3d78c6651
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4080, IMPALA-3638: Introduce ExecNode::Codegen()

2016-10-17 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-4080, IMPALA-3638: Introduce ExecNode::Codegen()
..


Patch Set 4:

Do we still plan to make the ExecNodes' Codegen() be static (in a future 
patch), or has that plan changed?

-- 
To view, visit http://gerrit.cloudera.org:8080/4651
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I207566bc9f4c6a159271ecdbc4bbdba3d78c6651
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4080, IMPALA-3638: Introduce ExecNode::Codegen()

2016-10-13 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4080, IMPALA-3638: Introduce ExecNode::Codegen()
..


Patch Set 4:

It would be good to have an additional pair of eyes since it's a significant 
behaviour change.

-- 
To view, visit http://gerrit.cloudera.org:8080/4651
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I207566bc9f4c6a159271ecdbc4bbdba3d78c6651
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4080, IMPALA-3638: Introduce ExecNode::Codegen()

2016-10-12 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4080, IMPALA-3638: Introduce ExecNode::Codegen()
..


Patch Set 3:

(8 comments)

Looking good, just some small things then I'll +1.

http://gerrit.cloudera.org:8080/#/c/4651/1/be/src/exec/aggregation-node.cc
File be/src/exec/aggregation-node.cc:

Line 174: void AggregationNode::Codegen(RuntimeState* state) {
> I thought about something similar but this has the side effect of putting "
Ah, I didn't think about that. This works ok for now then.


http://gerrit.cloudera.org:8080/#/c/4651/3/be/src/exec/aggregation-node.cc
File be/src/exec/aggregation-node.cc:

Line 169: runtime_profile()->AddCodegenMsg(false, "disabled by query option 
DISABLE_CODEGEN");
Thanks, I think this will be clearer.

It's still a little weird that we don't print anything when expr codegen is 
disabled (since that can make a bit perf difference), but I guess that will be 
fixed naturally by a follow-up patch.


http://gerrit.cloudera.org:8080/#/c/4651/3/be/src/exec/hash-join-node.cc
File be/src/exec/hash-join-node.cc:

Line 172:   Function* codegen_process_probe_batch_fn = 
CodegenProcessProbeBatch(codegen, hash_fn);
Long line.


http://gerrit.cloudera.org:8080/#/c/4651/3/be/src/exec/hdfs-parquet-scanner.cc
File be/src/exec/hdfs-parquet-scanner.cc:

Line 622: return Status("Disabled by query option.");
Dead code I think.


http://gerrit.cloudera.org:8080/#/c/4651/3/be/src/exec/hdfs-scan-node-base.cc
File be/src/exec/hdfs-scan-node-base.cc:

Line 153: Status HdfsScanNodeBase::Prepare(RuntimeState* state) {
It doesn't look like we add the codegen disabled message hre.


http://gerrit.cloudera.org:8080/#/c/4651/3/be/src/exec/hdfs-text-scanner.cc
File be/src/exec/hdfs-text-scanner.cc:

Line 700: return Status("Disabled by query option.");
Dead code?


http://gerrit.cloudera.org:8080/#/c/4651/3/be/src/exprs/scalar-fn-call.cc
File be/src/exprs/scalar-fn-call.cc:

Line 562: #define ARG_DECL_LIST(n) \
Thanks for doing this.

Can you add a comment just giving an example of what the macros expand to?


http://gerrit.cloudera.org:8080/#/c/4651/3/be/src/service/fe-support.cc
File be/src/service/fe-support.cc:

Line 104:   state.CreateCodegen(), env, JniUtil::internal_exc_class(), 
result_bytes);
Tab?


-- 
To view, visit http://gerrit.cloudera.org:8080/4651
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I207566bc9f4c6a159271ecdbc4bbdba3d78c6651
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4080, IMPALA-3638: Introduce ExecNode::Codegen()

2016-10-12 Thread Michael Ho (Code Review)
Michael Ho has uploaded a new patch set (#3).

Change subject: IMPALA-4080, IMPALA-3638: Introduce ExecNode::Codegen()
..

IMPALA-4080, IMPALA-3638: Introduce ExecNode::Codegen()

This patch is mostly mechanical move of codegen related logic
from each exec node's Prepare() to its Codegen() function.
After this change, code generation will no longer happen in
Prepare(). Instead, it will happen after Prepare() completes in
PlanFragmentExecutor. This is an intermediate step towards the
final goal of sharing compiled code among fragment instances in
multi-threading.

As part of the clean up, this change also removes the logic for
lazy codegen object creation. In other words, if codegen is enabled,
the codegen object will always be created. This simplifies some
of the logic in ScalarFnCall::Prepare() and various Codegen()
functions by reducing error checking needed. This change also
removes the logic added for tackling IMPALA-1755 as it's not
needed anymore after the clean up.

The clean up also rectifies a not so well documented situation.
Previously, even if a user explicitly sets DISABLE_CODEGEN to true,
we may still codegen a UDF if it was written in LLVM IR or if it
has more than 8 arguments. This patch enforces the query option
by failing the query in both cases. To run the query, the user
must enable codegen. This change also extends the number of
arguments supported in the interpretation path of ScalarFn to 20.

Change-Id: I207566bc9f4c6a159271ecdbc4bbdba3d78c6651
---
M be/src/exec/aggregation-node.cc
M be/src/exec/aggregation-node.h
M be/src/exec/exchange-node.cc
M be/src/exec/exchange-node.h
M be/src/exec/exec-node.cc
M be/src/exec/exec-node.h
M be/src/exec/hash-join-node.cc
M be/src/exec/hash-join-node.h
M be/src/exec/hash-table.cc
M be/src/exec/hash-table.h
M be/src/exec/hdfs-avro-scanner.cc
M be/src/exec/hdfs-avro-scanner.h
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/hdfs-scanner.cc
M be/src/exec/hdfs-sequence-scanner.cc
M be/src/exec/hdfs-text-scanner.cc
M be/src/exec/old-hash-table.cc
M be/src/exec/old-hash-table.h
M be/src/exec/partitioned-aggregation-node.cc
M be/src/exec/partitioned-aggregation-node.h
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/partitioned-hash-join-builder.h
M be/src/exec/partitioned-hash-join-node.cc
M be/src/exec/partitioned-hash-join-node.h
M be/src/exec/sort-node.cc
M be/src/exec/sort-node.h
M be/src/exec/topn-node.cc
M be/src/exec/topn-node.h
M be/src/exprs/case-expr.cc
M be/src/exprs/case-expr.h
M be/src/exprs/compound-predicates.cc
M be/src/exprs/compound-predicates.h
M be/src/exprs/conditional-functions.cc
M be/src/exprs/conditional-functions.h
M be/src/exprs/expr.cc
M be/src/exprs/expr.h
M be/src/exprs/hive-udf-call.cc
M be/src/exprs/hive-udf-call.h
M be/src/exprs/is-not-empty-predicate.cc
M be/src/exprs/is-not-empty-predicate.h
M be/src/exprs/literal.cc
M be/src/exprs/literal.h
M be/src/exprs/null-literal.cc
M be/src/exprs/null-literal.h
M be/src/exprs/scalar-fn-call.cc
M be/src/exprs/scalar-fn-call.h
M be/src/exprs/slot-ref.cc
M be/src/exprs/slot-ref.h
M be/src/exprs/tuple-is-null-predicate.cc
M be/src/exprs/tuple-is-null-predicate.h
M be/src/runtime/plan-fragment-executor.cc
M be/src/runtime/runtime-state.cc
M be/src/runtime/runtime-state.h
M be/src/runtime/sorted-run-merger.h
M be/src/runtime/sorter.cc
M be/src/runtime/sorter.h
M be/src/runtime/tuple.cc
M be/src/runtime/tuple.h
M be/src/service/fe-support.cc
M be/src/testutil/test-udfs.cc
M be/src/util/tuple-row-compare.cc
M be/src/util/tuple-row-compare.h
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M testdata/workloads/functional-query/queries/QueryTest/udf-errors.test
M testdata/workloads/functional-query/queries/QueryTest/udf.test
M tests/query_test/test_udfs.py
69 files changed, 597 insertions(+), 673 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/51/4651/3
-- 
To view, visit http://gerrit.cloudera.org:8080/4651
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I207566bc9f4c6a159271ecdbc4bbdba3d78c6651
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-4080, IMPALA-3638: Introduce ExecNode::Codegen()

2016-10-10 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4080, IMPALA-3638: Introduce ExecNode::Codegen()
..


Patch Set 2:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/4651/1/be/src/exec/aggregation-node.cc
File be/src/exec/aggregation-node.cc:

Line 174:   bool codegen_enabled = false;
> Please let me know if the approach in the new patch is okay with you.
I think we should still avoid putting the AddCodegenMsg in all of the Prepare() 
functions. We can do that easily if you have ExecNode::CodegenTree() that does 
roughly:


  if (state->codegen_enabled()) {
Codegen();
  } else {
AddCodegenMsg(false, "by DISABLE_CODEGEN query option");
  }
  for (ExecNode* child : children_) {
CodegenTree(child);
  }

I think that more clearly expresses the logic and requires less code spread 
around all the exec nodes.

I feel less strongly about whether RuntimeState should be an argument or not in 
this staging patch. It shouldn't be in the end state I don't think.


http://gerrit.cloudera.org:8080/#/c/4651/2/be/src/exec/old-hash-table.cc
File be/src/exec/old-hash-table.cc:

Line 632:   Status status = 
build_expr_ctxs_[i]->root()->GetCodegendComputeFn(codegen, _fn);
Long line I think


http://gerrit.cloudera.org:8080/#/c/4651/2/be/src/exec/partitioned-aggregation-node.cc
File be/src/exec/partitioned-aggregation-node.cc:

PS2, Line 294: CodegenProcessBatchStreaming
Let's fix these to thread the codegen argument through (instead of getting it 
via 'this')


http://gerrit.cloudera.org:8080/#/c/4651/2/testdata/workloads/functional-query/queries/QueryTest/udf-errors.test
File testdata/workloads/functional-query/queries/QueryTest/udf-errors.test:

Line 28: create function if not exists udf_test_errors.nine_args(int, int, int, 
int, int, int, int, int, int) returns int
> I tried. Unfortunately, the test vector seems to override things. I can add
Hmm, weird.

Maybe add a comment in the .py file then to explain why its disabling codegen.


-- 
To view, visit http://gerrit.cloudera.org:8080/4651
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I207566bc9f4c6a159271ecdbc4bbdba3d78c6651
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4080, IMPALA-3638: Introduce ExecNode::Codegen()

2016-10-07 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-4080, IMPALA-3638: Introduce ExecNode::Codegen()
..


Patch Set 1:

(3 comments)

FWIW, this patch probably needs to wait for 
https://gerrit.cloudera.org/#/c/4402. One problem noticed is that the 
coordinator will now start remote fragments later due to the time to create 
codegen module. With the aforementioned patch Henry is working on, the root 
fragment will be started asynchronously together with remote fragments so the 
bottleneck will be gone. Will address other comments in the upcoming patch.

http://gerrit.cloudera.org:8080/#/c/4651/1/be/src/exec/aggregation-node.cc
File be/src/exec/aggregation-node.cc:

Line 174:   if (state->codegen_enabled()) {
> I don't think we need to add more functions than the current solution: we n
Please let me know if the approach in the new patch is okay with you.


http://gerrit.cloudera.org:8080/#/c/4651/1/be/src/exec/old-hash-table.cc
File be/src/exec/old-hash-table.cc:

Line 419:   LlvmCodeGen* codegen = state->codegen();
> Maybe it would make more sense to have the codegen'd exprs owned by the Llv
Certainly, the codegen'ed IR function may make sense to be owned by the codegen 
object. However, it seems odd to have to poke the codegen object to get the 
type of an expression when codegen is disabled. I will see how the changes to 
Expr goes before deciding on it.


http://gerrit.cloudera.org:8080/#/c/4651/2/testdata/workloads/functional-query/queries/QueryTest/udf-errors.test
File testdata/workloads/functional-query/queries/QueryTest/udf-errors.test:

Line 28:  CATCH
> Can we just add the set disable_codegen=1 flag before each of these queries
I tried. Unfortunately, the test vector seems to override things. I can add a 
set statement here but it'll be a no-op.


-- 
To view, visit http://gerrit.cloudera.org:8080/4651
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I207566bc9f4c6a159271ecdbc4bbdba3d78c6651
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4080, IMPALA-3638: Introduce ExecNode::Codegen()

2016-10-07 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-4080, IMPALA-3638: Introduce ExecNode::Codegen()
..


Patch Set 1:

(16 comments)

http://gerrit.cloudera.org:8080/#/c/4651/1//COMMIT_MSG
Commit Message:

PS1, Line 7:  IMPALA-3686
> wrong JIRA number: IMPALA-3638
I am not 100% sure the extent which IMPALA-3638 is trying to cover so I am fine 
if you think a new bug is better. Either way, this patch alleviates the issue 
mentioned in IMPALA-3638 as we will now codegen for ScalarFnCall even if there 
is no codegen enabled operator in the fragment.

The caveat you mentioned is due to the fact that we don't have a way to invoke 
the JIT compiled function in the interpretation path of Expr except for 
ScalarFnCall. Once we fix that, then codegen'ing all the Exprs makes sense. 
That may also explain the oddness of ScalarFnCall::Prepare() (i.e. IMPALA-4233)


http://gerrit.cloudera.org:8080/#/c/4651/1/be/src/exec/aggregation-node.cc
File be/src/exec/aggregation-node.cc:

Line 168: 
> Unnecessary blank line.
Done


Line 174:   if (state->codegen_enabled()) {
> It would be great if we could avoid duplicating this check in every Codegen
Don't want to introduce another function per exec node just for that. I updated 
the patch to add the string to runtime profile in Prepare() if codegen is 
disabled. That may fit in well with what we want down the road if each fragment 
instance populates its pointers to compiled functions at Prepare().


Line 833:   LlvmCodeGen* codegen = state->codegen();
> Maybe we should just pass the codegen object in as an argument?
Not so sure yet as we may need to access the Expr stored in RuntimeState 
(eventually) so I prefer to keep it as is for now as the code is still in flux.


http://gerrit.cloudera.org:8080/#/c/4651/1/be/src/exec/hdfs-scan-node-base.cc
File be/src/exec/hdfs-scan-node-base.cc:

Line 295:   runtime_profile()->AddInfoString(HDFS_SPLIT_STATS_DESC, str.str());
> Extra blank line.
Done


http://gerrit.cloudera.org:8080/#/c/4651/1/be/src/exec/old-hash-table.cc
File be/src/exec/old-hash-table.cc:

Line 419:   LlvmCodeGen* codegen = state->codegen();
> IMO we should be passing in the codegen object rather than the state in mos
Not so sure about this yet as we may need RuntimeState for accessing Expr in 
the future.


http://gerrit.cloudera.org:8080/#/c/4651/1/be/src/exprs/literal.cc
File be/src/exprs/literal.cc:

Line 378: Status Literal::GetCodegendComputeFn(RuntimeState* state, 
llvm::Function** fn) {
> I feel like this should take codegen as an argument instead of state.
Yes, this should take codegen as argument. I have converted all 
GetCodendComputeFn().


http://gerrit.cloudera.org:8080/#/c/4651/1/be/src/exprs/scalar-fn-call.cc
File be/src/exprs/scalar-fn-call.cc:

PS1, Line 98: UNLIKELY
> UNLIKELY seems unnecessary since this isn't a hot path.
It's part of the Prepare() path which may have impact to the latency of 
propagation of runtime filters. That said, the UNLIKELY here is gonna make 
negligible difference.


Line 99:   return Status(Substitute("Cannot handle IR UDF '$0': ", 
fn_.name.function_name,
> This seems like something that would break queries that previously worked -
As we discussed offline, this is not a well documented corner case. In some 
sense, we can consider this as a bug as this code path doesn't honor the query 
option "DISABLE_CODEGEN". The old code will fall back to codegen (even if it's 
disabled) if we see IR UDF or if the number of arguments is more than 8. The 
new patch will simply fail the query.

Also added some tests for this path.


Line 105: DCHECK(NumFixedArgs() <= 8);
> How do we avoid hitting this DCHECK? Do we have test coverage for this?
Good point. Converted this to a if-check instead. Will add a test to 
test_udfs.py to make sure we see appropriate error messages if the native UDF 
has too many arguments.

I checked the FE built-in library and as far as I can tell, we don't have any 
built-in UDFs with more than 8 arguments.

Added a test for this.


Line 136: // TODO: don't do this for child exprs
> This is an interesting point that we should make sure we fix when we clean 
Not sure what that means ? Does it mean we shouldn't codegen child exprs ? 
However, will it be possible to inline all children expr without codegening 
them too ?


http://gerrit.cloudera.org:8080/#/c/4651/1/be/src/runtime/plan-fragment-executor.cc
File be/src/runtime/plan-fragment-executor.cc:

Line 194:   plan_->Codegen(state);
> Should we do this after we prepared the sink, instead of in the middle of t
This is just a temporary step. Eventually, codegen will happen before any plan 
fragment is instantiated (i.e. Prepare() or anything is called).

However, it shouldn't hurt to move it to the point after data sink is prepared. 
Will update the patch to do so.


http://gerrit.cloudera.org:8080/#/c/4651/1/be/src/runtime/runtime-state.h
File 

[Impala-ASF-CR] IMPALA-4080, IMPALA-3638: Introduce ExecNode::Codegen()

2016-10-07 Thread Michael Ho (Code Review)
Michael Ho has uploaded a new patch set (#2).

Change subject: IMPALA-4080, IMPALA-3638: Introduce ExecNode::Codegen()
..

IMPALA-4080, IMPALA-3638: Introduce ExecNode::Codegen()

This patch is mostly mechanical move of codegen related logic
from each exec node's Prepare() to its Codegen() function.
After this change, code generation will no longer happen in
Prepare(). Instead, it will happen after Prepare() completes in
PlanFragmentExecutor. This is an intermediate step towards the
final goal of sharing compiled code among fragment instances in
multi-threading.

As part of the clean up, this change also removes the logic for
lazy codegen object creation. In other words, if codegen is enabled,
the codegen object will always be created. This simplifies some
of the logic in ScalarFnCall::Prepare() and various Codegen()
functions by reducing error checking needed. This change also
removes the logic added for tackling IMPALA-1755 as it's not
needed anymore after the clean up.

The clean up also rectifies a not so well documented situation.
Previously, even if a user explicitly sets DISABLE_CODEGEN to true,
we may still codegen some UDFs if that UDFs are written in LLVM IR
or if the UDF has more than 8 arguments. This patch enforces the
query option by failing the query in both cases. To run the query
in those cases above, the user must enable codegen.

Change-Id: I207566bc9f4c6a159271ecdbc4bbdba3d78c6651
---
M be/src/exec/aggregation-node.cc
M be/src/exec/aggregation-node.h
M be/src/exec/exec-node.cc
M be/src/exec/exec-node.h
M be/src/exec/hash-join-node.cc
M be/src/exec/hash-join-node.h
M be/src/exec/hash-table.cc
M be/src/exec/hdfs-avro-scanner.cc
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/hdfs-scanner.cc
M be/src/exec/hdfs-sequence-scanner.cc
M be/src/exec/hdfs-text-scanner.cc
M be/src/exec/old-hash-table.cc
M be/src/exec/partitioned-aggregation-node.cc
M be/src/exec/partitioned-aggregation-node.h
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/partitioned-hash-join-builder.h
M be/src/exec/partitioned-hash-join-node.cc
M be/src/exec/partitioned-hash-join-node.h
M be/src/exec/sort-node.cc
M be/src/exec/sort-node.h
M be/src/exec/topn-node.cc
M be/src/exec/topn-node.h
M be/src/exprs/case-expr.cc
M be/src/exprs/case-expr.h
M be/src/exprs/compound-predicates.cc
M be/src/exprs/compound-predicates.h
M be/src/exprs/conditional-functions.cc
M be/src/exprs/conditional-functions.h
M be/src/exprs/expr.cc
M be/src/exprs/expr.h
M be/src/exprs/hive-udf-call.cc
M be/src/exprs/hive-udf-call.h
M be/src/exprs/is-not-empty-predicate.cc
M be/src/exprs/is-not-empty-predicate.h
M be/src/exprs/literal.cc
M be/src/exprs/literal.h
M be/src/exprs/null-literal.cc
M be/src/exprs/null-literal.h
M be/src/exprs/scalar-fn-call.cc
M be/src/exprs/scalar-fn-call.h
M be/src/exprs/slot-ref.cc
M be/src/exprs/slot-ref.h
M be/src/exprs/tuple-is-null-predicate.cc
M be/src/exprs/tuple-is-null-predicate.h
M be/src/runtime/plan-fragment-executor.cc
M be/src/runtime/runtime-state.cc
M be/src/runtime/runtime-state.h
M be/src/runtime/sorter.h
M be/src/runtime/tuple.cc
M be/src/runtime/tuple.h
M be/src/service/fe-support.cc
M be/src/testutil/test-udfs.cc
M be/src/util/tuple-row-compare.cc
M be/src/util/tuple-row-compare.h
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M testdata/workloads/functional-query/queries/QueryTest/udf-errors.test
M testdata/workloads/functional-query/queries/QueryTest/udf.test
M tests/query_test/test_udfs.py
62 files changed, 435 insertions(+), 448 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/51/4651/2
-- 
To view, visit http://gerrit.cloudera.org:8080/4651
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I207566bc9f4c6a159271ecdbc4bbdba3d78c6651
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Tim Armstrong