Hello Csaba Ringhofer, Impala Public Jenkins,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/23963
to look at the new patch set (#12).
Change subject: IMPALA-14575: Add constant handling for Hive GenericUDFs
......................................................................
IMPALA-14575: Add constant handling for Hive GenericUDFs
Based on PoC by Csaba Ringhofer.
If an argument is constant, it is now evaluated and copied
to the input buffer only once, in HiveUdfCall::OpenEvaluator.
This saves re-evaluating and re-copying the value
on every evaluation.
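
The evaluate-once behaviour described above can be sketched roughly as
follows. This is a minimal, self-contained Java illustration, not the
actual C++ code in hive-udf-call.cc: ConstArgCacheSketch, ArgSlot, open()
and evaluateRow() are hypothetical names chosen for the example, and the
"input buffer" is simulated with a plain field.

```java
import java.util.function.Supplier;

/**
 * Illustrative sketch only. All names here are hypothetical and do not
 * mirror Impala's HiveUdfCall implementation.
 */
public class ConstArgCacheSketch {
    /** One argument slot of a simulated input buffer. */
    public static final class ArgSlot {
        final Supplier<Object> expr;   // evaluates the argument expression
        final boolean isConstant;      // the "isConstant" flag of the argument
        Object buffered;               // value currently held in the buffer
        public int evalCount;          // how often expr was evaluated

        public ArgSlot(Supplier<Object> expr, boolean isConstant) {
            this.expr = expr;
            this.isConstant = isConstant;
        }

        void evaluateIntoBuffer() {
            buffered = expr.get();
            evalCount++;
        }
    }

    private final ArgSlot[] slots;

    public ConstArgCacheSketch(ArgSlot... slots) {
        this.slots = slots;
    }

    /** Open step: constant arguments are evaluated and buffered once. */
    public void open() {
        for (ArgSlot slot : slots) {
            if (slot.isConstant) slot.evaluateIntoBuffer();
        }
    }

    /** Per-row step: only non-constant arguments are evaluated again. */
    public Object[] evaluateRow() {
        Object[] args = new Object[slots.length];
        for (int i = 0; i < slots.length; i++) {
            if (!slots[i].isConstant) slots[i].evaluateIntoBuffer();
            args[i] = slots[i].buffered;
        }
        return args;
    }

    public static void main(String[] unused) {
        int[] currentRow = {0};
        ArgSlot constantArg = new ArgSlot(() -> "constant geometry text", true);
        ArgSlot perRowArg = new ArgSlot(() -> currentRow[0], false);
        ConstArgCacheSketch call = new ConstArgCacheSketch(constantArg, perRowArg);

        call.open();
        for (int row = 0; row < 3; row++) {
            currentRow[0] = row;
            call.evaluateRow();
        }
        System.out.println("constant evaluations: " + constantArg.evalCount
            + ", per-row evaluations: " + perRowArg.evalCount);
    }
}
```

In this sketch the constant slot is filled in open() and only read
afterwards, while the per-row slot is refreshed on every call.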
The "isConstant" flags of the arguments are also sent to the frontend,
which creates ConstantObjectInspectors for them, so that any
constant-argument optimization in the UDF on Hive's side is enabled.
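
To illustrate why a constant inspector matters to a UDF, here is a
self-contained Java sketch. It does not use the Hive classes: Inspector
and ConstantInspector are hypothetical stand-ins for Hive's
ObjectInspector and ConstantObjectInspector, and PrefixMatchUdf is a toy
function invented for the example, so treat it as an assumption-laden
illustration of the pattern rather than as Hive or Impala code.

```java
import java.util.Locale;

/**
 * Illustrative sketch only. The interfaces below are hypothetical
 * stand-ins, not the real Hive inspector types.
 */
public class ConstInspectorSketch {
    /** Stand-in for a plain (non-constant) argument inspector. */
    public interface Inspector {}

    /** Stand-in for an inspector that also carries the constant value. */
    public interface ConstantInspector extends Inspector {
        Object constantValue();
    }

    /** Toy UDF: case-insensitive "value starts with prefix". */
    public static final class PrefixMatchUdf {
        private String preparedPrefix;  // set only if the prefix is constant
        public int prepareCount;        // how often the prefix was prepared

        /** Called once; does the expensive work up front if possible. */
        public void initialize(Inspector[] args) {
            if (args[1] instanceof ConstantInspector) {
                Object value = ((ConstantInspector) args[1]).constantValue();
                preparedPrefix = prepare((String) value);
            }
        }

        /** Called per row; reuses the prepared prefix when available. */
        public boolean evaluate(String value, String prefix) {
            String p = preparedPrefix != null ? preparedPrefix : prepare(prefix);
            return value.toLowerCase(Locale.ROOT).startsWith(p);
        }

        // Stands in for costly per-argument preparation work.
        private String prepare(String prefix) {
            prepareCount++;
            return prefix.toLowerCase(Locale.ROOT);
        }
    }

    public static void main(String[] unused) {
        Inspector plain = new Inspector() {};
        ConstantInspector constPrefix = () -> "AB";

        PrefixMatchUdf udf = new PrefixMatchUdf();
        udf.initialize(new Inspector[] {plain, constPrefix});
        for (String v : new String[] {"abc", "xyz", "ABBA"}) {
            udf.evaluate(v, "AB");
        }
        System.out.println("prepare calls with constant prefix: " + udf.prepareCount);
    }
}
```

Without the constant inspector, the same toy UDF would have to prepare
the prefix on every row, which is the kind of repeated work a
constant-aware UDF can avoid.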
Moved input handling for Hive UDF calls to a separate class
HiveUdfInputHandler.
Benchmark:
Checked with the following:
set num_nodes=1; set mt_dop=1;
select count(*) from tpch.lineitem where st_intersects(
st_point(l_partkey, l_suppkey),
st_geomfromtext(
"polygon ((0 0, 0 500000, 500000 500000, 500000 0, 0 0))"
)
);
Before change: 3.15s (MaterializeTupleTime: 2s601ms)
After change: 1.54s (MaterializeTupleTime: 879.397ms)
Some geospatial queries show an even bigger performance gain:
select count(*) from tpch.lineitem where st_intersects(
st_point(l_partkey, l_suppkey),
st_buffer(st_geomfromtext("point (250000 250000)"), 250000)
);
Before change: MaterializeTupleTime: 19s266ms
After change: MaterializeTupleTime: 1s587ms
Note that st_intersects is optimized on the Hive side when one argument
is a constant, which accounts for most of the performance gain.
Skipping the re-evaluation and copying is relatively insignificant:
measured on its own, it doesn't yield any measurable difference.
Testing:
- Added a test UDF, GenericAlltypeArgConstCheckUdf, that prints
  const information about its arguments.
- Added const arg check cases to generic-java-udf.test.
Change-Id: I4a6ca8c0bab499dffed88bb9786753da559af4c5
---
M be/src/exprs/expr-value.h
M be/src/exprs/hive-udf-call.cc
M be/src/exprs/hive-udf-call.h
M common/thrift/Frontend.thrift
M fe/src/main/java/org/apache/impala/hive/executor/HiveGenericJavaFunction.java
M fe/src/main/java/org/apache/impala/hive/executor/HiveJavaFunctionFactoryImpl.java
M fe/src/main/java/org/apache/impala/hive/executor/HiveUdfExecutor.java
M fe/src/main/java/org/apache/impala/hive/executor/HiveUdfExecutorGeneric.java
M fe/src/main/java/org/apache/impala/hive/executor/HiveUdfExecutorLegacy.java
A fe/src/main/java/org/apache/impala/hive/executor/HiveUdfInputHandler.java
M fe/src/main/java/org/apache/impala/hive/executor/UdfExecutor.java
M fe/src/test/java/org/apache/impala/hive/executor/UdfExecutorTest.java
A java/test-hive-udfs/src/main/java/org/apache/impala/GenericAlltypeArgConstCheckUdf.java
M testdata/workloads/functional-query/queries/QueryTest/generic-java-udf.test
M testdata/workloads/functional-query/queries/QueryTest/java-udf.test
M testdata/workloads/functional-query/queries/QueryTest/load-generic-java-udfs.test
16 files changed, 658 insertions(+), 176 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/23963/12
--
To view, visit http://gerrit.cloudera.org:8080/23963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4a6ca8c0bab499dffed88bb9786753da559af4c5
Gerrit-Change-Number: 23963
Gerrit-PatchSet: 12
Gerrit-Owner: Balazs Hevele <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>