GitHub user maropu opened a pull request:
https://github.com/apache/spark/pull/15928
[SPARK-18478][SQL] Support codegen'd Hive UDFs
## What changes were proposed in this pull request?
This PR adds code generation support for Hive UDFs (`HiveSimpleUDF` and `HiveGenericUDF`).
## How was this patch tested?
Added tests to `HiveUDFsSuite` that check whether plans contain codegen'd Hive UDFs.
The performance gains are as follows.
For `HiveSimpleUDF`:
```
Call Hive UDF:                      Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
----------------------------------------------------------------------------------------------
Call Hive UDF wholestage off                    3 / 3      43794.0           0.0       1.0X
Call Hive UDF wholestage on                     1 / 2     101551.3           0.0       2.3X
```
For `HiveGenericUDF`:
```
Call Hive generic UDF:              Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
----------------------------------------------------------------------------------------------
Call Hive generic UDF wholestage off            2 / 2      86919.9           0.0       1.0X
Call Hive generic UDF wholestage on             1 / 1     143314.2           0.0       1.6X
```
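As a quick sanity check on the `Relative` column, the sketch below recomputes the speedups from the `Rate(M/s)` values in the two tables above. It assumes that `Relative` is the ratio of each row's rate to the rate of the first (wholestage off) row. The names `RATES`, `relative_speedup`, and `speedups` are illustrative only and are not part of this patch.

```python
# Rates (M rows/s) copied from the benchmark tables above.
RATES = {
    "HiveSimpleUDF": {"off": 43794.0, "on": 101551.3},
    "HiveGenericUDF": {"off": 86919.9, "on": 143314.2},
}


def relative_speedup(rate_on, rate_off):
    """Throughput with whole-stage codegen on, relative to codegen off."""
    return rate_on / rate_off


def speedups(rates=RATES):
    """Return the speedup per UDF kind, rounded to one decimal place."""
    return {
        name: round(relative_speedup(r["on"], r["off"]), 1)
        for name, r in rates.items()
    }


if __name__ == "__main__":
    for name, factor in speedups().items():
        print(f"{name}: {factor}X")
```

Under that assumption the recomputed factors agree with the reported 2.3X and 1.6X.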
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/maropu/spark SupportHiveUdfCodegen
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/15928.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #15928
----
commit 4b2a804a41d00bac24ba67aefd6ed8d5293ca0cd
Author: Takeshi YAMAMURO <[email protected]>
Date: 2016-11-17T00:44:02Z
Support codegen for HiveUdf
commit b89c835367760faacbcda92caa1aab886d96e598
Author: Takeshi YAMAMURO <[email protected]>
Date: 2016-11-17T07:12:12Z
Add benchmark results
commit 0490984c1f4c4bfbeb6b6978cdcedcc57279f97e
Author: Takeshi YAMAMURO <[email protected]>
Date: 2016-11-17T16:55:30Z
Support codegen for HiveGenericUdf
commit 9f98bcacae329f3a06a0a4f62c1a508b91707ffc
Author: Takeshi YAMAMURO <[email protected]>
Date: 2016-11-18T05:29:37Z
Brush up code
commit 54775b2e0c0395379c9df7f7baa3221cf8e9585f
Author: Takeshi YAMAMURO <[email protected]>
Date: 2016-11-18T07:02:43Z
Add tests
----