GitHub user gatorsmile opened a pull request:
https://github.com/apache/spark/pull/17848
[SPARK-20586] [SQL] Add deterministic and distinctLike to ScalaUDF and
JavaUDF [WIP]
### What changes were proposed in this pull request?
https://hive.apache.org/javadocs/r2.0.1/api/org/apache/hadoop/hive/ql/udf/UDFType.html
Like Hive's UDFType, Spark should let users set extra flags on ScalaUDF and
JavaUDF. `stateful` and `impliesOrder` are not applicable to ScalaUDF, so this
PR adds only the following two flags.
- deterministic: Certain optimizations should not be applied if the UDF is not
deterministic. A deterministic UDF returns the same result each time it is
invoked with a particular input. This determinism only needs to hold within the
context of a query.
- distinctLike: A UDF is considered distinctLike if it can be evaluated on just
the distinct values of a column. Examples include min and max UDFs. This
information is used by the metadata-only optimizer.
If the deterministic flag is set incorrectly, query results could be wrong.
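To make the two flags concrete, here is a minimal, Spark-free sketch of the properties they describe. All names in it (`UdfFlagsSketch`, `sameOnDistinct`, `CountingUdf`) are hypothetical and are not part of this PR or of Spark. It only illustrates why max qualifies as distinctLike while sum does not, and why reusing a result is unsafe for a function that depends on hidden state.

```scala
// Hypothetical illustration only; none of these names are Spark APIs.
object UdfFlagsSketch {
  def maxOf(values: Seq[Int]): Int = values.max
  def sumOf(values: Seq[Int]): Int = values.sum

  // distinctLike property: evaluating on the distinct values of a column
  // gives the same answer as evaluating on every value.
  def sameOnDistinct(f: Seq[Int] => Int, values: Seq[Int]): Boolean =
    f(values) == f(values.distinct)

  // A function that is NOT deterministic: its result depends on how many
  // times it has been called, not only on its input.
  final class CountingUdf {
    private var calls = 0
    def apply(x: Int): Int = { calls += 1; x + calls }
  }

  def main(args: Array[String]): Unit = {
    val column = Seq(3, 7, 7, 1, 3)
    println(sameOnDistinct(maxOf, column)) // true: duplicates do not change max
    println(sameOnDistinct(sumOf, column)) // false: duplicates change sum

    // Evaluating the call twice versus evaluating once and reusing the
    // result gives different answers for a non-deterministic function.
    val a = new CountingUdf
    val evaluatedTwice = (a(10), a(10))
    val b = new CountingUdf
    val once = b(10)
    val reused = (once, once)
    println(evaluatedTwice != reused) // true
  }
}
```

An optimizer that assumed `CountingUdf` was deterministic could legally rewrite the first form into the second, which is the kind of wrong result the flag is meant to prevent.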
Also fixed an issue where the ScalaUDF name was lost during UDF registration.
For ScalaUDF in the Dataset APIs, users can call the following three additional
methods on `UserDefinedFunction` to make the corresponding changes.
- `withName`: Updates UserDefinedFunction with a given name.
- `nonDeterministic`: Updates UserDefinedFunction to non-deterministic.
- `distinctLike`: Updates UserDefinedFunction to distinctLike.
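The chaining style these three methods suggest can be sketched with a small stand-in class. This is not the PR's implementation: `UdfSketch`, its field names, and its defaults (deterministic unless stated otherwise, not distinctLike unless stated otherwise) are assumptions made for illustration, and the sketch has no Spark dependency.

```scala
// Hypothetical stand-in for UserDefinedFunction; field names and defaults
// are assumptions, not taken from the PR's code.
final case class UdfSketch[A, B](
    f: A => B,
    name: Option[String] = None,
    isDeterministic: Boolean = true,
    isDistinctLike: Boolean = false) {

  // Each method returns an updated copy and leaves the original untouched.
  def withName(n: String): UdfSketch[A, B] = copy(name = Some(n))
  def nonDeterministic: UdfSketch[A, B] = copy(isDeterministic = false)
  def distinctLike: UdfSketch[A, B] = copy(isDistinctLike = true)
}

object UdfSketchDemo {
  def main(args: Array[String]): Unit = {
    // A function with random output should be marked non-deterministic.
    val noisy = UdfSketch[Int, Double](x => x + scala.util.Random.nextDouble())
      .withName("addNoise")
      .nonDeterministic
    println(noisy.name)            // Some(addNoise)
    println(noisy.isDeterministic) // false

    // The flags are independent and can be chained in any order.
    val inc = UdfSketch[Int, Int](_ + 1).distinctLike.withName("inc")
    println(inc.isDistinctLike)    // true
    println(inc.f(41))             // 42
  }
}
```

Returning a modified copy rather than mutating in place means a UDF value that is already in use elsewhere keeps its original flags.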
### How was this patch tested?
Added test cases for both ScalaUDF and JavaUDF.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/gatorsmile/spark udfRegister
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/17848.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #17848
----
commit 7df4d9d7fc59011ebefb470f29288d43d51ebaee
Author: Xiao Li <[email protected]>
Date: 2017-04-19T22:11:42Z
temp fix1
commit 88fde5f8a80496faf1474622e8bbbd2969a8231f
Author: Xiao Li <[email protected]>
Date: 2017-05-03T22:26:04Z
fix.
----