GitHub user gatorsmile opened a pull request:

    https://github.com/apache/spark/pull/17848

    [SPARK-20586] [SQL] Add deterministic and distinctLike to ScalaUDF and JavaUDF [WIP]

    ### What changes were proposed in this pull request?
    
https://hive.apache.org/javadocs/r2.0.1/api/org/apache/hadoop/hive/ql/udf/UDFType.html
    
    Like Hive's UDFType, ScalaUDF and JavaUDF should let users set extra flags.
    `stateful` and `impliesOrder` are not applicable to ScalaUDF, so this PR adds
    only the following two flags.
    
    - deterministic: Certain optimizations should not be applied if a UDF is not
      deterministic. A deterministic UDF returns the same result each time it is
      invoked with a particular input. This determinism only needs to hold within
      the context of a query.
    
    - distinctLike: A UDF is considered distinctLike if it can be evaluated on
      just the distinct values of a column. Examples include min and max UDFs.
      This information is used by the metadata-only optimizer.
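
    To make the distinctLike property concrete, here is a small self-contained sketch in plain Java. It does not use Spark or Hive; the class and method names (`DistinctLikeDemo`, `max`, `sum`, `distinct`) are made up for illustration. It only shows that max gives the same answer on the distinct values of a column, while sum does not.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Toy illustration only; none of these names come from Spark or Hive.
class DistinctLikeDemo {
    // max behaves like a distinctLike function: duplicates never change the answer.
    static int max(List<Integer> values) {
        return values.stream().mapToInt(Integer::intValue).max().getAsInt();
    }

    // sum does not: dropping duplicates changes the answer.
    static int sum(List<Integer> values) {
        return values.stream().mapToInt(Integer::intValue).sum();
    }

    // Keeps the first occurrence of each value.
    static List<Integer> distinct(List<Integer> values) {
        return values.stream().distinct().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> column = Arrays.asList(3, 7, 7, 1, 3);
        List<Integer> distinctValues = distinct(column);
        // Same result on all values and on distinct values only.
        System.out.println(max(column) == max(distinctValues)); // true
        // Different result once duplicates are removed.
        System.out.println(sum(column) == sum(distinctValues)); // false
    }
}
```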
    
    If the deterministic flag is set incorrectly, query results can be wrong.
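
    As a rough illustration of why this matters, the plain-Java sketch below applies one rewrite an optimizer might make: evaluating a function once and reusing the result. This is not Spark's optimizer, and `DeterminismDemo`, `original`, `rewritten`, and `counterUdf` are hypothetical names. The rewrite is safe for a deterministic function and changes the result for a stateful one.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntUnaryOperator;

// Toy illustration only; this is not how Spark plans or rewrites queries.
class DeterminismDemo {
    // Original plan: call the UDF twice on the same input.
    static int original(IntUnaryOperator udf, int x) {
        return udf.applyAsInt(x) + udf.applyAsInt(x);
    }

    // Rewritten plan: call the UDF once and reuse the result.
    static int rewritten(IntUnaryOperator udf, int x) {
        int once = udf.applyAsInt(x);
        return once + once;
    }

    // A non-deterministic UDF: the result depends on how often it has been called.
    static IntUnaryOperator counterUdf() {
        AtomicInteger calls = new AtomicInteger();
        return x -> x + calls.incrementAndGet();
    }

    public static void main(String[] args) {
        IntUnaryOperator deterministic = x -> x * 10;
        // Deterministic: both plans agree.
        System.out.println(original(deterministic, 5) == rewritten(deterministic, 5)); // true
        // Non-deterministic: the rewrite changes the answer.
        System.out.println(original(counterUdf(), 5));  // 13
        System.out.println(rewritten(counterUdf(), 5)); // 12
    }
}
```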
    
    Also fixed an issue where the ScalaUDF name was lost during UDF registration.
    
    For ScalaUDF in the Dataset APIs, users can call three additional methods on
    `UserDefinedFunction` to make the corresponding changes:
    - `withName`: Updates UserDefinedFunction with a given name.
    - `nonDeterministic`: Updates UserDefinedFunction to non-deterministic.
    - `distinctLike`: Updates UserDefinedFunction to distinctLike.
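
    The sketch below shows how three such calls might be chained. `ToyUdf` is a hypothetical stand-in written in plain Java, not Spark's `UserDefinedFunction`. Only the three method names come from the list above; the defaults (unnamed, deterministic, not distinctLike) and the choice to return a modified copy rather than mutate in place are assumptions.

```java
// Hypothetical stand-in for illustration; NOT Spark's UserDefinedFunction.
final class ToyUdf {
    final String name;             // null means no name has been set
    final boolean deterministic;
    final boolean isDistinctLike;

    ToyUdf(String name, boolean deterministic, boolean isDistinctLike) {
        this.name = name;
        this.deterministic = deterministic;
        this.isDistinctLike = isDistinctLike;
    }

    // Assumed defaults: unnamed, deterministic, not distinctLike.
    static ToyUdf create() {
        return new ToyUdf(null, true, false);
    }

    // Each method returns a modified copy (an assumption, see the lead-in).
    ToyUdf withName(String newName) {
        return new ToyUdf(newName, deterministic, isDistinctLike);
    }

    ToyUdf nonDeterministic() {
        return new ToyUdf(name, false, isDistinctLike);
    }

    ToyUdf distinctLike() {
        return new ToyUdf(name, deterministic, true);
    }

    public static void main(String[] args) {
        ToyUdf base = ToyUdf.create();
        ToyUdf tagged = base.withName("randomId").nonDeterministic();
        System.out.println(tagged.name + " deterministic=" + tagged.deterministic);
        // The original object is unchanged because every call copies.
        System.out.println("base deterministic=" + base.deterministic);
    }
}
```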
    
    ### How was this patch tested?
    Added test cases for both ScalaUDF and JavaUDF.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gatorsmile/spark udfRegister

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/17848.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #17848
    
----
commit 7df4d9d7fc59011ebefb470f29288d43d51ebaee
Author: Xiao Li <[email protected]>
Date:   2017-04-19T22:11:42Z

    temp fix1

commit 88fde5f8a80496faf1474622e8bbbd2969a8231f
Author: Xiao Li <[email protected]>
Date:   2017-05-03T22:26:04Z

    fix.

----

