GitHub user viirya opened a pull request:

    https://github.com/apache/spark/pull/19662

    [SPARK-22446] Declare StringIndexerModel indexer udf as nondeterministic

    ## What changes were proposed in this pull request?
    
    UDFs that can throw a runtime exception on invalid data are not safe to 
push down, because their behavior depends on their position in the query plan. 
Pushing them down risks changing their original behavior.
    
    The example reported in the JIRA, which is also used as the test case, 
shows this issue. We should declare UDFs that can throw a runtime exception on 
invalid data as non-deterministic.
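    
    The position dependence can be illustrated with a minimal, Spark-free 
sketch. It is a toy model in plain Python, not the code in this patch: the 
names `labels`, `indexer`, `original_plan` and `pushed_down_plan` are 
hypothetical, and the two functions only mimic evaluating a throwing predicate 
after or before a filter that removes invalid rows.

```python
# Toy model only (not Spark code): a predicate that raises on unseen input
# gives different results depending on where it runs relative to a filter.

labels = {"a": 0, "b": 1}  # hypothetical label-to-index mapping


def indexer(value):
    """Stand-in for an indexer UDF: raises on labels it has not seen."""
    if value not in labels:
        raise KeyError(f"Unseen label: {value}")
    return labels[value]


# Each row is (label, is_valid); the row with "z" is the invalid data.
rows = [("a", True), ("z", False), ("b", True)]


def original_plan(rows):
    # Invalid rows are filtered out first, so the UDF never sees "z".
    valid = [r for r in rows if r[1]]
    return [r[0] for r in valid if indexer(r[0]) >= 0]


def pushed_down_plan(rows):
    # The UDF predicate runs before the validity filter and hits "z".
    passed = [r for r in rows if indexer(r[0]) >= 0]
    return [r[0] for r in passed if r[1]]
```

    In this sketch `original_plan(rows)` returns the two valid labels, while 
`pushed_down_plan(rows)` raises on the invalid row. That difference is what 
marking such a UDF as non-deterministic is meant to prevent.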
    
    This also updates the documentation of the `deterministic` property in 
`Expression` to state clearly that a UDF that can throw a runtime exception on 
some specific input should be declared as non-deterministic.
    
    ## How was this patch tested?
    
    Added a test and tested manually.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/viirya/spark-1 SPARK-22446

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19662.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #19662
    
----
commit d9bdf257164dfe32381d47d75b9e3dad72cd76b5
Author: Liang-Chi Hsieh <[email protected]>
Date:   2017-11-06T04:23:54Z

    Declare indexer udf as nondeterministic so it can't be pushed down.

----

