GitHub user maryannxue opened a pull request:

    https://github.com/apache/spark/pull/21851

    [SPARK-24891][SQL] Fix HandleNullInputsForUDF rule

    ## What changes were proposed in this pull request?
    
    The `HandleNullInputsForUDF` rule always added a new `If` node every time it
    was applied. As a result, analyzing the same plan once and analyzing it twice
    (or more) produced different plans, which caused issues such as the plan not
    being matched in the cache manager. The solution is to mark the arguments as
    null-checked by adding an `AssertNotNull` node above each of those arguments
    when wrapping the UDF in an `If` node. This is safe because the UDF will
    never be called when any of those arguments is null.
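To make the non-idempotence concrete, here is a minimal, self-contained Scala sketch. It does not use Catalyst: `Expr`, `Udf`, `If`, `IsNull`, `AssertNotNull`, `transformUp`, `naive`, and `fixed` are hypothetical stand-ins for the real analyzer classes and rule. It also assumes, for simplicity, that every UDF argument needs a null check (the real rule only checks certain inputs). The sketch compares one pass of each rule with two passes.

```scala
// Toy model (hypothetical names, not Catalyst's real classes) of why a rule
// that unconditionally wraps a UDF in `If` is not idempotent, and how marking
// the arguments as null-checked makes a second pass a no-op.
object NullInputsToyRule {
  sealed trait Expr
  case class Attr(name: String) extends Expr
  case class Lit(value: Any) extends Expr
  case class IsNull(child: Expr) extends Expr
  case class Or(left: Expr, right: Expr) extends Expr
  case class If(cond: Expr, ifTrue: Expr, ifFalse: Expr) extends Expr
  case class AssertNotNull(child: Expr) extends Expr
  case class Udf(name: String, args: Seq[Expr]) extends Expr

  // Rewrite children first, then apply f to the node itself (bottom-up),
  // so a node produced by f is not revisited within the same pass.
  def transformUp(e: Expr)(f: PartialFunction[Expr, Expr]): Expr = {
    val withNewChildren = e match {
      case IsNull(c)        => IsNull(transformUp(c)(f))
      case Or(l, r)         => Or(transformUp(l)(f), transformUp(r)(f))
      case If(c, t, el)     => If(transformUp(c)(f), transformUp(t)(f), transformUp(el)(f))
      case AssertNotNull(c) => AssertNotNull(transformUp(c)(f))
      case Udf(n, args)     => Udf(n, args.map(a => transformUp(a)(f)))
      case leaf             => leaf
    }
    f.applyOrElse(withNewChildren, identity[Expr])
  }

  // Builds `IsNull(a1) OR IsNull(a2) OR ...` for a non-empty argument list.
  def anyNull(args: Seq[Expr]): Expr =
    args.map(a => IsNull(a): Expr).reduce((l, r) => Or(l, r))

  // Naive rule: wraps every UDF it sees, including one it already wrapped.
  val naive: PartialFunction[Expr, Expr] = {
    case u @ Udf(_, args) if args.nonEmpty =>
      If(anyNull(args), Lit(null), u)
  }

  // Fixed rule: only fires while some argument is not yet marked, and marks
  // every argument with AssertNotNull inside the wrapped UDF.
  val fixed: PartialFunction[Expr, Expr] = {
    case Udf(name, args) if args.exists(a => !a.isInstanceOf[AssertNotNull]) =>
      val unchecked = args.filterNot(_.isInstanceOf[AssertNotNull])
      val marked = args.map {
        case a: AssertNotNull => a
        case a                => AssertNotNull(a)
      }
      If(anyNull(unchecked), Lit(null), Udf(name, marked))
  }

  def main(args: Array[String]): Unit = {
    val plan: Expr = Udf("f", Seq(Attr("a"), Attr("b")))

    val naiveOnce  = transformUp(plan)(naive)
    val naiveTwice = transformUp(naiveOnce)(naive)
    println(naiveOnce == naiveTwice) // false: a second If was nested inside

    val fixedOnce  = transformUp(plan)(fixed)
    val fixedTwice = transformUp(fixedOnce)(fixed)
    println(fixedOnce == fixedTwice) // true: the second pass changes nothing
  }
}
```

In this toy model the fixed rule leaves a UDF alone once all of its arguments are wrapped, so re-running the analysis yields an identical tree, which is the property the cache-manager lookup depends on.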
    
    ## How was this patch tested?
    
    Added new tests in sql/UDFSuite and AnalysisSuite.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/maryannxue/spark spark-24891

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21851.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21851
    
----
commit 62fa9cf99610d8fa67d123450f2721cac0b5899f
Author: maryannxue <maryannxue@...>
Date:   2018-07-23T18:56:05Z

    [SPARK-24891][SQL] Fix HandleNullInputsForUDF rule

----

