GitHub user davies opened a pull request:

    https://github.com/apache/spark/pull/15030

    [SPARK-17474] [SQL] fix python udf in TakeOrderedAndProjectExec

    ## What changes were proposed in this pull request?
    
    When a Project containing a Python UDF sits between a Sort and a Limit, the 
Project is folded into TakeOrderedAndProjectExec. ExtractPythonUDFs then fails 
to pull the Python UDF out, because QueryPlan.expressions does not include 
expressions nested inside an Option[Seq[Expression]].
    
    Ideally, we should fix `QueryPlan.expressions` itself, but my attempts at 
that did not work (they always ran into an infinite loop). In this PR, I changed 
TakeOrderedAndProjectExec to not use Option[Seq[Expression]], as a workaround. 
cc @JoshRosen 
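
The failure mode described above can be illustrated with a small toy model. This is only a sketch in plain Python, not Spark's actual code: `Expr`, `Some`, `TakeOrderedAndProject` and `expressions` are hypothetical stand-ins, and the collector is assumed to recognise direct expressions, an optional single expression, and a sequence of expressions, but not an optional sequence.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Expr:
    """Hypothetical stand-in for a Catalyst expression."""
    name: str
    is_python_udf: bool = False


@dataclass(frozen=True)
class Some:
    """Hypothetical stand-in for Scala's Some(...) wrapper."""
    value: object


@dataclass(frozen=True)
class TakeOrderedAndProject:
    """Toy plan node; project_list is Some([...]) before the change, a plain list after."""
    limit: int
    sort_order: list
    project_list: object


def expressions(node):
    """Collect the expressions this toy collector can see on a node.

    Assumption: it handles Expr, Some(Expr) and sequences of Expr,
    and silently skips anything else, including Some(sequence).
    """
    found = []
    for f in fields(node):
        value = getattr(node, f.name)
        if isinstance(value, Expr):
            found.append(value)
        elif isinstance(value, Some) and isinstance(value.value, Expr):
            found.append(value.value)
        elif isinstance(value, (list, tuple)):
            found.extend(v for v in value if isinstance(v, Expr))
    return found


col = Expr("a")
udf = Expr("my_udf(a)", is_python_udf=True)

# Before: the project list is wrapped in an option, so the UDF is invisible.
before = TakeOrderedAndProject(10, [col], Some([col, udf]))
# After: the project list is a plain sequence, so the UDF is visible.
after = TakeOrderedAndProject(10, [col], [col, udf])

print([e.name for e in expressions(before)])  # ['a']
print([e.name for e in expressions(after)])   # ['a', 'a', 'my_udf(a)']
```

In this model, a rule that searches `expressions(node)` for Python UDFs finds nothing on the `before` node and finds `my_udf(a)` on the `after` node, which mirrors why dropping the option wrapper works around the problem.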
    
    ## How was this patch tested?
    
    Added regression test.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/davies/spark all_expr

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/15030.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #15030
    
----
commit 3ea0daf9114ec23c81d84f44ce94ee37aca5e55e
Author: Davies Liu <[email protected]>
Date:   2016-09-09T18:59:29Z

    fix python udf in TakeOrderedAndProjectExec

----

