GitHub user viirya opened a pull request:

    https://github.com/apache/spark/pull/16120

    [SPARK-18634][PySpark][SQL] Corruption and Correctness issues with 
exploding Python UDFs

    ## What changes were proposed in this pull request?
    
    As reported in the JIRA ticket, there are correctness issues when Python 
UDFs are combined with exploded columns in Spark SQL.
    
    There are two cases where, depending on the DataType of the exploded 
column, the result can be outright wrong or corrupted. It appears that an 
incorrect row schema is reported to Tungsten during or after the UDF is 
applied.
    
    Please check the code below for reproduction.
    
    Notebook: 
https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/6186780348633019/3425836135165635/4343791953238323/latest.html
    
    
    ## How was this patch tested?
    
    I will add test cases later.
    
    Please review http://spark.apache.org/contributing.html before opening a 
pull request.
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/viirya/spark-1 fix-py-udf-with-generator

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/16120.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #16120
    
----
commit a31432a68b3de38d163165048f9174bfee196b14
Author: Liang-Chi Hsieh <[email protected]>
Date:   2016-12-02T14:41:17Z

    Change GenerateExec's output so PySpark's UDF can work with Generator.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
