GitHub user gberger opened a pull request:

    https://github.com/apache/spark/pull/19792

    [SPARK-22566][PYTHON] Better error message for `_merge_type` in Pandas to Spark DF conversion

    ## What changes were proposed in this pull request?
    
    This PR improves the error message raised when `spark_session.createDataFrame(pandas_df)` is called without a schema and schema inference fails because a column contains incompatible types.
    
    The Pandas column names are propagated down to `_merge_type`, so the error message now names the column whose types could not be merged.
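The idea can be illustrated with a minimal, self-contained sketch. It does not use PySpark: Spark SQL types are stood in for by plain type-name strings, and `infer_type`, `merge_type` and `infer_schema` are hypothetical helpers, not Spark's actual `_merge_type` or its signature. Types inferred per row are merged column by column, and each column's name is passed into the merge so that a conflict can be reported against that column. The `field <name>: ` prefix is an assumed message format.

```python
from functools import reduce

# Hypothetical stand-ins for Spark SQL types: plain strings, not pyspark.sql.types.
NULL = "NullType"


def infer_type(value):
    """Rough per-value type inference (illustrative assumption only)."""
    if value is None:
        return NULL
    if isinstance(value, bool):  # check bool before int: bool is an int subclass
        return "BooleanType"
    if isinstance(value, int):
        return "LongType"
    if isinstance(value, float):
        return "DoubleType"
    if isinstance(value, str):
        return "StringType"
    raise TypeError("unsupported value %r" % (value,))


def merge_type(a, b, name=None):
    """Merge two inferred types; `name` is the column being merged, if known."""
    if a == NULL:  # a null value says nothing, so keep the other side
        return b
    if b == NULL:
        return a
    if a != b:
        # Prefix the message with the column name when it was propagated down.
        prefix = "field %s: " % name if name is not None else ""
        raise TypeError(prefix + "Can not merge type %s and %s" % (a, b))
    return a


def infer_schema(columns, rows):
    """Infer one type per column by merging the types seen in every row."""
    def merge_row(acc, row):
        return [merge_type(t, infer_type(v), name=c)
                for c, t, v in zip(columns, acc, row)]
    return reduce(merge_row, rows, [NULL] * len(columns))
```

With this sketch, `infer_schema(["id", "value"], [(1, "a"), (2, None)])` returns `["LongType", "StringType"]`, while `infer_schema(["id", "value"], [(1, "a"), (2, 3)])` raises `TypeError: field value: Can not merge type StringType and LongType`. Without the `name` argument, the same error would not say which column was at fault.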
    
    https://issues.apache.org/jira/browse/SPARK-22566
    
    ## How was this patch tested?
    
    Manually in the `./bin/pyspark` console, and with `./dev/run-tests`.
    
    <img width="873" alt="screen shot 2017-11-21 at 13 29 49" src="https://user-images.githubusercontent.com/3977115/33080121-382274e0-cecf-11e7-808f-057a65bb7b00.png">
    
    I state that the contribution is my original work and that I license the work to the Apache Spark project under the project’s open source license.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gberger/spark master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19792.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #19792
    
----
commit 518fdd4f3d0e968cef2e3ba1b0220daee5ee7778
Author: Guilherme Berger <[email protected]>
Date:   2017-11-21T15:06:25Z

    [SPARK-22566][PYTHON] Better error message for `_merge_type` in Pandas to Spark DF conversion

----