GitHub user yuchenhuo opened a pull request:

    https://github.com/apache/spark/pull/20953

    [SPARK-23822][SQL] Improve error message for Parquet schema mismatches

    ## What changes were proposed in this pull request?
    
    This pull request improves the error message Spark reports when 
reading Parquet files with different schemas, e.g. one with a STRING column and 
the other with an INT column. A new 
ParquetSchemaColumnConvertNotSupportedException is added to replace the old 
UnsupportedOperationException. The exception is then wrapped in 
FileScanRdd.scala, which throws a more general QueryExecutionException that 
includes the name of the Parquet file that triggered the exception.
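
The wrap-and-rethrow pattern described above can be sketched roughly as follows. This is a minimal, self-contained Java illustration and not the code in this patch: the class names `ColumnConvertNotSupportedException` and `FileReadException`, the methods `readColumn` and `scanFile`, the example path, and the message wording are all hypothetical stand-ins, chosen only to show how a column-level error can be enriched with the file name at the layer that knows it.

```java
// Minimal sketch of the wrap-and-rethrow pattern. All names here are
// hypothetical stand-ins; they do not reproduce Spark's classes or messages.
class SchemaMismatchSketch {

    // Stand-in for the new column-level exception: it records which column
    // failed and the two types that could not be reconciled.
    static class ColumnConvertNotSupportedException extends RuntimeException {
        final String column;
        final String physicalType; // type actually stored in the file
        final String logicalType;  // type requested by the read schema

        ColumnConvertNotSupportedException(String column, String physicalType, String logicalType) {
            super("column: " + column + ", physicalType: " + physicalType
                    + ", logicalType: " + logicalType);
            this.column = column;
            this.physicalType = physicalType;
            this.logicalType = logicalType;
        }
    }

    // Stand-in for the more general exception thrown by the file-scanning layer.
    static class FileReadException extends RuntimeException {
        FileReadException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    // Column-level reader: it knows the types involved but not the file name.
    static void readColumn(String column, String physicalType, String logicalType) {
        if (!physicalType.equalsIgnoreCase(logicalType)) {
            throw new ColumnConvertNotSupportedException(column, physicalType, logicalType);
        }
    }

    // File-level scanner: it knows the path, so it wraps the column-level
    // error and keeps the original exception as the cause.
    static void scanFile(String path, String column, String physicalType, String logicalType) {
        try {
            readColumn(column, physicalType, logicalType);
        } catch (ColumnConvertNotSupportedException e) {
            throw new FileReadException(
                    "Schema mismatch while reading file " + path + " (" + e.getMessage() + ")", e);
        }
    }

    public static void main(String[] args) {
        try {
            scanFile("/data/part-00001.parquet", "id", "STRING", "INT");
        } catch (FileReadException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Keeping the original exception as the cause means the column and type details remain available to callers, while the outer message carries the file name that only the scanning layer can supply.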
    
    ## How was this patch tested?
    
    Unit tests added to check the new exception and verify the error messages.
    
    Also manually tested with two Parquet files with different schemas to 
check the error message.
    
    <img width="1125" alt="screen shot 2018-03-30 at 4 03 04 pm" 
src="https://user-images.githubusercontent.com/37087310/38156580-dd58a140-3433-11e8-973a-b816d859fbe1.png">

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/yuchenhuo/OSS SPARK-23822

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20953.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #20953
    
----
commit 375e8991f15b61b6c595c2da9ae345e61791a7f7
Author: Yuchen Huo <yuchen.huo@...>
Date:   2018-03-29T16:58:34Z

    Wrap parquet mismatch error message

commit ab55256efb1a23831a53a9c67475ff653dc91e96
Author: Yuchen Huo <yuchen.huo@...>
Date:   2018-03-30T22:39:03Z

    fix wrong order of the physical and logical schema

----

