dongjoon-hyun opened a new pull request #31319:
URL: https://github.com/apache/spark/pull/31319
### What changes were proposed in this pull request?

This PR aims to fix the correctness issues that occur when reading decimal values from Parquet files.
- For the *MR* code path, `ParquetRowConverter` can read Parquet decimal values with the original precision and scale written in the corresponding file footer.
- For the *Vectorized* code path, `ParquetReadSupport` throws an explicit exception.

### Why are the changes needed?

Currently, Spark returns incorrect results when a Parquet file's decimal precision and scale differ from those in Spark's schema. This happens when there are multiple files with different schemas or when the Hive metastore has a newer schema. In general, Spark is designed to throw `SchemaColumnConvertNotSupportedException` in such cases, but SPARK-34212 reports the cases that were missed.

### Does this PR introduce _any_ user-facing change?

Yes. This fixes the correctness issue.

### How was this patch tested?

Pass with the newly added test case.
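A minimal sketch of the mismatch scenario described above, runnable in `spark-shell` (where `spark` is the active `SparkSession`). The output path and the literal decimal value are illustrative assumptions, not taken from the PR or from SPARK-34212:

```scala
// Illustrative sketch (path and values are hypothetical, not from the PR).
// Write a Parquet file whose decimal column is stored as DECIMAL(3, 2).
spark.sql("SELECT CAST(1.23 AS DECIMAL(3, 2)) AS d")
  .write.mode("overwrite").parquet("/tmp/parquet-decimal")

// Read the same file back declaring a *different* decimal type, DECIMAL(4, 3),
// as can happen with mixed-schema files or an updated Hive metastore schema.
// Per this PR's description, Spark previously returned incorrect results here;
// with the fix, the MR path converts using the precision/scale recorded in the
// file footer, and the vectorized path throws an explicit exception instead.
spark.read.schema("d DECIMAL(4, 3)").parquet("/tmp/parquet-decimal").show()
```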
