dongjoon-hyun opened a new pull request #31319:
URL: https://github.com/apache/spark/pull/31319
### What changes were proposed in this pull request?

This PR aims to fix the correctness issues that occur when reading decimal values from Parquet files.
- For the *MR* code path, `ParquetRowConverter` can read Parquet decimal values with the original precision and scale written in the corresponding file footer.
- For the *Vectorized* code path, `ParquetReadSupport` throws an explicit exception.

### Why are the changes needed?

Currently, Spark returns incorrect results when a Parquet file's decimal precision and scale differ from those in Spark's schema. This happens when there are multiple files with different schemas or when the Hive metastore has a newer schema. In general, Spark is designed to throw `SchemaColumnConvertNotSupportedException` in such cases, but SPARK-34212 reports the cases that were missed.

### Does this PR introduce _any_ user-facing change?

Yes. This fixes the correctness issue.

### How was this patch tested?

Pass with the newly added test case.
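A minimal sketch of the mismatch scenario described above, runnable in `spark-shell` (where `spark` is the active `SparkSession`). The output path and the literal decimal value are illustrative assumptions, not taken from the PR or from SPARK-34212:

```scala
// Illustrative sketch (path and values are hypothetical, not from the PR).
// Write a Parquet file whose decimal column is stored as DECIMAL(3, 2).
spark.sql("SELECT CAST(1.23 AS DECIMAL(3, 2)) AS d")
  .write.mode("overwrite").parquet("/tmp/parquet-decimal")

// Read the same file back declaring a *different* decimal type, DECIMAL(4, 3),
// as can happen with mixed-schema files or an updated Hive metastore schema.
// Per this PR's description, Spark previously returned incorrect results here;
// with the fix, the MR path converts using the precision/scale recorded in the
// file footer, and the vectorized path throws an explicit exception instead.
spark.read.schema("d DECIMAL(4, 3)").parquet("/tmp/parquet-decimal").show()
```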
