johanl-db opened a new pull request, #44803:
URL: https://github.com/apache/spark/pull/44803

   ### What changes were proposed in this pull request?
   This is a follow-up to https://github.com/apache/spark/pull/44368 and 
https://github.com/apache/spark/pull/44513. It implements an additional type 
promotion, from integers to decimals, in the Parquet vectorized reader, bringing 
it to parity with the non-vectorized reader in that regard.
   
   ### Why are the changes needed?
   This allows reading Parquet files that have different schemas and mix 
decimals and integers, e.g. reading files containing either `Decimal(15, 2)` 
or `INT32` as `Decimal(15, 2)`, as long as the requested decimal type is 
large enough to accommodate the integer values without precision loss.
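
   The "large enough" condition can be illustrated with a small sketch. This 
is not the code from this PR: `IntToDecimalWidening`, `canHold` and the digit 
constants are hypothetical names, and the thresholds are simply the number of 
decimal digits in `Int.MaxValue` and `Long.MaxValue`. Spark's own check may use 
different thresholds.

   ```scala
   // Hypothetical helper for illustration only, not Spark's implementation.
   object IntToDecimalWidening {
     // Decimal digits needed to represent any value of the integer type.
     val Int32Digits = 10 // Int.MaxValue  = 2147483647
     val Int64Digits = 19 // Long.MaxValue = 9223372036854775807

     // A decimal(precision, scale) has (precision - scale) digits left of the
     // decimal point; the integer fits when that covers all of its digits.
     def canHold(integerDigits: Int, precision: Int, scale: Int): Boolean =
       precision - scale >= integerDigits
   }
   ```

   Under these assumptions, `canHold(Int32Digits, 15, 2)` is true (13 integral 
digits), while `canHold(Int32Digits, 9, 0)` is false because a 10-digit `INT32` 
value would not fit.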
   
   ### Does this PR introduce _any_ user-facing change?
   Yes, the following now succeeds when using the vectorized Parquet reader:
   ```
     Seq(20).toDF("a").select($"a".cast(IntegerType)).write.parquet(path)
     spark.read.schema("a decimal(12, 0)").parquet(path).collect()
   ```
   Previously, this failed with the vectorized reader but succeeded with the 
non-vectorized reader.
   
   ### How was this patch tested?
   - Tests added to `ParquetWideningTypeSuite`
   - Updated relevant `ParquetQuerySuite` test.
   
   
   ### Was this patch authored or co-authored using generative AI tooling?
   No


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

