johanl-db opened a new pull request, #44513:
URL: https://github.com/apache/spark/pull/44513

   ### What changes were proposed in this pull request?
   This is a follow-up to https://github.com/apache/spark/pull/44368, 
implementing an additional type promotion to decimals with larger precision and 
scale.
   As long as the scale does not decrease and the precision increases by at 
least as much as the scale, decimal values can be promoted without loss of 
precision, e.g. Decimal(6, 2) -> Decimal(8, 4): 1234.56 -> 1234.5600.
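   The rule above can be expressed as a short check. This is a minimal Scala sketch, not Spark's implementation; `DecimalPromotion`, `canPromote` and `rescaleUnscaled` are hypothetical names. A decimal(p, s) has p - s integer digits and s fractional digits, and promotion is lossless when neither count shrinks. Promoting multiplies the unscaled value by 10^(s2 - s), so 1234.56 (unscaled 123456 at scale 2) becomes 12345600 at scale 4, i.e. 1234.5600.

   ```scala
   object DecimalPromotion {
     // Lossless iff the scale does not shrink and the number of integer
     // digits (precision - scale) does not shrink either.
     def canPromote(p: Int, s: Int, p2: Int, s2: Int): Boolean =
       s2 >= s && (p2 - s2) >= (p - s)

     // Rescale an unscaled value from scale `s` to a larger scale `s2`
     // by multiplying by 10^(s2 - s); the numeric value is unchanged.
     // Math.multiplyExact throws instead of silently overflowing.
     def rescaleUnscaled(unscaled: Long, s: Int, s2: Int): Long = {
       require(s2 >= s, "target scale must not be smaller than source scale")
       var result = unscaled
       for (_ <- 0 until (s2 - s)) result = Math.multiplyExact(result, 10L)
       result
     }
   }
   ```

   For the example in the user-facing section, `canPromote(4, 2, 6, 4)` holds, while `canPromote(6, 2, 7, 4)` does not because the integer part would shrink from 4 digits to 3.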
   
   The non-vectorized reader (parquet-mr) can already perform this type 
promotion; this PR implements it for the vectorized reader.
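   A vectorized reader decodes a batch of values at a time, so the rescaling factor only needs to be computed once per batch. The sketch below illustrates that idea for INT32-backed decimals widened into a 64-bit unscaled representation at a larger scale. `BatchRescale.rescaleBatch` is a hypothetical helper; it does not reproduce Spark's actual column-vector updater classes.

   ```scala
   object BatchRescale {
     // Hypothetical helper: widen a batch of INT32 unscaled decimal values
     // (e.g. decimal(6, 2)) to Long unscaled values at a larger target
     // scale (e.g. decimal(12, 4)). The factor is computed once per batch.
     def rescaleBatch(src: Array[Int], srcScale: Int, dstScale: Int): Array[Long] = {
       require(dstScale >= srcScale, "target scale must not be smaller than source scale")
       val factor = BigInt(10).pow(dstScale - srcScale).toLong
       // Each value is widened to Long before multiplying to avoid Int overflow.
       src.map(v => Math.multiplyExact(v.toLong, factor))
     }
   }
   ```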
   
   ### Why are the changes needed?
   This allows reading multiple Parquet files that contain decimal columns with 
different precisions and scales.
   
   ### Does this PR introduce _any_ user-facing change?
   Yes, the following now succeeds when using the vectorized Parquet reader:
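   ```scala
     // In spark-shell (spark.implicits._ is in scope), with `path` set to an output directory:
     import org.apache.spark.sql.types.DecimalType
     Seq(20).toDF("a").select($"a".cast(DecimalType(4, 2)).as("a")).write.parquet(path)
     spark.read.schema("a decimal(6, 4)").parquet(path).collect()
   ```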
   Previously, this failed with the vectorized reader but succeeded with the 
non-vectorized reader.
   
   ### How was this patch tested?
   - Tests were added to `ParquetWideningTypeSuite` to cover promotion between 
decimals with different physical types: INT32, INT64 and 
FIXED_LEN_BYTE_ARRAY.
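   For context on the three physical types: with Spark's default writer settings (`spark.sql.parquet.writeLegacyFormat=false`), the physical type is chosen from the decimal's precision, and a FIXED_LEN_BYTE_ARRAY value holds the unscaled value as a big-endian two's-complement integer. The object and method names below are hypothetical; this sketch only illustrates the layouts and is not the test code from this PR.

   ```scala
   import java.math.{BigDecimal => JBigDecimal, BigInteger}

   object ParquetDecimalLayout {
     // Non-legacy mode: up to 9 digits fit in INT32, up to 18 in INT64,
     // anything wider is stored as FIXED_LEN_BYTE_ARRAY.
     def physicalTypeFor(precision: Int): String =
       if (precision <= 9) "INT32"
       else if (precision <= 18) "INT64"
       else "FIXED_LEN_BYTE_ARRAY"

     // Decode a big-endian two's-complement unscaled value and attach the scale.
     def decodeFixedLen(bytes: Array[Byte], scale: Int): JBigDecimal =
       new JBigDecimal(new BigInteger(bytes), scale)

     // Widen to a larger scale; setScale without a rounding mode throws
     // ArithmeticException if rounding would be required.
     def promote(value: JBigDecimal, targetScale: Int): JBigDecimal =
       value.setScale(targetScale)
   }
   ```

   For example, the bytes `0x01 0xE2 0x40` encode the unscaled value 123456, which decodes to 1234.56 at scale 2 and promotes to 1234.5600 at scale 4.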
   
   
   ### Was this patch authored or co-authored using generative AI tooling?
   No


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
