aokolnychyi opened a new pull request #23632: [SPARK-26706][SQL] Fix 
Cast$mayTruncate for bytes
URL: https://github.com/apache/spark/pull/23632
 
 
   ## What changes were proposed in this pull request?
   
   This PR contains a minor change in `Cast$mayTruncate` that fixes its logic 
for bytes.
   
   Right now, `mayTruncate` returns `false` for a cast from `LongType` to 
`ByteType` but `true` for a cast from `LongType` to `ShortType`. Consequently, 
`spark.range(1, 3).as[Byte]` and `spark.range(1, 3).as[Short]` behave 
differently: the first passes analysis, while the second is rejected.
   
   Potentially, this bug can silently corrupt someone's data.
   ```
   // executes silently even though Long is converted into Byte
   spark.range(Long.MaxValue - 10, Long.MaxValue).as[Byte]
     .map(b => b - 1)
     .show()
   +-----+
   |value|
   +-----+
   |  -12|
   |  -11|
   |  -10|
   |   -9|
   |   -8|
   |   -7|
   |   -6|
   |   -5|
   |   -4|
   |   -3|
   +-----+
   // throws an AnalysisException: Cannot up cast `id` from bigint to smallint
   // as it may truncate
   spark.range(Long.MaxValue - 10, Long.MaxValue).as[Short]
     .map(s => s - 1)
     .show()
   ```
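
The values in the first table can be reproduced without Spark. The sketch below is plain Scala and assumes the cast narrows the way `Long.toByte` does (keeping only the low 8 bits), which is consistent with the output shown above. The object and value names are hypothetical and are not part of this PR.

```scala
// Plain Scala, no Spark required.
// Assumption: the Long -> Byte cast behaves like Long.toByte (keeps the low 8 bits).
object ByteTruncationSketch {
  // Same ten ids as spark.range(Long.MaxValue - 10, Long.MaxValue); the end is exclusive.
  val ids: Seq[Long] = (0 until 10).map(i => Long.MaxValue - 10 + i)

  // Narrow each id to Byte, then apply the same `b - 1` as the map in the example.
  val narrowedToByte: Seq[Int] = ids.map(id => id.toByte - 1)

  // The same narrowing to Short loses information as well, which is why the
  // analyzer is right to reject the up cast in the Short case.
  val narrowedToShort: Seq[Int] = ids.map(id => id.toShort - 1)

  def main(args: Array[String]): Unit = {
    narrowedToByte.foreach(println)
    println(narrowedToByte == narrowedToShort)
  }
}
```

Every id close to `Long.MaxValue` collapses into a small negative number, so nothing in the result hints at the original values. This is why the `Byte` case should fail at analysis time in the same way the `Short` case already does.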
   ## How was this patch tested?
   
   This PR comes with a set of unit tests.
