[PR] [SPARK-48498][SQL][3.5] Always do char padding in predicates [spark]

via GitHub Wed, 12 Jun 2024 05:59:04 -0700


jackylee-ch opened a new pull request, #46958:
URL: https://github.com/apache/spark/pull/46958


   ### What changes were proposed in this pull request?
   
   For some data sources, CHAR type padding is applied on neither the write 
side nor the read side (read-side padding is turned off by disabling 
`spark.sql.readSideCharPadding`). This is a different SQL flavor, similar to 
MySQL: 
https://dev.mysql.com/doc/refman/8.0/en/char.html
   
   However, Spark has a bug: when comparing a CHAR type column with a STRING 
literal, it always pads the string literal. This assumes that CHAR type columns 
are always padded, on either the write side or the read side, which is not 
always true.
   
   This PR makes Spark always pad the CHAR type columns when comparing with 
string literals, to satisfy the CHAR type semantics.
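
The difference can be sketched in plain Python. This is only a model of the comparison semantics described above, not Spark's implementation: `rpad`, `compare_before`, and `compare_after` are hypothetical helpers, and the stored value is assumed to be an unpadded `CHAR(5)` value.

```python
def rpad(value: str, length: int) -> str:
    """Right-pad with spaces up to `length`; longer values are left unchanged."""
    return value.ljust(length)


def compare_before(stored: str, literal: str, char_length: int) -> bool:
    # Model of the old behavior: only the literal is padded, so an
    # unpadded stored value never matches.
    return stored == rpad(literal, char_length)


def compare_after(stored: str, literal: str, char_length: int) -> bool:
    # Model of the new behavior: the column value is padded as well,
    # so both sides have the same length before comparison.
    return rpad(stored, char_length) == rpad(literal, char_length)


if __name__ == "__main__":
    # Assumption: the data source stored 'abc' without padding for CHAR(5).
    print(compare_before("abc", "abc", 5))
    print(compare_after("abc", "abc", 5))
```

In this model, padding the column value is harmless when the stored value is already padded, which is why it can be applied unconditionally.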
   
   ### Why are the changes needed?
   
   This is a bug fix for users who disable read-side char padding.
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes. After this PR, comparing a CHAR type column with a STRING literal 
follows the CHAR semantics; before, the comparison mostly returned false.
   
   ### How was this patch tested?
   
   new tests
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   no
   


