abhiips07 commented on PR #47246:
URL: https://github.com/apache/spark/pull/47246#issuecomment-2401479120

   > Why does spark SQL have to address such a type casting issue? Isn't it 
user's responsibility to mention correct datatype in a query (string instead of 
integer in this case)?
   > 
   > Why does spark SQL have to behave the way MySQL or Redshift behaves? There 
are many SQL systems. I don't know if we can be compatible with all of them.
   > 
   > If we decide to address such a type casting issue, then should we also 
address type casting issue where comparing a string column to a boolean, float, 
etc. value results in an empty result set?
   
   
   While it's true that users should specify the correct data types, Spark SQL 
might handle this implicit type casting for a few reasons:
   
   Many people expect behavior similar to SQL systems like MySQL or Redshift, 
where implicit casting is allowed, making migrations easier and queries less 
error-prone.
   
   While we can’t align with all SQL systems, following common behavior 
improves predictability and usability.
   
   ### Take a string-to-integer comparison as an example:
   `**SELECT * FROM table WHERE string_col = 123;**`
   If string_col contains values like "123", most users would expect Spark SQL 
to match those rows, since systems like MySQL or Redshift allow this kind of 
conversion. Handling this case makes sense and aligns with user expectations, 
avoiding unnecessary query failures/empty results.
   
   ### Now, consider a string-to-boolean comparison:
   `**SELECT * FROM table WHERE string_col = true;**`
   This is where things get tricky. Should "true" match true? What about 
strings like "yes", "1", or "false"?
   
   ### A similar issue arises with a string-to-float comparison:
   `**SELECT * FROM table WHERE string_col = 5.5;**`
   Should "5.5" match 5.5? How about "05.5" or "5.500"? These variations make 
it unclear what should be considered a match, adding complexity and increasing 
the chances of unexpected outcomes.
   
   Imo, while handling straightforward cases like string-to-integer conversions 
is practical, trying to support more ambiguous comparisons like 
string-to-boolean or string-to-float could cause more confusion than it's worth
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to