abhiips07 commented on PR #47246: URL: https://github.com/apache/spark/pull/47246#issuecomment-2401479120
> Why does spark SQL have to address such a type casting issue? Isn't it user's responsibility to mention correct datatype in a query (string instead of integer in this case)? > > Why does spark SQL have to behave the way MySQL or Redshift behaves? There are many SQL systems. I don't know if we can be compatible with all of them. > > If we decide to address such a type casting issue, then should we also address type casting issue where comparing a string column to a boolean, float, etc. value results in an empty result set? While it's true that users should specify the correct data types, Spark SQL might handle this implicit type casting for a few reasons: Many people expect behavior similar to SQL systems like MySQL or Redshift, where implicit casting is allowed, making migrations easier and queries less error-prone. While we can’t align with all SQL systems, following common behavior improves predictability and usability. ### Take a string-to-integer comparison as an example: `**SELECT * FROM table WHERE string_col = 123;**` If string_col contains values like "123", most users would expect Spark SQL to match those rows, since systems like MySQL or Redshift allow this kind of conversion. Handling this case makes sense and aligns with user expectations, avoiding unnecessary query failures/empty results. ### Now, consider a string-to-boolean comparison: `**SELECT * FROM table WHERE string_col = true;**` This is where things get tricky. Should "true" match true? What about strings like "yes", "1", or "false"? ### A similar issue arises with a string-to-float comparison: `**SELECT * FROM table WHERE string_col = 5.5;**` Should "5.5" match 5.5? How about "05.5" or "5.500"? These variations make it unclear what should be considered a match, adding complexity and increasing the chances of unexpected outcomes. Imo, while handling straightforward cases like string-to-integer conversions is practical, trying to support more ambiguous comparisons like string-to-boolean or string-to-float could cause more confusion than it's worth -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
