mihailom-db commented on PR #46474:
URL: https://github.com/apache/spark/pull/46474#issuecomment-2117193264
I would expect that behaviour, but the problem comes with the rewriting you
suggested. When we create expressions, the CAST expression does not remember
the identifier used during parsing, only the type that the identifier produced.
I managed to block the SQL behaviour with parsing rules, but PySpark, Spark
Connect and the DataFrame API pose a problem. They allow casting with types
like StringType("collation_name") and StringType(), and in Python, as opposed
to Scala, we cannot differentiate StringType() from StringType("UTF8_BINARY");
protobuf always sends the information as StringType("UTF8_BINARY"). Even if we
only block the SQL syntax, a problem arises when someone uses the DataFrame
API: the session-level default collation may have been changed, and collation
resolution would then be impossible, because the cast would be translated to
cast(expression, StringType(session_level_default_collation)) and we have no
way of differentiating whether it was created from the STRING identifier or
from STRING COLLATE session_level_default_collation.
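
To make the ambiguity concrete, here is a minimal sketch (not the real
pyspark.sql.types.StringType, just an illustration of the behaviour described
above): once the default collation is filled in at construction time, the two
spellings produce equal objects, so the original spelling cannot be recovered:

```python
# Illustrative sketch only; NOT the actual pyspark.sql.types implementation.
class StringType:
    def __init__(self, collation="UTF8_BINARY"):
        # A bare StringType() immediately takes the default collation,
        # so the information about which spelling was used is lost here.
        self.collation = collation

    def __eq__(self, other):
        return isinstance(other, StringType) and self.collation == other.collation


# The two spellings are indistinguishable once constructed:
print(StringType() == StringType("UTF8_BINARY"))  # True
```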
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]