mihailom-db commented on PR #46474:
URL: https://github.com/apache/spark/pull/46474#issuecomment-2117193264
I would expect that behaviour, but the problem comes with the rewriting you
suggested. When we create expressions, the CAST expression does not remember
the identifier used during parsing, only the type that the identifier produced.
I managed to block the SQL behaviour with parsing rules, but PySpark, Spark
Connect and the DataFrame API pose a problem. They allow casting with types
like StringType("collation_name") and StringType(), and in Python, as opposed
to Scala, we cannot differentiate StringType() from StringType("UTF8_BINARY");
protobuf always sends the information as StringType("UTF8_BINARY"). Even if we
only block the SQL syntax, a problem arises when someone uses the DataFrame
API: the session-level default collation may have been changed, and collation
resolution would then be impossible, because the cast would be translated to
cast(expression, StringType(session_level_default_collation)) and we have no
way of differentiating whether it was created from the STRING identifier or
from STRING COLLATE session_level_default_collation.
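
To make the ambiguity concrete, here is a minimal sketch (not the real
pyspark.sql.types.StringType, just an illustration of the behaviour described
above): once the default collation is filled in at construction time, the two
spellings produce equal objects, so the original spelling cannot be recovered:

```python
# Illustrative sketch only; NOT the actual pyspark.sql.types implementation.
class StringType:
    def __init__(self, collation="UTF8_BINARY"):
        # A bare StringType() immediately takes the default collation,
        # so the information about which spelling was used is lost here.
        self.collation = collation

    def __eq__(self, other):
        return isinstance(other, StringType) and self.collation == other.collation


# The two spellings are indistinguishable once constructed:
print(StringType() == StringType("UTF8_BINARY"))  # True
```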
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]