dejankrak-db commented on code in PR #49772:
URL: https://github.com/apache/spark/pull/49772#discussion_r1952921268
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveDDLCommandStringTypes.scala:
##########
@@ -155,22 +123,22 @@ object ResolveDefaultStringTypes extends Rule[LogicalPlan] {
dataType.existsRecursively(isDefaultStringType)
private def isDefaultStringType(dataType: DataType): Boolean = {
+ // STRING (without explicit collation) is considered default string type.
+ // STRING COLLATE <collation_name> (with explicit collation) is not considered
+ // default string type even when explicit collation is UTF8_BINARY (default collation).
dataType match {
- case st: StringType =>
- // should only return true for StringType object and not StringType("UTF8_BINARY")
- st.eq(StringType) || st.isInstanceOf[TemporaryStringType]
+ // should only return true for StringType object and not for StringType("UTF8_BINARY")
+ case st: StringType => st.eq(StringType)
case _ => false
}
}
private def replaceDefaultStringType(dataType: DataType, newType: StringType): DataType = {
+ // Should replace STRING with the new type.
+ // Should not replace STRING COLLATE UTF8_BINARY, as that is explicit collation.
dataType.transformRecursively {
case currentType: StringType if isDefaultStringType(currentType) =>
- if (currentType == newType) {
- TemporaryStringType()
- } else {
- newType
- }
+ newType
Review Comment:
We don't need RuleExecutor.forceAdditionalIteration anymore, so I have
removed it from the code altogether, as also noted in the other comment.
If newType is StringType(UTF8_BINARY), that won't be an issue, as the only
potential candidates for replacement are default string types, i.e. StringType,
whose collation is UTF8_BINARY by default. Hence, even if we skip them at that
point, their collation remains accurate.
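A minimal sketch of this reasoning, using a hypothetical, simplified type model: `DType`, `StrType`, `ArrType`, and the two helper functions below are illustrative names, not Spark's `DataType` classes. As in the diff, plain STRING is a singleton recognized by reference equality (`eq`), while `==` compares only the collation. Replacing a default STRING with an explicit UTF8_BINARY instance therefore yields a type that is `==` to the original and carries the same collation. A second pass finds nothing left to replace, so no temporary marker type is needed.

```scala
// Hypothetical, simplified model of a type tree; NOT Spark's DataType API.
sealed trait DType {
  def existsRecursively(p: DType => Boolean): Boolean = p(this)
  def transformRecursively(f: PartialFunction[DType, DType]): DType =
    f.applyOrElse(this, (t: DType) => t)
}

// Equality compares only the collation, so `==` cannot tell the default
// singleton apart from an explicit StrType("UTF8_BINARY").
class StrType(val collation: String) extends DType {
  override def equals(other: Any): Boolean = other match {
    case s: StrType => s.collation == collation
    case _          => false
  }
  override def hashCode(): Int = collation.hashCode
}

// The singleton stands for plain STRING written without COLLATE.
object StrType extends StrType("UTF8_BINARY") {
  def apply(collation: String): StrType = new StrType(collation)
}

final case class ArrType(elem: DType) extends DType {
  override def existsRecursively(p: DType => Boolean): Boolean =
    p(this) || elem.existsRecursively(p)
  override def transformRecursively(f: PartialFunction[DType, DType]): DType = {
    // Transform the child first, then give `f` a chance at the rebuilt node.
    val rebuilt: DType = ArrType(elem.transformRecursively(f))
    f.applyOrElse(rebuilt, (t: DType) => t)
  }
}

// Only the singleton counts as "default"; an explicit UTF8_BINARY does not.
def isDefaultStringType(dt: DType): Boolean = dt match {
  case st: StrType => st eq StrType
  case _           => false
}

// Replace every default STRING with `newType`; explicit collations are kept.
def replaceDefaultStringType(dt: DType, newType: StrType): DType =
  dt.transformRecursively {
    case cur: StrType if isDefaultStringType(cur) => newType
  }
```

For example, `replaceDefaultStringType(ArrType(StrType), StrType("UTF8_BINARY"))` is `==` to its input, yet it no longer contains a default string type. A later pass with any other collation leaves it unchanged.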
Now, if the object-level collation is later changed to a non-UTF8_BINARY
collation (e.g. ALTER TABLE foo DEFAULT COLLATION UNICODE), the change will
only apply to columns added from that point onwards, whereas the existing
columns remain unaffected (per the reference spec), i.e. their collation has
already been stamped. The only way to change it would be through
ALTER TABLE foo ALTER COLUMN c1 STRING COLLATE UNICODE, which is handled
separately through the existing column-level collation logic.
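These semantics can be sketched with a hypothetical, in-memory table model. `Table`, `addColumn`, `setDefaultCollation`, `alterColumnCollation`, and `collationOf` are illustrative assumptions, not Spark APIs. A plain STRING column is stamped with the table's default collation at the moment it is added. Changing the default later therefore leaves existing columns alone, and only an explicit column-level change updates them.

```scala
// Hypothetical model of table-level default collation; NOT Spark code.
// Each column stores the collation that was stamped when it was defined.
final case class Table(defaultCollation: String, columns: Vector[(String, String)]) {
  // ADD COLUMN: plain STRING (no COLLATE) takes the table's *current* default;
  // STRING COLLATE <name> keeps the explicit collation.
  def addColumn(name: String, explicitCollation: Option[String]): Table =
    copy(columns = columns :+ (name -> explicitCollation.getOrElse(defaultCollation)))

  // DEFAULT COLLATION change: only affects columns added afterwards.
  def setDefaultCollation(collation: String): Table =
    copy(defaultCollation = collation)

  // Column-level change: the only way to alter an existing column's collation.
  def alterColumnCollation(name: String, collation: String): Table =
    copy(columns = columns.map { case (n, old) => if (n == name) n -> collation else n -> old })

  def collationOf(name: String): Option[String] =
    columns.collectFirst { case (`name`, c) => c }
}
```

For instance, a column `c1` added while the default is UTF8_BINARY keeps UTF8_BINARY after `setDefaultCollation("UNICODE")`, while a column `c2` added afterwards gets UNICODE.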
--
This is an automated message from the Apache Git Service.