ilicmarkodb commented on code in PR #54324:
URL: https://github.com/apache/spark/pull/54324#discussion_r2878447420
##########
sql/api/src/main/scala/org/apache/spark/sql/types/StringType.scala:
##########
@@ -190,3 +190,40 @@ case object NoConstraint extends StringConstraint
case class FixedLength(length: Int) extends StringConstraint
case class MaxLength(length: Int) extends StringConstraint
+
+/**
+ * Used in the context of UDFs when resolving parameters/return types.
+ *
+ * For example, if a UDF parameter is defined as `p1 STRING COLLATE UTF8_BINARY`,
+ * calling [[typeName]] will return just `STRING`, omitting the collation information.
+ * This causes the parameter to be parsed into the companion object [[StringType]]. If the
+ * UDF has a default collation specified, it will be applied to the companion object [[StringType]],
+ * potentially resulting in the construction of a [[StringType]] with an invalid collation.
+ */
+object ExplicitUTF8BinaryStringType
+ extends StringType(CollationFactory.UTF8_BINARY_COLLATION_ID, NoConstraint) {
+ override def typeName: String = s"string collate $collationName"
+ override def toString: String = s"StringType($collationName)"
+
+ /**
+ * Transforms the given `dataType` by replacing each [[StringType]] that has an explicit
+ * `UTF8_BINARY` collation with `ExplicitUTF8BinaryStringType`.
+ */
Review Comment:
Let’s add this even though we don’t need it yet; we will revert the
`typeName` change in `StringType` in follow-up PRs.
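
For context, here is a minimal sketch of the round-trip problem the Scaladoc describes. It is a toy model, not Spark's code: `SimpleStringType`, `ExplicitBinary`, and `resolve` are hypothetical stand-ins, and the rule that a bare `string` picks up the UDF's default collation is an assumption taken from the Scaladoc text.

```scala
// Toy model only: hypothetical stand-ins, not Spark's real StringType classes.
object TypeNameSketch {
  // Simplified string type that carries only a collation name.
  class SimpleStringType(val collationName: String) {
    // Assumed behavior from the Scaladoc: UTF8_BINARY is omitted from the type name.
    def typeName: String =
      if (collationName == "UTF8_BINARY") "string"
      else s"string collate $collationName"
  }

  // Marker object that always prints its collation, in the spirit of
  // ExplicitUTF8BinaryStringType from the diff.
  object ExplicitBinary extends SimpleStringType("UTF8_BINARY") {
    override def typeName: String = s"string collate $collationName"
  }

  // Hypothetical resolver: a bare "string" receives the UDF's default collation,
  // while an explicit "string collate X" keeps X.
  def resolve(typeName: String, defaultCollation: String): SimpleStringType = {
    val prefix = "string collate "
    if (typeName.startsWith(prefix)) new SimpleStringType(typeName.stripPrefix(prefix))
    else new SimpleStringType(defaultCollation)
  }

  def main(args: Array[String]): Unit = {
    val plainBinary = new SimpleStringType("UTF8_BINARY")
    // The explicit UTF8_BINARY is lost, so the default collation wins.
    println(resolve(plainBinary.typeName, "UNICODE").collationName)
    // The marker object keeps the collation in its name, so it survives.
    println(resolve(ExplicitBinary.typeName, "UNICODE").collationName)
  }
}
```

In this model the first call resolves to `UNICODE` and the second to `UTF8_BINARY`, which is the distinction the explicit marker type is meant to preserve.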
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]