ilicmarkodb commented on code in PR #54324:
URL: https://github.com/apache/spark/pull/54324#discussion_r2878447420


##########
sql/api/src/main/scala/org/apache/spark/sql/types/StringType.scala:
##########
@@ -190,3 +190,40 @@ case object NoConstraint extends StringConstraint
 case class FixedLength(length: Int) extends StringConstraint
 
 case class MaxLength(length: Int) extends StringConstraint
+
+/**
+ * Used in the context of UDFs when resolving parameters/return types.
+ *
+ * For example, if a UDF parameter is defined as `p1 STRING COLLATE UTF8_BINARY`,
+ * calling [[typeName]] will return just `STRING`, omitting the collation information.
+ * This causes the parameter to be parsed into the companion object [[StringType]]. If the
+ * UDF has a default collation specified, it will be applied to the companion object [[StringType]],
+ * potentially resulting in the construction of a [[StringType]] with an invalid collation.
+ */
+object ExplicitUTF8BinaryStringType
+  extends StringType(CollationFactory.UTF8_BINARY_COLLATION_ID, NoConstraint) {
+  override def typeName: String = s"string collate $collationName"
+  override def toString: String = s"StringType($collationName)"
+
+  /**
+   * Transforms the given `dataType` by replacing each [[StringType]] that has an explicit
+   * `UTF8_BINARY` collation with `ExplicitUTF8BinaryStringType`.
+   */

Review Comment:
   Let’s add this even though we don’t currently need it. We will revert the `typeName` change in `StringType` in follow-up PRs.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

