stevomitric commented on code in PR #48121:
URL: https://github.com/apache/spark/pull/48121#discussion_r1762681433


##########
sql/api/src/main/scala/org/apache/spark/sql/internal/types/AbstractStringType.scala:
##########
@@ -51,3 +51,14 @@ case object StringTypeBinaryLcase extends AbstractStringType {
 case object StringTypeAnyCollation extends AbstractStringType {
   override private[sql] def acceptsType(other: DataType): Boolean = other.isInstanceOf[StringType]
 }
+
+/**
+ * Use StringTypeNonCSAICollation for expressions supporting all possible collation types
+ * except CS_AI collation types.
+ */
+case object StringTypeNonCSAICollation extends AbstractStringType {
+  override private[sql] def acceptsType(other: DataType): Boolean =
+    other.isInstanceOf[StringType] &&
+      (!other.asInstanceOf[StringType].typeName.contains("_AI") ||
+      other.asInstanceOf[StringType].typeName.contains("_CI"))

Review Comment:
   I'm not sure that checking sensitivity by matching on the collation name 
is a nice approach. It might be cleaner to extend `StringType` with an 
`isNonCSAICollation` method (like we did with `isUTF8BinaryCollation` and 
`isUTF8LcaseCollation`).
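
   A minimal sketch of what I have in mind, using simplified stand-in types 
rather than the real Spark classes. Everything here except the 
`isNonCSAICollation` name (`SketchStringType`, `SketchNonCSAICollation`, the 
sensitivity traits) is hypothetical and only illustrates deriving the check 
from collation properties instead of from substrings of the type name:

```scala
// Hypothetical stand-ins for illustration only; these are NOT Spark's classes.
sealed trait CaseSensitivity
case object CaseSensitive extends CaseSensitivity
case object CaseInsensitive extends CaseSensitivity

sealed trait AccentSensitivity
case object AccentSensitive extends AccentSensitivity
case object AccentInsensitive extends AccentSensitivity

// Simplified model of a string type that carries its collation properties.
final case class SketchStringType(
    collationName: String,
    caseSensitivity: CaseSensitivity,
    accentSensitivity: AccentSensitivity) {

  // True for every collation except case-sensitive + accent-insensitive (CS_AI).
  def isNonCSAICollation: Boolean =
    !(caseSensitivity == CaseSensitive && accentSensitivity == AccentInsensitive)
}

// Counterpart of the abstract type in the diff: it delegates to the method
// above and never inspects the collation name.
object SketchNonCSAICollation {
  def acceptsType(other: Any): Boolean = other match {
    case s: SketchStringType => s.isNonCSAICollation
    case _ => false
  }
}
```

   With this shape, `unicode_ai` (case-sensitive, accent-insensitive) is 
rejected, while `unicode`, `unicode_ci`, and `unicode_ci_ai` are accepted, and 
a collation whose name merely happens to contain `_AI` or `_CI` cannot change 
the result.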



##########
sql/core/src/test/scala/org/apache/spark/sql/CollationSuite.scala:
##########
@@ -1625,6 +1626,79 @@ class CollationSuite extends DatasourceV2SQLBase with AdaptiveSparkPlanHelper {
     }
   }
 
+  test("Expressions not supporting CS_AI collators") {
+    val unsupportedExpressions: Seq[Any] = Seq(
+      "ltrim",
+      "rtrim",
+      "trim",
+      "startswith",
+      "endswith",
+      "locate",
+      "instr",
+      "str_to_map",
+      "contains",
+      "replace",
+      ("translate", "efg"),
+      ("split_part", "2"),
+      ("substring_index", "2"))
+
+    val unsupportedCollator = "unicode_ai"
+    val supportedCollators: Seq[String] = Seq(
+      "unicode",
+      "unicode_ci",
+      "unicode_ci_ai"
+    )
+
+    unsupportedExpressions.foreach {
+      case expression: String =>
+        val analysisException = intercept[AnalysisException] {

Review Comment:
   nit: move all three bodies into a single function to reduce code 
duplication.
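
   One possible shape, as a rough sketch: normalize each entry of the 
`Seq[Any]` into a common form first, so that a single function can hold the 
shared body. The names `normalizeEntry` and `forEachExpression` are 
hypothetical, and the sketch assumes entries are either a bare name or a 
`(name, argument)` pair of strings, as in the list above:

```scala
// Hypothetical helper: turn a heterogeneous entry into (name, optional extra argument).
def normalizeEntry(entry: Any): (String, Option[String]) = entry match {
  case name: String => (name, None)
  case (name: String, arg: String) => (name, Some(arg))
  case other =>
    throw new IllegalArgumentException(s"Unexpected expression entry: $other")
}

// Hypothetical driver: the shared check is passed in once instead of being
// repeated in every `case` branch.
def forEachExpression(entries: Seq[Any])(check: (String, Option[String]) => Unit): Unit =
  entries.foreach { entry =>
    val (name, extraArg) = normalizeEntry(entry)
    check(name, extraArg)
  }
```

   The `intercept[AnalysisException]` block and the supported-collator checks 
would then live in the one `check` function passed to `forEachExpression`.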



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

