mihailom-db commented on code in PR #45856:
URL: https://github.com/apache/spark/pull/45856#discussion_r1555843730


##########
common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java:
##########
@@ -1078,7 +1078,22 @@ public UTF8String[] split(UTF8String pattern, int limit) {
       }
       return result;
     }
-    return split(pattern.toString(), limit);
+    return split(pattern.toString(), limit, regexFlags);
+  }
+
+  public UTF8String[] split(UTF8String pattern, int limit) {
+    return split(pattern, limit, 0); // Pattern without regex flags
+  }
+
+  public UTF8String[] splitCollationAware(UTF8String pattern, int limit, int collationId) {
+    if (CollationFactory.fetchCollation(collationId).supportsBinaryEquality) {
+      return split(pattern, limit);
+    }
+    if (collationId == CollationFactory.UTF8_BINARY_LCASE_COLLATION_ID) {
+      return split(pattern, limit, Pattern.UNICODE_CASE | Pattern.CASE_INSENSITIVE);
+    }
+    throw new UnsupportedOperationException("Unsupported collation " +
+      CollationFactory.fetchCollation(collationId).collationName);

Review Comment:
   +1, this should already fail in the inputTypeCheck of the related expression, so I am not sure we need this double guard.
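A minimal sketch of the "fail early" approach the comment describes, assuming the expression has a type-check hook that runs once before any rows are evaluated. The class SplitCollationCheck, its methods, and the collation IDs are hypothetical placeholders, not Spark's actual API or CollationFactory values.

```java
import java.util.Optional;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical sketch: validate the collation once at type-check time so the
// per-row split path does not need its own "unsupported collation" throw.
// All names and IDs are placeholders, not Spark's real API.
final class SplitCollationCheck {
  // Placeholder collation IDs (not necessarily CollationFactory's values).
  static final int UTF8_BINARY = 0;
  static final int UTF8_BINARY_LCASE = 1;
  static final int UNICODE_CI = 3;

  // Collations this hypothetical split expression accepts.
  private static final Set<Integer> SUPPORTED = Set.of(UTF8_BINARY, UTF8_BINARY_LCASE);

  // Analysis-time check: an empty Optional means the input type is accepted;
  // otherwise it carries the error message to report to the user.
  static Optional<String> checkInputTypes(int collationId) {
    if (SUPPORTED.contains(collationId)) {
      return Optional.empty();
    }
    return Optional.of("split does not support collation id " + collationId);
  }

  // Evaluation-time helper: assumes checkInputTypes already passed, so it
  // only distinguishes between the two supported collations.
  static int regexFlagsFor(int collationId) {
    return collationId == UTF8_BINARY_LCASE
        ? Pattern.UNICODE_CASE | Pattern.CASE_INSENSITIVE
        : 0;
  }

  public static void main(String[] args) {
    for (int id : new int[] {UTF8_BINARY, UTF8_BINARY_LCASE, UNICODE_CI}) {
      System.out.println(id + " -> " + checkInputTypes(id).orElse("ok"));
    }
  }
}
```

The trade-off behind the "double guard" question: with a single gate, a collation that somehow bypassed the type check would silently take the binary branch of regexFlagsFor, whereas the runtime throw in the diff would surface it at the cost of duplicating the check.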

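For reference, the UTF8_BINARY_LCASE branch in the diff relies on java.util.regex flags. The hunk does not show the body of the three-argument split, so the helper below (splitWithFlags, hypothetical) simply assumes it compiles the delimiter with Pattern.compile(regex, flags) and splits. It illustrates why both flags are passed: per the Pattern documentation, CASE_INSENSITIVE on its own folds only US-ASCII letters, and UNICODE_CASE extends the folding to other characters.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Sketch of how regex flags change split semantics. splitWithFlags is a
// hypothetical stand-in for the three-argument split in the diff.
final class LcaseSplitSketch {
  // Same flag combination the diff uses for UTF8_BINARY_LCASE.
  static final int LCASE_FLAGS = Pattern.UNICODE_CASE | Pattern.CASE_INSENSITIVE;

  static String[] splitWithFlags(String input, String regex, int limit, int regexFlags) {
    // The flags are applied to every match of the delimiter pattern.
    return Pattern.compile(regex, regexFlags).split(input, limit);
  }

  public static void main(String[] args) {
    // No flags: only the lowercase 'x' matches.
    System.out.println(Arrays.toString(splitWithFlags("aXbxc", "x", -1, 0))); // prints [aXb, c]
    // LCASE flags: both 'X' and 'x' match.
    System.out.println(Arrays.toString(splitWithFlags("aXbxc", "x", -1, LCASE_FLAGS))); // prints [a, b, c]
    // CASE_INSENSITIVE alone does not fold non-ASCII letters (A-umlaut vs a-umlaut).
    System.out.println(Arrays.toString(
        splitWithFlags("a\u00C4b\u00E4c", "\u00E4", -1, Pattern.CASE_INSENSITIVE)));
    // Adding UNICODE_CASE makes both umlaut forms match the delimiter.
    System.out.println(Arrays.toString(
        splitWithFlags("a\u00C4b\u00E4c", "\u00E4", -1, LCASE_FLAGS)));
  }
}
```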


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

