uros-db commented on code in PR #45856:
URL: https://github.com/apache/spark/pull/45856#discussion_r1552929298


##########
common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java:
##########
@@ -1078,7 +1078,22 @@ public UTF8String[] split(UTF8String pattern, int limit) {
       }
       return result;
     }
-    return split(pattern.toString(), limit);
+    return split(pattern.toString(), limit, regexFlags);
+  }
+
+  public UTF8String[] split(UTF8String pattern, int limit) {
+    return split(pattern, limit, 0); // Pattern without regex flags
+  }
+
+  public UTF8String[] splitCollationAware(UTF8String pattern, int limit, int collationId) {
+    if (CollationFactory.fetchCollation(collationId).supportsBinaryEquality) {
+      return split(pattern, limit);
+    }
+    if (collationId == CollationFactory.UTF8_BINARY_LCASE_COLLATION_ID) {
+      return split(pattern, limit, Pattern.UNICODE_CASE | Pattern.CASE_INSENSITIVE);
+    }
+    throw new UnsupportedOperationException("Unsupported collation " +
+      CollationFactory.fetchCollation(collationId).collationName);

Review Comment:
   If you are using `StringTypeBinaryLcase` for this expression, execution should never reach this point.
   That said, we may want to keep this check anyway, because this method is part of a public interface.
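
   To make the point concrete, here is a minimal standalone sketch of the dispatch pattern in the diff. It uses plain `String` and `java.util.regex.Pattern` instead of `UTF8String` and `CollationFactory`; the class name `CollationSplitSketch` and the constants `BINARY` and `BINARY_LCASE` are hypothetical stand-ins, not Spark's actual identifiers or collation ids. It shows why the final `throw` is still useful even when the expression's type check filters out unsupported collations: any other caller of the public method can still pass an arbitrary id.

```java
import java.util.regex.Pattern;

// Hypothetical stand-in for the collation-aware split in the diff; not Spark's API.
final class CollationSplitSketch {
  // Assumed collation ids, chosen for illustration only.
  static final int BINARY = 0;
  static final int BINARY_LCASE = 1;

  // Splits the input around matches of the regex compiled with the given flags.
  static String[] split(String input, String regex, int limit, int regexFlags) {
    return Pattern.compile(regex, regexFlags).split(input, limit);
  }

  static String[] splitCollationAware(String input, String regex, int limit, int collationId) {
    if (collationId == BINARY) {
      return split(input, regex, limit, 0); // Pattern without regex flags
    }
    if (collationId == BINARY_LCASE) {
      return split(input, regex, limit, Pattern.UNICODE_CASE | Pattern.CASE_INSENSITIVE);
    }
    // Defensive guard: unreachable if callers validate the collation first,
    // but the method is public, so other callers may pass any id.
    throw new UnsupportedOperationException("Unsupported collation " + collationId);
  }

  public static void main(String[] args) {
    // prints "aXb|c"
    System.out.println(String.join("|", splitCollationAware("aXbxc", "x", -1, BINARY)));
    // prints "a|b|c"
    System.out.println(String.join("|", splitCollationAware("aXbxc", "x", -1, BINARY_LCASE)));
  }
}
```

   With the case-insensitive flags, both `X` and `x` act as delimiters; any id other than the two handled ones falls through to the exception.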



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

