mihailom-db commented on code in PR #45856:
URL: https://github.com/apache/spark/pull/45856#discussion_r1555843730


##########
common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java:
##########
@@ -1078,7 +1078,22 @@ public UTF8String[] split(UTF8String pattern, int limit) {
       }
       return result;
     }
-    return split(pattern.toString(), limit);
+    return split(pattern.toString(), limit, regexFlags);
+  }
+
+  public UTF8String[] split(UTF8String pattern, int limit) {
+    return split(pattern, limit, 0); // Pattern without regex flags
+  }
+
+  public UTF8String[] splitCollationAware(UTF8String pattern, int limit, int collationId) {
+    if (CollationFactory.fetchCollation(collationId).supportsBinaryEquality) {
+      return split(pattern, limit);
+    }
+    if (collationId == CollationFactory.UTF8_BINARY_LCASE_COLLATION_ID) {
+      return split(pattern, limit, Pattern.UNICODE_CASE | Pattern.CASE_INSENSITIVE);
+    }
+    throw new UnsupportedOperationException("Unsupported collation " +
+            CollationFactory.fetchCollation(collationId).collationName);

Review Comment:
   +1, this should fail in inputTypeCheck of the related expression. I am not sure if we need this double guard.
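
For context on the lowercase-collation branch in the hunk, here is a minimal sketch of what passing `Pattern.UNICODE_CASE | Pattern.CASE_INSENSITIVE` changes when splitting. It uses plain `java.util.regex` on `String` rather than `UTF8String`, and the class and helper names (`RegexFlagSplitSketch`, `splitBinary`, `splitIgnoreCase`) are hypothetical, not part of the PR. It only illustrates the flag behavior and makes no claim about how Spark handles `limit` or collations internally.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical illustration class; not part of Spark or of this PR.
final class RegexFlagSplitSketch {

  // Split `input` on the regex `pattern`, compiling it with the given flags.
  static String[] split(String input, String pattern, int limit, int regexFlags) {
    return Pattern.compile(pattern, regexFlags).split(input, limit);
  }

  // No flags: matching is case-sensitive, analogous to passing 0 in the hunk.
  static String[] splitBinary(String input, String pattern, int limit) {
    return split(input, pattern, limit, 0);
  }

  // Same flags as the lowercase-collation branch in the hunk.
  static String[] splitIgnoreCase(String input, String pattern, int limit) {
    return split(input, pattern, limit,
        Pattern.UNICODE_CASE | Pattern.CASE_INSENSITIVE);
  }

  public static void main(String[] args) {
    // Case-sensitive: only the lowercase "x" is a delimiter.
    System.out.println(Arrays.toString(splitBinary("aXbxc", "x", -1)));
    // Case-insensitive: both "X" and "x" are delimiters.
    System.out.println(Arrays.toString(splitIgnoreCase("aXbxc", "x", -1)));
  }
}
```

With the flags set, both `X` and `x` act as delimiters, so the same input yields three parts instead of two.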