Re: [PR] [SPARK-50101][SQL] Fix collated behavior of StringToMap expression [spark]

via GitHub Fri, 25 Oct 2024 06:58:45 -0700


uros-db commented on code in PR #48642:
URL: https://github.com/apache/spark/pull/48642#discussion_r1816728135



##########
common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationAwareUTF8String.java:
##########
@@ -1434,6 +1435,42 @@ public static UTF8String[] icuSplitSQL(final UTF8String string, final UTF8String
     return strings.toArray(new UTF8String[0]);
   }
 
+  /**
+   * Splits the `string` into an array of substrings based on the `delimiter` regex, with respect
+   * to the maximum number of substrings `limit`.
+   *
+   * @param string the string to be split
+   * @param delimiter the delimiter regex to split the string
+   * @param limit the maximum number of substrings to return
+   * @return an array of substrings
+   */
+  public static UTF8String[] split(final UTF8String string, final UTF8String delimiter,
+      final int limit, final int collationId) {
+    CollationFactory.Collation collation = CollationFactory.fetchCollation(collationId);
+    assert collation.isUtf8BinaryType || collation.isUtf8LcaseType :
+        "Unsupported collation type for split operation.";
+
+    if (CollationFactory.fetchCollation(collationId).isUtf8BinaryType) {
+      return string.split(delimiter, limit);
+    } else {
+      return lowercaseSplit(string, delimiter, limit);
+    }

Review Comment:
   Branching execution based on collation should not be done in `CollationAwareUTF8String`.
   
   Please see `CollationSupport.java` and follow its implementation pattern, introducing a new class for `StringToMap` if necessary.
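
A rough sketch of the separation being asked for, under stated assumptions: it uses plain `String` and a hypothetical `CollationKind` enum in place of Spark's `UTF8String` and collation IDs, and the names `SplitUtil` and `StringToMapSplit` are illustrative, not the actual Spark API. The lowercase path is approximated with a case-insensitive regex, which may differ from what `lowercaseSplit` in the PR does.

```java
import java.util.regex.Pattern;

// Illustrative only: every name here is a hypothetical stand-in, not a Spark class.
final class StringToMapSplitSketch {

  // Stand-in for the two collation families the assertion in the diff allows.
  enum CollationKind { UTF8_BINARY, UTF8_LCASE }

  // Utility layer: one method per behavior, with no branching on collation.
  static final class SplitUtil {
    static String[] binarySplit(String s, String delimiterRegex, int limit) {
      return s.split(delimiterRegex, limit);
    }

    // Assumption: lowercase-collation splitting is approximated by matching
    // the delimiter regex case-insensitively.
    static String[] lowercaseSplit(String s, String delimiterRegex, int limit) {
      Pattern p = Pattern.compile(delimiterRegex,
          Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
      return p.split(s, limit);
    }
  }

  // Dispatch layer: the only place that inspects the collation.
  static final class StringToMapSplit {
    static String[] exec(String s, String delimiterRegex, int limit, CollationKind kind) {
      switch (kind) {
        case UTF8_BINARY:
          return execBinary(s, delimiterRegex, limit);
        case UTF8_LCASE:
          return execLowercase(s, delimiterRegex, limit);
        default:
          throw new IllegalArgumentException("Unsupported collation: " + kind);
      }
    }

    static String[] execBinary(String s, String delimiterRegex, int limit) {
      return SplitUtil.binarySplit(s, delimiterRegex, limit);
    }

    static String[] execLowercase(String s, String delimiterRegex, int limit) {
      return SplitUtil.lowercaseSplit(s, delimiterRegex, limit);
    }
  }
}
```

In this layout the expression would call only `StringToMapSplit.exec`, so the utility methods never need to look up the collation themselves.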



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

