MaxGekk commented on code in PR #48642:
URL: https://github.com/apache/spark/pull/48642#discussion_r1816136572


##########
common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationAwareUTF8String.java:
##########
@@ -1434,6 +1435,42 @@ public static UTF8String[] icuSplitSQL(final UTF8String string, final UTF8String
     return strings.toArray(new UTF8String[0]);
   }
 
+  /**
+   * Splits the `string` into an array of substrings based on the `delimiter` regex, with respect
+   * to the maximum number of substrings `limit`.
+   *
+   * @param string the string to be split
+   * @param delimiter the delimiter regex to split the string
+   * @param limit the maximum number of substrings to return
+   * @return an array of substrings
+   */
+  public static UTF8String[] split(final UTF8String string, final UTF8String delimiter,
+                                      final int limit, final int collationId) {

Review Comment:
   Fix the indentation.



##########
common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationAwareUTF8String.java:
##########
@@ -1434,6 +1435,42 @@ public static UTF8String[] icuSplitSQL(final UTF8String string, final UTF8String
     return strings.toArray(new UTF8String[0]);
   }
 
+  /**
+   * Splits the `string` into an array of substrings based on the `delimiter` regex, with respect
+   * to the maximum number of substrings `limit`.
+   *
+   * @param string the string to be split
+   * @param delimiter the delimiter regex to split the string
+   * @param limit the maximum number of substrings to return
+   * @return an array of substrings
+   */
+  public static UTF8String[] split(final UTF8String string, final UTF8String delimiter,
+                                      final int limit, final int collationId) {
+    CollationFactory.Collation collation = CollationFactory.fetchCollation(collationId);
+    assert collation.isUtf8BinaryType || collation.isUtf8LcaseType :
+        "Unsupported collation type for split operation.";
+
+    if (CollationFactory.fetchCollation(collationId).isUtf8BinaryType) {
+      return string.split(delimiter, limit);
+    } else {
+      return lowercaseSplit(string, delimiter, limit);
+    }
+  }
+
+  public static UTF8String[] lowercaseSplit(final UTF8String string, final UTF8String delimiter,
+                                               final int limit) {

Review Comment:
   Ditto: fix the indentation.



##########
common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationAwareUTF8String.java:
##########
@@ -1434,6 +1435,42 @@ public static UTF8String[] icuSplitSQL(final UTF8String string, final UTF8String
     return strings.toArray(new UTF8String[0]);
   }
 
+  /**
+   * Splits the `string` into an array of substrings based on the `delimiter` regex, with respect
+   * to the maximum number of substrings `limit`.
+   *
+   * @param string the string to be split
+   * @param delimiter the delimiter regex to split the string
+   * @param limit the maximum number of substrings to return
+   * @return an array of substrings

Review Comment:
   Please start the comments with an uppercase letter.
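
   For illustration only, the Javadoc might read as follows once each
   description starts with an uppercase letter. This is a sketch, not the
   PR's code: the class name `SplitDocExample` is hypothetical, and plain
   `java.lang.String` stands in for `UTF8String` so the snippet compiles on
   its own.

```java
// Hypothetical stand-alone class; not part of Spark.
final class SplitDocExample {

  /**
   * Splits the `string` into an array of substrings based on the `delimiter` regex, with respect
   * to the maximum number of substrings `limit`.
   *
   * @param string The string to be split.
   * @param delimiter The delimiter regex to split the string.
   * @param limit The maximum number of substrings to return.
   * @return An array of substrings.
   */
  static String[] split(final String string, final String delimiter, final int limit) {
    // Assumption: java.lang.String#split stands in for UTF8String#split here.
    return string.split(delimiter, limit);
  }
}
```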



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

