MaxGekk commented on code in PR #48521:
URL: https://github.com/apache/spark/pull/48521#discussion_r1817769728


##########
common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationAwareUTF8String.java:
##########
@@ -1232,22 +1255,103 @@ public static UTF8String trimLeft(
     // Return the substring from the calculated position until the end of the string.
     return UTF8String.fromString(src.substring(charIndex));
   }
+  /**
+   * Trims the `srcString` string from the right side using the specified `trimString` characters,
+   * with respect to the UTF8_BINARY trim collation. For UTF8_BINARY trim collation, the method has
+   * one special case to cover with respect to trimRight function for regular UTF8_Binary collation.
+   * Trailing spaces should be ignored in case of trim collation (rtrim for example) and if
+   * trimString does not contain spaces. In this case, the method trims the string from the right
+   * and after call of regular trim functions returns back trimmed spaces as those should not get
+   * removed.
+   * @param srcString the input string to be trimmed from the right end of the string
+   * @param trimString the trim string characters to trim
+   * @param collationId the collation ID to use for string trim
+   * @return the trimmed string (for UTF8_LCASE collation)

Review Comment:
   And what does it return for collations other than UTF8_LCASE?
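
   To make the question concrete, here is a minimal sketch of the behaviour the Javadoc above seems to describe: set trailing spaces aside, run the regular right trim, then put the spaces back. It is an illustration only, not the PR's implementation. It uses plain `java.lang.String` and `char` comparisons instead of `UTF8String` and code points, and the class and method names (`TrimRightSketch`, `trimRight`, `trimRightKeepingSpaces`) are hypothetical.

```java
public final class TrimRightSketch {

  // Hypothetical stand-in for the regular right trim: drops every trailing
  // character that occurs in trimChars.
  static String trimRight(String src, String trimChars) {
    int end = src.length();
    while (end > 0 && trimChars.indexOf(src.charAt(end - 1)) >= 0) {
      end--;
    }
    return src.substring(0, end);
  }

  // Assumed reading of the special case: when trimChars contains no space,
  // trailing spaces are ignored while trimming and restored afterwards.
  static String trimRightKeepingSpaces(String src, String trimChars) {
    if (trimChars.indexOf(' ') >= 0) {
      // Spaces are trim characters themselves, so the regular trim applies.
      return trimRight(src, trimChars);
    }
    // Find where the run of trailing spaces begins.
    int end = src.length();
    while (end > 0 && src.charAt(end - 1) == ' ') {
      end--;
    }
    String trailingSpaces = src.substring(end);
    // Trim the part before the spaces, then append the spaces again.
    return trimRight(src.substring(0, end), trimChars) + trailingSpaces;
  }

  public static void main(String[] args) {
    // prints "[abc  ]"
    System.out.println("[" + trimRightKeepingSpaces("abcxx  ", "x") + "]");
  }
}
```

   Under this reading the method returns a trimmed string regardless of collation, which is why the `@return` note singling out UTF8_LCASE looks out of place.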



##########
common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationAwareUTF8String.java:
##########
@@ -1080,6 +1080,24 @@ public static UTF8String translate(final UTF8String input,
     return UTF8String.fromString(sb.toString());
   }
 
+  /**
+   * Trims the `srcString` string from both ends of the string using the specified `trimString`
+   * characters, with respect to the UTF8_BINARY trim collation. String trimming is performed by
+   * first trimming the left side of the string, and then trimming the right side of the string.
+   * The method returns the trimmed string. If the `trimString` is null, the method returns null.
+   *
+   * @param srcString the input string to be trimmed from both ends of the string
+   * @param trimString the trim string characters to trim
+   * @param collationId the collation ID to use for string trim
+   * @return the trimmed string (for UTF8_LCASE collation)

Review Comment:
   I don't understand why you highlight `UTF8_LCASE`. Doesn't the function return a trimmed string for other collations, such as `UTF8_BINARY`, as well?



##########
common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationSupport.java:
##########
@@ -539,26 +549,35 @@ public static UTF8String execBinary(
     }
     public static UTF8String execLowercase(
         final UTF8String srcString,
-        final UTF8String trimString) {
-      return CollationAwareUTF8String.lowercaseTrim(srcString, trimString);
+        final UTF8String trimString,
+        final int collationId) {
+      return CollationAwareUTF8String.lowercaseTrim(srcString, trimString, collationId);
     }
     public static UTF8String execICU(
         final UTF8String srcString,
         final UTF8String trimString,
         final int collationId) {
       return CollationAwareUTF8String.trim(srcString, trimString, collationId);
     }
+    public static UTF8String execBinaryTrim(
+            final UTF8String srcString,
+            final UTF8String trimString,
+            final int collationId) {

Review Comment:
   Please fix the indentation here.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

