mkaravel commented on a change in pull request #34154:
URL: https://github.com/apache/spark/pull/34154#discussion_r731891091



##########
File path: common/unsafe/src/main/java/org/apache/spark/unsafe/types/ByteArray.java
##########
@@ -101,4 +101,82 @@ public static long getPrefix(byte[] bytes) {
     }
     return result;
   }
+
+  // Helper method for implementing `lpad` and `rpad`.
+  // If the padding pattern's length is 0, return the first `len` bytes of the input
+  // binary string if it is longer than `len` bytes, or a copy of the binary string otherwise.
+  protected static byte[] padWithEmptyPattern(byte[] bytes, int len) {
+    len = Math.min(bytes.length, len);
+    final byte[] result = new byte[len];
+    Platform.copyMemory(bytes, Platform.BYTE_ARRAY_OFFSET, result, Platform.BYTE_ARRAY_OFFSET, len);
+    return result;
+  }
+
+  // Helper method for implementing `lpad` and `rpad`.
+  // Fills the resulting binary string with the pattern. The resulting binary string
+  // is passed as the first argument and is filled from position `firstPos` (inclusive)
+  // to position `beyondPos` (exclusive).
+  protected static void fillWithPattern(byte[] result, int firstPos, int beyondPos, byte[] pad) {
+    for (int pos = firstPos; pos < beyondPos; pos += pad.length) {
+      final int jMax = Math.min(pad.length, beyondPos - pos);
+      for (int j = 0; j < jMax; ++j) {
+        result[pos + j] = pad[j];
+      }
+    }
+  }
+
+  // Left-pads the input binary string using the provided padding pattern.
+  // In the special case that the padding pattern is empty, the resulting binary string
+  // contains the first `len` bytes of the input if they exist, or is a copy of the input
+  // binary string otherwise.
+  // For padding patterns with positive byte length, the resulting binary string's byte length is
+  // equal to `len`. If the input binary string is at least `len` bytes long, its first `len` bytes
+  // are returned. Otherwise, the missing leading bytes are filled in with the provided pattern.
+  public static byte[] lpad(byte[] bytes, int len, byte[] pad) {

Review comment:
   Yes. We adopt the same behavior as for character strings:
   ```
   scala> spark.sql("select lpad('abc', 10, '12')").show
   +-----------------+
   |lpad(abc, 10, 12)|
   +-----------------+
   |       1212121abc|
   +-----------------+
   
   
   scala> spark.sql("select rpad('abc', 10, '12')").show
   +-----------------+
   |rpad(abc, 10, 12)|
   +-----------------+
   |       abc1212121|
   +-----------------+
   ```
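For reference, the semantics described in the diff comments can be reproduced outside Spark in a minimal, JDK-only sketch. It mirrors the two helpers from the hunk but uses `System.arraycopy` and `Arrays.copyOf` instead of Spark's internal `Platform.copyMemory`. The class name `BinaryPadSketch`, the `rpad` body (not shown in this hunk), and clamping a negative `len` to 0 are assumptions for illustration, not code from the PR. Applied to the bytes of `abc` with the pattern `12`, it should give the same results as the string example above.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical standalone sketch; not the code under review.
public class BinaryPadSketch {

  // Empty pattern: first `len` bytes of the input, or a copy if the input is shorter.
  private static byte[] padWithEmptyPattern(byte[] bytes, int len) {
    return Arrays.copyOf(bytes, Math.min(bytes.length, len));
  }

  // Fills result[firstPos, beyondPos) by repeating `pad`; the last repetition may be cut short.
  private static void fillWithPattern(byte[] result, int firstPos, int beyondPos, byte[] pad) {
    for (int pos = firstPos; pos < beyondPos; pos += pad.length) {
      final int chunk = Math.min(pad.length, beyondPos - pos);
      System.arraycopy(pad, 0, result, pos, chunk);
    }
  }

  public static byte[] lpad(byte[] bytes, int len, byte[] pad) {
    final int resultLen = Math.max(len, 0);  // assumption: negative `len` behaves like 0
    if (pad.length == 0) return padWithEmptyPattern(bytes, resultLen);
    final byte[] result = new byte[resultLen];
    if (bytes.length >= resultLen) {
      // Input already long enough: keep only its first `resultLen` bytes.
      System.arraycopy(bytes, 0, result, 0, resultLen);
      return result;
    }
    final int padLen = resultLen - bytes.length;
    fillWithPattern(result, 0, padLen, pad);              // pattern goes in front
    System.arraycopy(bytes, 0, result, padLen, bytes.length);
    return result;
  }

  // Assumption: rpad mirrors lpad, with the pattern placed after the input.
  public static byte[] rpad(byte[] bytes, int len, byte[] pad) {
    final int resultLen = Math.max(len, 0);  // assumption: negative `len` behaves like 0
    if (pad.length == 0) return padWithEmptyPattern(bytes, resultLen);
    final byte[] result = new byte[resultLen];
    final int copyLen = Math.min(bytes.length, resultLen);
    System.arraycopy(bytes, 0, result, 0, copyLen);
    fillWithPattern(result, copyLen, resultLen, pad);     // no-op when input was truncated
    return result;
  }

  public static void main(String[] args) {
    byte[] abc = "abc".getBytes(StandardCharsets.US_ASCII);
    byte[] pattern = "12".getBytes(StandardCharsets.US_ASCII);
    System.out.println(new String(lpad(abc, 10, pattern), StandardCharsets.US_ASCII)); // 1212121abc
    System.out.println(new String(rpad(abc, 10, pattern), StandardCharsets.US_ASCII)); // abc1212121
  }
}
```

Both functions share the empty-pattern and truncation paths; they differ only in whether the pattern fills the region before or after the copied input.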




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.