mkaravel commented on a change in pull request #34154:
URL: https://github.com/apache/spark/pull/34154#discussion_r731891091
##########
File path:
common/unsafe/src/main/java/org/apache/spark/unsafe/types/ByteArray.java
##########
@@ -101,4 +101,82 @@ public static long getPrefix(byte[] bytes) {
}
return result;
}
+
+ // Helper method for implementing `lpad` and `rpad`.
+ // If the padding pattern's length is 0, return the first `len` bytes of the input
+ // binary string if it is longer than `len` bytes, or a copy of the binary string otherwise.
+ protected static byte[] padWithEmptyPattern(byte[] bytes, int len) {
+ len = Math.min(bytes.length, len);
+ final byte[] result = new byte[len];
+ Platform.copyMemory(bytes, Platform.BYTE_ARRAY_OFFSET, result, Platform.BYTE_ARRAY_OFFSET, len);
+ return result;
+ }
+
+ // Helper method for implementing `lpad` and `rpad`.
+ // Fills the resulting binary string with the pattern. The resulting binary string
+ // is passed as the first argument and it is filled from position `firstPos` (inclusive)
+ // to position `beyondPos` (not inclusive).
+ protected static void fillWithPattern(byte[] result, int firstPos, int beyondPos, byte[] pad) {
+ for (int pos = firstPos; pos < beyondPos; pos += pad.length) {
+ final int jMax = Math.min(pad.length, beyondPos - pos);
+ for (int j = 0; j < jMax; ++j) {
+ result[pos + j] = (byte) pad[j];
+ }
+ }
+ }
+
+ // Left-pads the input binary string using the provided padding pattern.
+ // In the special case that the padding pattern is empty, the resulting binary string
+ // contains the first `len` bytes of the input if they exist, or is a copy of the input
+ // binary string otherwise.
+ // For padding patterns with positive byte length, the resulting binary string's byte length is
+ // equal to `len`. If the input binary string is not less than `len` bytes, its first `len` bytes
+ // are returned. Otherwise, the remaining missing bytes are filled in with the provided pattern.
+ public static byte[] lpad(byte[] bytes, int len, byte[] pad) {
Review comment:
Yes. We adopt the same behavior as for character strings:
```
scala> spark.sql("select lpad('abc', 10, '12')").show
+-----------------+
|lpad(abc, 10, 12)|
+-----------------+
|       1212121abc|
+-----------------+
scala> spark.sql("select rpad('abc', 10, '12')").show
+-----------------+
|rpad(abc, 10, 12)|
+-----------------+
|       abc1212121|
+-----------------+
```
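For readers who want to see why the last repetition of the pattern is cut short, here is a minimal, self-contained sketch of the same padding semantics on plain byte arrays. It is not the code from this pull request: the class name `PadSketch` is hypothetical, `System.arraycopy` stands in for `Platform.copyMemory`, and returning an empty array for `len <= 0` is an assumption made only to keep the sketch total.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration class; not part of Spark.
public class PadSketch {
  // Fills result[firstPos, beyondPos) by repeating `pad`; the final repetition
  // is truncated if the remaining space is shorter than the pattern.
  public static void fillWithPattern(byte[] result, int firstPos, int beyondPos, byte[] pad) {
    for (int pos = firstPos; pos < beyondPos; pos += pad.length) {
      int n = Math.min(pad.length, beyondPos - pos);
      System.arraycopy(pad, 0, result, pos, n);
    }
  }

  public static byte[] lpad(byte[] bytes, int len, byte[] pad) {
    if (len <= 0) return new byte[0]; // assumption made for this sketch
    // Empty pattern or long-enough input: return at most the first `len` bytes.
    if (pad.length == 0 || bytes.length >= len) {
      return Arrays.copyOf(bytes, Math.min(bytes.length, len));
    }
    byte[] result = new byte[len];
    int padLen = len - bytes.length;
    fillWithPattern(result, 0, padLen, pad);                  // pattern first
    System.arraycopy(bytes, 0, result, padLen, bytes.length); // then the input
    return result;
  }

  public static byte[] rpad(byte[] bytes, int len, byte[] pad) {
    if (len <= 0) return new byte[0]; // assumption made for this sketch
    if (pad.length == 0 || bytes.length >= len) {
      return Arrays.copyOf(bytes, Math.min(bytes.length, len));
    }
    byte[] result = new byte[len];
    System.arraycopy(bytes, 0, result, 0, bytes.length);      // input first
    fillWithPattern(result, bytes.length, len, pad);          // then the pattern
    return result;
  }

  public static void main(String[] args) {
    byte[] abc = "abc".getBytes(StandardCharsets.US_ASCII);
    byte[] pad = "12".getBytes(StandardCharsets.US_ASCII);
    System.out.println(new String(lpad(abc, 10, pad), StandardCharsets.US_ASCII)); // 1212121abc
    System.out.println(new String(rpad(abc, 10, pad), StandardCharsets.US_ASCII)); // abc1212121
  }
}
```

With a 3-byte input and `len = 10`, seven bytes must be filled, so the 2-byte pattern is written three times in full and once truncated to a single byte, which matches the `1212121` seen in the string output above.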