mkaravel commented on a change in pull request #34056:
URL: https://github.com/apache/spark/pull/34056#discussion_r734631151
##########
File path:
common/unsafe/src/main/java/org/apache/spark/unsafe/types/ByteArray.java
##########
@@ -123,4 +124,216 @@ static long getPrefix(Object base, long offset, int numBytes) {
}
return result;
}
+
+ // Constants used in the bitwiseAnd, bitwiseOr, and bitwiseXor methods below. They
+ // represent valid (case insensitive) values for the third argument of these methods.
+ private static final UTF8String LPAD_UTF8 = UTF8String.fromString("lpad");
+ private static final UTF8String RPAD_UTF8 = UTF8String.fromString("rpad");
+
+ // Return the bitwise AND of two byte sequences.
+ // This method is called when we call the BITAND SQL function. That function has the following
+ // behavior:
+ // - If the byte lengths of the two sequences are equal, the result is a byte sequence of the
+ // same length as the inputs and its content is the bitwise AND of the two inputs.
+ // - If the byte lengths are different, we expect a third string argument (constant) that
+ // indicates whether we should semantically pad (to the left or to the right) the shorter
+ // input to match the length of the longer input before proceeding with the bitwise AND
+ // operation. Padding in this case is done with zero bytes. Therefore, in this case, the
+ // byte length of the result is equal to the maximum byte length of the two inputs. The two
+ // acceptable values for the third argument are "lpad" and "rpad" (case insensitive). If the
+ // value is "lpad" we pad the shorter byte sequence from the left with zero bytes. If the
+ // value is "rpad" we pad the shorter byte sequence from the right with zero bytes.
+ // The fourth argument of this method indicates the number of arguments on the caller side (that
+ // is at the SQL function level). If the calling side used the two argument overload of the BITAND
+ // SQL function, we expect the inputs to be of the same byte length. If the calling side used the
+ // three argument overload of the BITAND SQL function, then we check that the string constant has
+ // a valid value, and based on that value we do the appropriate semantic padding.
+ public static byte[] bitwiseAnd(byte[] bytes1, byte[] bytes2, UTF8String padding,
+ boolean isTwoArgs) {
+ if (bytes1 == null || bytes2 == null || padding == null) return null;
+ final int len1 = bytes1.length;
+ final int len2 = bytes2.length;
+ if (isTwoArgs && len1 != len2) {
+ throw new IllegalArgumentException("Two-argument BITAND cannot operate on BINARY strings "
Review comment:
As we discussed off GitHub, the necessary classes are not visible in
`common/unsafe`.
I will follow up on this, either in this PR or in a separate one, once the
above is fixed.
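
For reference, the padding semantics described in the method comment in the quoted hunk can be sketched roughly as below. This is only an illustration under stated assumptions, not the code in this PR: the class name `BitwiseAndSketch` is hypothetical, a plain `String` stands in for `UTF8String`, and the exception messages and the exact point at which the padding value is validated are my guesses.

```java
import java.util.Arrays;

// Hypothetical stand-alone sketch; not the ByteArray implementation from the PR.
public class BitwiseAndSketch {

  // Returns the bitwise AND of two byte sequences. In three-argument mode the
  // shorter input is treated as if it were padded with zero bytes on the left
  // ("lpad") or on the right ("rpad"); no padded copy is actually allocated.
  static byte[] bitwiseAnd(byte[] bytes1, byte[] bytes2, String padding, boolean isTwoArgs) {
    if (bytes1 == null || bytes2 == null || padding == null) return null;
    final int len1 = bytes1.length;
    final int len2 = bytes2.length;
    if (isTwoArgs && len1 != len2) {
      // Assumed message; the original message is truncated in the quoted hunk.
      throw new IllegalArgumentException("two-argument form requires equal byte lengths");
    }
    boolean lpad = false;
    if (!isTwoArgs) {
      // Assumption: the padding value is validated even when the lengths match.
      if (padding.equalsIgnoreCase("lpad")) {
        lpad = true;
      } else if (!padding.equalsIgnoreCase("rpad")) {
        throw new IllegalArgumentException("padding must be 'lpad' or 'rpad'");
      }
    }
    final int maxLen = Math.max(len1, len2);
    // With left padding each input is right-aligned in the result; otherwise
    // it is left-aligned. For equal lengths both offsets are zero.
    final int off1 = lpad ? maxLen - len1 : 0;
    final int off2 = lpad ? maxLen - len2 : 0;
    final byte[] result = new byte[maxLen];
    for (int i = 0; i < maxLen; i++) {
      // Positions outside an input's aligned range read as the zero pad byte.
      int x = (i >= off1 && i - off1 < len1) ? bytes1[i - off1] : 0;
      int y = (i >= off2 && i - off2 < len2) ? bytes2[i - off2] : 0;
      result[i] = (byte) (x & y);
    }
    return result;
  }

  public static void main(String[] args) {
    byte[] a = {(byte) 0xFF, (byte) 0xFF};
    byte[] b = {0x0F};
    System.out.println(Arrays.toString(bitwiseAnd(a, b, "lpad", false)));
    System.out.println(Arrays.toString(bitwiseAnd(a, b, "RPAD", false)));
  }
}
```

With inputs `FF FF` and `0F`, "lpad" aligns the shorter input to the right (result `00 0F`), while "rpad" aligns it to the left (result `0F 00`).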
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]