mkaravel commented on a change in pull request #34056:
URL: https://github.com/apache/spark/pull/34056#discussion_r734631151



##########
File path: common/unsafe/src/main/java/org/apache/spark/unsafe/types/ByteArray.java
##########
@@ -123,4 +124,216 @@ static long getPrefix(Object base, long offset, int numBytes) {
     }
     return result;
   }
+
+  // Constants used in the bitwiseAnd, bitwiseOr, and bitwiseXor methods below. They
+  // represent valid (case insensitive) values for the third argument of these methods.
+  private static final UTF8String LPAD_UTF8 = UTF8String.fromString("lpad");
+  private static final UTF8String RPAD_UTF8 = UTF8String.fromString("rpad");
+
+  // Return the bitwise AND of two byte sequences.
+  // This method is called when we call the BITAND SQL function. That function has the following
+  // behavior:
+  // - If the byte lengths of the two sequences are equal, the result is a byte sequence of the
+  //   same length as the inputs and its content is the bitwise AND of the two inputs.
+  // - If the byte lengths are different, we expect a third string argument (constant) that
+  //   indicates whether we should semantically pad (to the left or to the right) the shorter
+  //   input to match the length of the longer input before proceeding with the bitwise AND
+  //   operation. Padding in this case is done with zero bytes. Therefore, in this case, the
+  //   byte length of the result is equal to the maximum byte length of the two inputs. The two
+  //   acceptable values for the third argument are "lpad" and "rpad" (case insensitive). If the
+  //   value is "lpad" we pad the shorter byte sequence from the left with zero bytes. If the
+  //   value is "rpad" we pad the shorter byte sequence from the right with zero bytes.
+  // The fourth argument of this method indicates the number of arguments on the caller side (that
+  // is at the SQL function level). If the calling side used the two argument overload of the BITAND
+  // SQL function, we expect the inputs to be of the same byte length. If the calling side used the
+  // three argument overload of the BITAND SQL function, then we check that the string constant has
+  // a valid value, and based on that value we do the appropriate semantic padding.
+  public static byte[] bitwiseAnd(byte[] bytes1, byte[] bytes2, UTF8String padding,
+                                  boolean isTwoArgs) {
+    if (bytes1 == null || bytes2 == null || padding == null) return null;
+    final int len1 = bytes1.length;
+    final int len2 = bytes2.length;
+    if (isTwoArgs && len1 != len2) {
+      throw new IllegalArgumentException("Two-argument BITAND cannot operate on BINARY strings "

Review comment:
   As we discussed off GitHub, the necessary classes are not visible in `common/unsafe`.
   I will follow up on this, either in this PR or in a separate one, once that is fixed.
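The lpad/rpad semantics described in the comment block of this hunk can be illustrated with a small, self-contained Java sketch. It is not the PR's implementation: it takes a plain `String` instead of Spark's `UTF8String`, and the class `BitwisePadSketch`, its `bitwise` and `paddedAt` helpers, and the error messages are hypothetical. Rather than allocating a zero-padded copy of the shorter input, it reads each input through an index offset so that padded positions yield zero bytes.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.function.IntBinaryOperator;

// Hypothetical sketch of the BITAND padding semantics described above; not Spark's
// ByteArray code (which takes a UTF8String and may differ in details such as messages).
final class BitwisePadSketch {

  // Combines two byte sequences with a bitwise operator. If the lengths differ, the
  // shorter input is treated as if zero-padded on the left ("lpad") or on the right
  // ("rpad") up to the longer length. Returns null if any argument is null.
  static byte[] bitwise(byte[] bytes1, byte[] bytes2, String padding,
                        boolean isTwoArgs, IntBinaryOperator op) {
    if (bytes1 == null || bytes2 == null || padding == null) return null;
    final int len1 = bytes1.length;
    final int len2 = bytes2.length;
    if (isTwoArgs && len1 != len2) {
      throw new IllegalArgumentException(
          "Two-argument form requires equal byte lengths, got " + len1 + " and " + len2);
    }
    boolean lpad = false;
    if (!isTwoArgs) {
      // Three-argument form: the padding constant is case-insensitive and must be valid.
      String mode = padding.toLowerCase(Locale.ROOT);
      if (mode.equals("lpad")) {
        lpad = true;
      } else if (!mode.equals("rpad")) {
        throw new IllegalArgumentException("Padding must be 'lpad' or 'rpad', got: " + padding);
      }
    }
    final int len = Math.max(len1, len2);
    final byte[] result = new byte[len];
    for (int i = 0; i < len; i++) {
      int a = paddedAt(bytes1, i, len, lpad) & 0xFF;
      int b = paddedAt(bytes2, i, len, lpad) & 0xFF;
      result[i] = (byte) op.applyAsInt(a, b);
    }
    return result;
  }

  // Byte at index i of b after conceptually zero-padding it to length len.
  private static byte paddedAt(byte[] b, int i, int len, boolean lpad) {
    int j = lpad ? i - (len - b.length) : i;
    return (j >= 0 && j < b.length) ? b[j] : 0;
  }

  static byte[] bitwiseAnd(byte[] b1, byte[] b2, String padding, boolean isTwoArgs) {
    return bitwise(b1, b2, padding, isTwoArgs, (x, y) -> x & y);
  }

  static byte[] bitwiseOr(byte[] b1, byte[] b2, String padding, boolean isTwoArgs) {
    return bitwise(b1, b2, padding, isTwoArgs, (x, y) -> x | y);
  }

  public static void main(String[] args) {
    byte[] longer = {(byte) 0xFF, 0x0F};
    byte[] shorter = {0x3C};
    // lpad: shorter is read as {0x00, 0x3C}; prints [0, 12]
    System.out.println(Arrays.toString(bitwiseAnd(longer, shorter, "LPAD", false)));
    // rpad: shorter is read as {0x3C, 0x00}; prints [60, 0]
    System.out.println(Arrays.toString(bitwiseAnd(longer, shorter, "rpad", false)));
  }
}
```

With zero padding, AND always produces zero bytes in the padded positions, whereas OR (and XOR) copy the longer input's bytes there. Parameterizing one loop by the operator is one way to keep the three functions consistent; whether the PR shares code this way is not stated in the hunk.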
