srowen commented on a change in pull request #34267:
URL: https://github.com/apache/spark/pull/34267#discussion_r729953775
##########
File path:
common/unsafe/src/main/java/org/apache/spark/unsafe/types/ByteArray.java
##########
@@ -42,15 +45,48 @@ public static void writeToMemory(byte[] src, Object target, long targetOffset) {
public static long getPrefix(byte[] bytes) {
if (bytes == null) {
return 0L;
+ }
+ return getPrefix(bytes, Platform.BYTE_ARRAY_OFFSET, bytes.length);
+ }
+
+ public static long getPrefix(Object base, long offset, int numBytes) {
+ // Since JVMs are either 4-byte aligned or 8-byte aligned, we check the size of the bytes.
+ // If size is 0, just return 0.
+ // If size is between 0 and 4 (inclusive), assume data is 4-byte aligned under the hood and
+ // use a getInt to fetch the prefix.
+ // If size is greater than 4, assume we have at least 8 bytes of data to fetch.
+ // After getting the data, we use a mask to mask out data that is not part of the bytes.
+ long p;
+ long mask = 0;
+ if (IS_LITTLE_ENDIAN) {
+ if (numBytes >= 8) {
+ p = Platform.getLong(base, offset);
+ } else if (numBytes > 4) {
+ p = Platform.getLong(base, offset);
+ mask = (1L << (8 - numBytes) * 8) - 1;
+ } else if (numBytes > 0) {
+ p = (long) Platform.getInt(base, offset);
+ mask = (1L << (8 - numBytes) * 8) - 1;
+ } else {
+ p = 0;
+ }
+ p = java.lang.Long.reverseBytes(p);
} else {
- final int minLen = Math.min(bytes.length, 8);
- long p = 0;
- for (int i = 0; i < minLen; ++i) {
- p |= ((long) Platform.getByte(bytes, Platform.BYTE_ARRAY_OFFSET + i) & 0xff)
Review comment:
Ah right, never mind, I'm not thinking this through. Endianness matters
when reading ints or longs and then masking.
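
To illustrate the point, here is a minimal, self-contained sketch, not the PR's code. It uses java.nio.ByteBuffer in place of Spark's Platform, and the class and method names are hypothetical. It assumes the buffer has at least 8 readable bytes from the offset, and compares a byte-by-byte prefix with a single 8-byte read followed by a mask. On a little-endian read the bytes must be reversed first so that the first byte in memory ends up most significant; only then does masking the low bits drop the bytes past numBytes.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical illustration class; not part of Spark.
final class PrefixEndiannessSketch {

    // Reference behaviour: assemble the prefix one byte at a time,
    // with the first byte in the most significant position.
    // Byte-wise reads are not affected by endianness.
    static long byteWisePrefix(byte[] buf, int offset, int numBytes) {
        int n = Math.min(numBytes, 8);
        long p = 0;
        for (int i = 0; i < n; i++) {
            p |= ((long) buf[offset + i] & 0xff) << (56 - 8 * i);
        }
        return p;
    }

    // Word-wise variant: read 8 bytes at once in the given byte order,
    // normalise so the first byte is most significant, then mask.
    // Assumes buf has at least 8 bytes available starting at offset.
    static long wordWisePrefix(byte[] buf, int offset, int numBytes, ByteOrder order) {
        if (numBytes <= 0) {
            return 0L;
        }
        long p = ByteBuffer.wrap(buf).order(order).getLong(offset);
        if (order == ByteOrder.LITTLE_ENDIAN) {
            // A little-endian read puts the first byte in the low bits.
            p = Long.reverseBytes(p);
        }
        if (numBytes < 8) {
            // Clear the low (8 - numBytes) bytes, which lie past the data.
            long mask = (1L << (8 - numBytes) * 8) - 1;
            p &= ~mask;
        }
        return p;
    }

    public static void main(String[] args) {
        // Three bytes of data followed by unrelated bytes.
        byte[] buf = {0x01, 0x02, 0x03, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f};
        long expected = byteWisePrefix(buf, 0, 3);
        System.out.println(expected == wordWisePrefix(buf, 0, 3, ByteOrder.LITTLE_ENDIAN));
        System.out.println(expected == wordWisePrefix(buf, 0, 3, ByteOrder.BIG_ENDIAN));
    }
}
```

Skipping the reversal on the little-endian path would leave the data bytes in the low bits, and the same mask would then clear the data instead of the trailing bytes.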
##########
File path:
common/unsafe/src/main/java/org/apache/spark/unsafe/types/ByteArray.java
##########
@@ -42,15 +45,34 @@ public static void writeToMemory(byte[] src, Object target, long targetOffset) {
public static long getPrefix(byte[] bytes) {
if (bytes == null) {
return 0L;
+ }
+ return getPrefix(bytes, Platform.BYTE_ARRAY_OFFSET, bytes.length);
+ }
+
+ protected static long getPrefix(Object base, long offset, int numBytes) {
Review comment:
This can be package-private (remove protected), but it doesn't matter:
it's a final class anyway.