Github user mgaido91 commented on a diff in the pull request:
https://github.com/apache/spark/pull/22338#discussion_r215202638
--- Diff:
common/unsafe/src/main/java/org/apache/spark/unsafe/hash/Murmur3_x86_32.java ---
@@ -69,22 +70,27 @@ public static int hashUnsafeWords(Object base, long offset, int lengthInBytes, i
}
public static int hashUnsafeBytesBlock(MemoryBlock base, int seed) {
+ return hashUnsafeBytesBlock(base, Ints.checkedCast(base.size()), seed);
+ }
+
+ private static int hashUnsafeBytesBlock(MemoryBlock base, int lengthInBytes, int seed) {
// This is not compatible with original and another implementations.
// But remain it for backward compatibility for the components existing before 2.3.
- int lengthInBytes = Ints.checkedCast(base.size());
assert (lengthInBytes >= 0): "lengthInBytes cannot be negative";
int lengthAligned = lengthInBytes - lengthInBytes % 4;
- int h1 = hashBytesByIntBlock(base.subBlock(0, lengthAligned), seed);
+ int h1 = hashBytesByIntBlock(base, lengthAligned, seed);
+ long offset = base.getBaseOffset();
+ Object o = base.getBaseObject();
for (int i = lengthAligned; i < lengthInBytes; i++) {
- int halfWord = base.getByte(i);
+ int halfWord = Platform.getByte(o, offset + i);
--- End diff --
That was my guess too at first, but if you make only this change,
performance doesn't change. @kiszk's suggestion that the clue is the size of
the generated Java bytecode seems reasonable, but it needs more investigation.
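
One way to start that investigation could be a standalone reduction of the two shapes in the diff, so the bytecode of each can be compared with `javap -c` or the JIT's decisions checked with `-XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining`. The sketch below is only an illustration, not Spark code: `TailHashSketch`, `hashWithSlice` and `hashWithLength` are hypothetical names, `byte[]` and `ByteBuffer` stand in for `MemoryBlock` and `Platform`, and little-endian reads are assumed. The mixing steps follow the standard Murmur3 32-bit constants.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical reduction of the two variants in the diff; not Spark's implementation.
final class TailHashSketch {
  private static final int C1 = 0xcc9e2d51;
  private static final int C2 = 0x1b873593;

  // Shape of the removed lines: build a view over the aligned prefix,
  // then read the tail bytes through an accessor object.
  public static int hashWithSlice(byte[] data, int seed) {
    int lengthInBytes = data.length;
    int lengthAligned = lengthInBytes - lengthInBytes % 4;
    ByteBuffer prefix =
        ByteBuffer.wrap(data, 0, lengthAligned).slice().order(ByteOrder.LITTLE_ENDIAN);
    int h1 = seed;
    while (prefix.remaining() >= 4) {
      h1 = mixH1(h1, mixK1(prefix.getInt()));
    }
    ByteBuffer whole = ByteBuffer.wrap(data);
    for (int i = lengthAligned; i < lengthInBytes; i++) {
      int halfWord = whole.get(i); // sign-extended byte, as in the diff
      h1 = mixH1(h1, mixK1(halfWord));
    }
    return fmix(h1, lengthInBytes);
  }

  // Shape of the added lines: pass the aligned length down and read raw memory directly.
  public static int hashWithLength(byte[] data, int seed) {
    int lengthInBytes = data.length;
    int lengthAligned = lengthInBytes - lengthInBytes % 4;
    int h1 = seed;
    for (int i = 0; i < lengthAligned; i += 4) {
      int word = (data[i] & 0xff)
          | ((data[i + 1] & 0xff) << 8)
          | ((data[i + 2] & 0xff) << 16)
          | ((data[i + 3] & 0xff) << 24);
      h1 = mixH1(h1, mixK1(word));
    }
    for (int i = lengthAligned; i < lengthInBytes; i++) {
      int halfWord = data[i]; // sign-extended byte, as in the diff
      h1 = mixH1(h1, mixK1(halfWord));
    }
    return fmix(h1, lengthInBytes);
  }

  private static int mixK1(int k1) {
    k1 *= C1;
    k1 = Integer.rotateLeft(k1, 15);
    k1 *= C2;
    return k1;
  }

  private static int mixH1(int h1, int k1) {
    h1 ^= k1;
    h1 = Integer.rotateLeft(h1, 13);
    return h1 * 5 + 0xe6546b64;
  }

  private static int fmix(int h1, int length) {
    h1 ^= length;
    h1 ^= h1 >>> 16;
    h1 *= 0x85ebca6b;
    h1 ^= h1 >>> 13;
    h1 *= 0xc2b2ae35;
    h1 ^= h1 >>> 16;
    return h1;
  }

  public static void main(String[] args) {
    byte[] data = {1, -2, 3, -4, 5, -6, 7};
    // Both variants should produce the same hash; only their bytecode shape differs.
    System.out.println(hashWithSlice(data, 42) == hashWithLength(data, 42));
  }
}
```

Compiling this and running `javap -c TailHashSketch` shows the bytecode length of each method, which can then be compared against HotSpot's inlining thresholds. Whether the same effect explains the numbers in this PR is still an open question.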