mundaym commented on a change in pull request #29762:
URL: https://github.com/apache/spark/pull/29762#discussion_r492370087
##########
File path:
common/sketch/src/main/java/org/apache/spark/util/sketch/Murmur3_x86_32.java
##########
@@ -92,8 +96,10 @@ private static int hashBytesByInt(Object base, long offset,
int lengthInBytes, i
int h1 = seed;
for (int i = 0; i < lengthInBytes; i += 4) {
int halfWord = Platform.getInt(base, offset + i);
- int k1 = mixK1(halfWord);
- h1 = mixH1(h1, k1);
+ if (isBigEndian) {
Review comment:
I'm inclined to agree with @srowen that duplicating the loop is probably
unnecessary. Hotspot will almost certainly be able to remove the branch and
even if it doesn't this code isn't actually in the critical path for the hash
function (the critical path is through mixH1) so the latency of the extra check
can be hidden on an out of order processor. I'm happy to do it if the consensus
is that it is warranted though.
##########
File path:
sql/catalyst/src/test/java/org/apache/spark/sql/catalyst/expressions/XXH64Suite.java
##########
@@ -73,42 +72,22 @@ public void testKnownByteArrayInputs() {
hasher.hashUnsafeBytes(BUFFER, Platform.BYTE_ARRAY_OFFSET, 1));
Assert.assertEquals(0x739840CB819FA723L,
XXH64.hashUnsafeBytes(BUFFER, Platform.BYTE_ARRAY_OFFSET, 1,
PRIME));
-
- if (ByteOrder.nativeOrder() == ByteOrder.LITTLE_ENDIAN) {
- Assert.assertEquals(0x9256E58AA397AEF1L,
- hasher.hashUnsafeBytes(BUFFER, Platform.BYTE_ARRAY_OFFSET, 4));
- Assert.assertEquals(0x9D5FFDFB928AB4BL,
- XXH64.hashUnsafeBytes(BUFFER, Platform.BYTE_ARRAY_OFFSET, 4,
PRIME));
- Assert.assertEquals(0xF74CB1451B32B8CFL,
- hasher.hashUnsafeBytes(BUFFER, Platform.BYTE_ARRAY_OFFSET, 8));
- Assert.assertEquals(0x9C44B77FBCC302C5L,
- XXH64.hashUnsafeBytes(BUFFER, Platform.BYTE_ARRAY_OFFSET, 8,
PRIME));
- Assert.assertEquals(0xCFFA8DB881BC3A3DL,
- hasher.hashUnsafeBytes(BUFFER, Platform.BYTE_ARRAY_OFFSET, 14));
- Assert.assertEquals(0x5B9611585EFCC9CBL,
- XXH64.hashUnsafeBytes(BUFFER, Platform.BYTE_ARRAY_OFFSET, 14,
PRIME));
- Assert.assertEquals(0x0EAB543384F878ADL,
- hasher.hashUnsafeBytes(BUFFER, Platform.BYTE_ARRAY_OFFSET,
SIZE));
- Assert.assertEquals(0xCAA65939306F1E21L,
- XXH64.hashUnsafeBytes(BUFFER, Platform.BYTE_ARRAY_OFFSET, SIZE,
PRIME));
- } else {
- Assert.assertEquals(0x7F875412350ADDDCL,
- hasher.hashUnsafeBytes(BUFFER, Platform.BYTE_ARRAY_OFFSET, 4));
- Assert.assertEquals(0x564D279F524D8516L,
- XXH64.hashUnsafeBytes(BUFFER, Platform.BYTE_ARRAY_OFFSET, 4,
PRIME));
- Assert.assertEquals(0x7D9F07E27E0EB006L,
- hasher.hashUnsafeBytes(BUFFER, Platform.BYTE_ARRAY_OFFSET, 8));
- Assert.assertEquals(0x893CEF564CB7858L,
- XXH64.hashUnsafeBytes(BUFFER, Platform.BYTE_ARRAY_OFFSET, 8,
PRIME));
- Assert.assertEquals(0xC6198C4C9CC49E17L,
- hasher.hashUnsafeBytes(BUFFER, Platform.BYTE_ARRAY_OFFSET, 14));
- Assert.assertEquals(0x4E21BEF7164D4BBL,
- XXH64.hashUnsafeBytes(BUFFER, Platform.BYTE_ARRAY_OFFSET, 14,
PRIME));
- Assert.assertEquals(0xBCF5FAEDEE1F2B5AL,
- hasher.hashUnsafeBytes(BUFFER, Platform.BYTE_ARRAY_OFFSET,
SIZE));
- Assert.assertEquals(0x6F680C877A358FE5L,
- XXH64.hashUnsafeBytes(BUFFER, Platform.BYTE_ARRAY_OFFSET, SIZE,
PRIME));
- }
+ Assert.assertEquals(0x9256E58AA397AEF1L,
+ hasher.hashUnsafeBytes(BUFFER, Platform.BYTE_ARRAY_OFFSET, 4));
+ Assert.assertEquals(0x9D5FFDFB928AB4BL,
+ XXH64.hashUnsafeBytes(BUFFER, Platform.BYTE_ARRAY_OFFSET, 4,
PRIME));
+ Assert.assertEquals(0xF74CB1451B32B8CFL,
+ hasher.hashUnsafeBytes(BUFFER, Platform.BYTE_ARRAY_OFFSET, 8));
+ Assert.assertEquals(0x9C44B77FBCC302C5L,
+ XXH64.hashUnsafeBytes(BUFFER, Platform.BYTE_ARRAY_OFFSET, 8,
PRIME));
+ Assert.assertEquals(0xCFFA8DB881BC3A3DL,
+ hasher.hashUnsafeBytes(BUFFER, Platform.BYTE_ARRAY_OFFSET, 14));
+ Assert.assertEquals(0x5B9611585EFCC9CBL,
+ XXH64.hashUnsafeBytes(BUFFER, Platform.BYTE_ARRAY_OFFSET, 14,
PRIME));
+ Assert.assertEquals(0x0EAB543384F878ADL,
+ hasher.hashUnsafeBytes(BUFFER, Platform.BYTE_ARRAY_OFFSET, SIZE));
+ Assert.assertEquals(0xCAA65939306F1E21L,
+ XXH64.hashUnsafeBytes(BUFFER, Platform.BYTE_ARRAY_OFFSET, SIZE,
PRIME));
Review comment:
Done. I've tried to keep the style the same as the existing tests but I
can refactor if desired.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]