WangGuangxin commented on issue #26548: [SPARK-29918][SQL] RecordBinaryComparator should check endianness when compared by long URL: https://github.com/apache/spark/pull/26548#issuecomment-554687131 > Yes, it isn't really related to endian-ness, although to fix this issue you have to know whether the bytes were read as little-endian or not. > > Here's a simple proof of concept: > > ``` > long arrayOffset = 12; > > long[] arr1 = new long[2]; > // 12 + 4 = 16, aligned > Platform.putLong(arr1, arrayOffset + 4, 0xa000000000000000L); > long[] arr2 = new long[2]; > Platform.putLong(arr2, arrayOffset + 4, 0x0000000000000000L); > System.out.println(binaryComparator.compare(arr1, arrayOffset + 4, 8, arr2, arrayOffset + 4, 8)); > > long[] arr3 = new long[2]; > Platform.putLong(arr3, arrayOffset, 0xa000000000000000L); > long[] arr4 = new long[2]; > Platform.putLong(arr4, arrayOffset, 0x0000000000000000L); > System.out.println(binaryComparator.compare(arr3, arrayOffset, 8, arr4, arrayOffset, 8)); > ``` > > (Edited the example to correctly show the problem with signed vs unsigned long comparison) > > This prints -1 and 1, but it's the same 8 bytes being compared with each other, even at the same alignment. In the first one, it's aligned (12 + 4 = 16), so starts by comparing longs. The first one is negative when read as a signed long, so compares before the second, 0. However when not aligned in the second case, it starts with byte-by-byte comparison, and the first 0xa0 is read as a unsigned byte greater than 0, so it's larger. (Note here the endian-ness issue cancels out, so to speak.) > > This example is actually illustrating the other issue mentioned here, about unsigned long comparison. But I think this example shows the issue you reported: > > ``` > long arrayOffset = 12; > > long[] arr1 = new long[2]; > Platform.putLong(arr1, arrayOffset, 0x0100000000000000L); > long[] arr2 = new long[2]; > Platform.putLong(arr2, arrayOffset + 4, 0x0000000000000001L); > System.out.println(binaryComparator.compare(arr1, arrayOffset, 8, arr2, arrayOffset + 4, 8)); > > long[] arr3 = new long[2]; > Platform.putLong(arr3, arrayOffset, 0x0100000000000000L); > long[] arr4 = new long[2]; > Platform.putLong(arr4, arrayOffset, 0x0000000000000001L); > System.out.println(binaryComparator.compare(arr3, arrayOffset, 8, arr4, arrayOffset, 8)); > ``` > > Prints 1 and -1. > Here, in the first case it again starts with long comparisons as the architecture supports unaligned comparison. Both are positive but the long in the first case is larger. In the second case, they're not aligned on a 8-byte boundary so starts with byte comparison. But the first byte it looks at is the least-significant byte of the longs, and so compares 0 to 1 and finds the second is larger. > > And yes this is just running locally on one machine even, Intel 64-bit (which supports unaligned access). > > The second one is fixed by a change like the one in this PR. > The first one is fixed by also comparing the longs then with Long.compareUnsigned. They agree in both cases if both changes are made. Nice explanation. The second example is just what I mean.
---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected] With regards, Apache Git Services --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
