WangGuangxin commented on issue #26548: [SPARK-29918][SQL] RecordBinaryComparator should check endianness when compared by long
URL: https://github.com/apache/spark/pull/26548#issuecomment-554644280

Maybe some wording in my description caused a misunderstanding. In fact, **it has nothing to do with mixing big- and little-endian machines**. **For the same two records, the comparison result may differ after a rerun. It is simply a bug in the code logic, and it can happen in a cluster where all servers are x86-64 and little-endian.**

Take some real data from a test I ran on a cluster with **ALL x86-64 and little-endian servers** as an example.

In the first run, recordA's offset is `25617612` and its length is `40`; recordB's offset is `53434324` and its length is `40`. Since `25617612 % 8 == 4 && 53434324 % 8 == 4`, the logic in `RecordBinaryComparator` compares the first 4 bytes byte-wise, then the following 32 bytes as 4 Longs, and finally the last 4 bytes byte-wise again.

In the second run, recordA's offset is `19257280` and recordB's offset is `16892896`, both with length `40`. Since `19257280 % 8 == 0 && 16892896 % 8 == 0`, all 40 bytes are compared as 5 Longs.

Here comes the difference. **For the last 4 bytes, comparing them as part of a Long and comparing them byte-wise do not give the same result on a little-endian machine (explained in https://github.com/apache/spark/pull/26548#issuecomment-554554331).** Because the offsets of records A and B differ between the two runs, `RecordBinaryComparator` takes a different comparison code path each time, so the same pair of records can be ordered differently.

@srowen @cloud-fan
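To make the two code paths concrete, here is a minimal, self-contained Java sketch modeled on the comparison loop described above. It is not Spark's code: `java.nio.ByteBuffer` in little-endian order stands in for an 8-byte memory read on x86-64, and the class name `RecordCompareSketch`, the helper `getLongLE`, and passing the offsets separately from the byte arrays are all hypothetical simplifications. It also assumes unaligned reads are allowed, so the 8-byte loop always runs. The two 40-byte records differ only at bytes 36 and 39, and the offsets are the ones from the two runs above.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical sketch modeled on the comparison loop described above; not Spark's code.
final class RecordCompareSketch {

    // Reads 8 bytes starting at pos as one long in little-endian order,
    // standing in for an 8-byte memory read on an x86-64 machine.
    static long getLongLE(byte[] buf, int pos) {
        return ByteBuffer.wrap(buf).order(ByteOrder.LITTLE_ENDIAN).getLong(pos);
    }

    // leftAddr/rightAddr play the role of the records' memory offsets and are used only
    // for the alignment checks; the record bytes themselves come from the arrays.
    // Assumes both records have the same length, as in the example above.
    static int compare(byte[] left, long leftAddr, byte[] right, long rightAddr) {
        int len = left.length;
        int i = 0;
        // If both offsets have the same remainder mod 8, compare byte-wise until aligned.
        if (leftAddr % 8 == rightAddr % 8) {
            while ((leftAddr + i) % 8 != 0 && i < len) {
                int v1 = left[i] & 0xff;
                int v2 = right[i] & 0xff;
                if (v1 != v2) return v1 > v2 ? 1 : -1;
                i += 1;
            }
        }
        // Assumes unaligned reads are allowed (as on x86-64), so this loop always runs:
        // compare 8 bytes at a time as signed little-endian longs.
        while (i <= len - 8) {
            long v1 = getLongLE(left, i);
            long v2 = getLongLE(right, i);
            if (v1 != v2) return v1 > v2 ? 1 : -1;
            i += 8;
        }
        // Compare any remaining tail bytes byte-wise.
        while (i < len) {
            int v1 = left[i] & 0xff;
            int v2 = right[i] & 0xff;
            if (v1 != v2) return v1 > v2 ? 1 : -1;
            i += 1;
        }
        return 0;
    }

    public static void main(String[] args) {
        byte[] a = new byte[40]; // record A, all zeros except byte 36
        byte[] b = new byte[40]; // record B, all zeros except byte 39
        a[36] = 1;
        b[39] = 1;
        // First run: both offsets % 8 == 4 -> 4 bytes, 4 longs (bytes 4..35), 4 tail bytes.
        int run1 = compare(a, 25617612L, b, 53434324L);
        // Second run: both offsets % 8 == 0 -> 5 longs covering all 40 bytes.
        int run2 = compare(a, 19257280L, b, 16892896L);
        System.out.println("run1=" + run1 + ", run2=" + run2); // prints run1=1, run2=-1
    }
}
```

In the first run, byte 36 is compared byte-wise before byte 39, so A sorts after B. In the second run, bytes 32..39 are read as one little-endian long, where byte 39 is the most significant byte, so B sorts after A. The same two records get opposite results purely because of where they happen to sit in memory.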
