WangGuangxin commented on issue #26548: [SPARK-29918][SQL] 
RecordBinaryComparator should check endianness when compared by long
URL: https://github.com/apache/spark/pull/26548#issuecomment-554644280
 
 
   Maybe some wording in my description caused a misunderstanding. In fact, 
**it has nothing to do with mixing big- and little-endian machines**.  
   
   **For the same two records, the comparison result may differ between runs. 
It is purely a bug in the code logic. It can happen in a cluster where all 
servers are x86-64 and little-endian.**
   
   Take real data from my test, run in a cluster with **ALL x86-64 and 
little-endian servers**, as an example. 
   
   In the first run, recordA's offset is `25617612` and its length is `40`; 
recordB's offset is `53434324` and its length is `40`. Since 
`25617612 % 8 == 4 && 53434324 % 8 == 4`, according to the logic in 
`RecordBinaryComparator`, it compares the first 4 bytes byte-wise, then the 
following 32 bytes as 4 Longs, and finally the last 4 bytes byte-wise again.
   
   In the second run, recordA's offset is `19257280` and its length is `40`; 
recordB's offset is `16892896` and its length is `40`. Since 
`19257280 % 8 == 0 && 16892896 % 8 == 0`, it compares all 40 bytes as 5 Longs.
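
   To make the two code paths concrete, here is a minimal sketch that models 
how a record is split into a byte-wise head, a run of 8-byte words, and a 
byte-wise tail. It is not the actual `RecordBinaryComparator` code: the class 
`ComparePathSketch` and the helper `split` are hypothetical names, and the 
model assumes both records have the same `offset % 8` and the same length.

```java
public class ComparePathSketch {
    // Hypothetical helper (not part of Spark): returns
    // {bytes compared byte-wise first, number of 8-byte words, trailing bytes}
    // for a record at the given offset, assuming word comparison starts
    // at the next 8-byte-aligned address.
    public static int[] split(long offset, int length) {
        int head = (int) ((8 - offset % 8) % 8); // bytes until 8-byte alignment
        head = Math.min(head, length);
        int words = (length - head) / 8;         // full 8-byte words
        int tail = length - head - words * 8;    // leftover bytes
        return new int[] {head, words, tail};
    }

    public static void main(String[] args) {
        // Offsets taken from the two runs described above.
        long[] offsets = {25617612L, 53434324L, 19257280L, 16892896L};
        for (long off : offsets) {
            int[] s = split(off, 40);
            System.out.println(off + ": head=" + s[0]
                + " longs=" + s[1] + " tail=" + s[2]);
        }
    }
}
```

   With the offsets from the first run this gives head=4, longs=4, tail=4; 
with the offsets from the second run it gives head=0, longs=5, tail=0.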
   
   Here comes the difference. **On a little-endian machine, comparing the 
last 4 bytes as part of a Long and comparing them byte-wise can give different 
results (explained in 
https://github.com/apache/spark/pull/26548#issuecomment-554554331).** The 
offsets of records A and B differ between the two runs, which makes 
`RecordBinaryComparator` take a different comparison code path.
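
   The effect can be reproduced outside Spark. The sketch below is only an 
illustration: the class `EndianCompareSketch`, both methods, and the sample 
bytes are made up for this example, and `ByteBuffer` with 
`ByteOrder.LITTLE_ENDIAN` stands in for a native 8-byte read on an x86-64 
machine.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class EndianCompareSketch {
    // Compare byte by byte, treating each byte as unsigned.
    public static int compareBytewise(byte[] a, byte[] b) {
        for (int i = 0; i < a.length; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x > y ? 1 : -1;
            }
        }
        return 0;
    }

    // Read all 8 bytes as one little-endian long, then compare the longs.
    // In little-endian order the LAST byte in memory is the most significant.
    public static int compareAsLittleEndianLong(byte[] a, byte[] b) {
        long x = ByteBuffer.wrap(a).order(ByteOrder.LITTLE_ENDIAN).getLong();
        long y = ByteBuffer.wrap(b).order(ByteOrder.LITTLE_ENDIAN).getLong();
        return Long.compare(x, y);
    }

    public static void main(String[] args) {
        // Sample data: the records differ only in their last 4 bytes.
        byte[] a = {0, 0, 0, 0, 1, 0, 0, 2};
        byte[] b = {0, 0, 0, 0, 2, 0, 0, 1};
        System.out.println("byte-wise: " + compareBytewise(a, b));           // -1
        System.out.println("as long:   " + compareAsLittleEndianLong(a, b)); // 1
    }
}
```

   For these bytes the byte-wise comparison decides on the first differing 
byte (1 < 2), while the long comparison decides on the last byte (2 > 1), so 
the two paths order the same pair of records in opposite ways.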
   
   @srowen @cloud-fan 
   
