duanmeng opened a new pull request #29907:
URL: https://github.com/apache/spark/pull/29907


   The segment file may be empty even after the writer commits it, caused by disk/kernel issues on a heavily loaded cluster. Compare the segment length with the copied length to guard against this.
   
   A data file might be empty even after DiskBlockObjectWriter commits it in BypassMergeSortShuffleWriter. writePartitionedFile then returns wrong lengths, which feed into MapStatus and ultimately cause data loss.
   
   This is a disk/kernel problem, but we can guard against it in Spark without any performance loss by comparing partitionWriterSegments[i].length with lengths[i] after Utils.copyStream, as sketched below.
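   
   A minimal sketch of where the guard could sit, assuming the copy loop in writePartitionedFile (variable names follow BypassMergeSortShuffleWriter); the exact placement and the exception message are illustrative, not the final patch:
   ```
   // Sketch only. Requires java.io.{File, FileInputStream, IOException},
   // com.google.common.io.Closeables, org.apache.spark.util.Utils.
   // This loop lives inside writePartitionedFile; `out` is the stream to the
   // merged output file and `transferToEnabled` is a field of the writer.
   for (int i = 0; i < numPartitions; i++) {
     final File file = partitionWriterSegments[i].file();
     if (file.exists()) {
       final FileInputStream in = new FileInputStream(file);
       boolean copyThrewException = true;
       try {
         lengths[i] = Utils.copyStream(in, out, false, transferToEnabled);
         copyThrewException = false;
       } finally {
         Closeables.close(in, copyThrewException);
       }
       // Proposed guard: the bytes copied must equal the length the writer
       // committed; otherwise a wrong length would propagate into MapStatus.
       if (lengths[i] != partitionWriterSegments[i].length()) {
         throw new IOException("Copied stream length " + lengths[i] +
             " does not match segment length " +
             partitionWriterSegments[i].length() + " for partition " + i);
       }
     }
   }
   ```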
   
   I added some logs and reproduced the issue.
   
   The log when the issue happened:
   ```
   20/09/28 00:42:44 INFO sort.BypassMergeSortShuffleWriter: partitionWriterSegments[0]: (name=temp_shuffle_38244ef5-8e97-4428-97b8-feffc16fc9f7, offset=0, length=1462)
   20/09/28 00:42:46 INFO sort.BypassMergeSortShuffleWriter: File length: 0
   20/09/28 00:42:46 INFO sort.BypassMergeSortShuffleWriter: Copied stream length: 0
   ```
    
   
   The corresponding log from a run where the issue did not happen:
   ```
   20/09/28 10:11:45 INFO sort.BypassMergeSortShuffleWriter: partitionWriterSegments[0]: (name=temp_shuffle_f6937469-39fd-4576-b40e-69f4276cc8e4, offset=0, length=1462)
   20/09/28 10:11:45 INFO sort.BypassMergeSortShuffleWriter: File length: 1462
   20/09/28 10:11:45 INFO sort.BypassMergeSortShuffleWriter: Copied stream length: 1462
   ```
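   
   For reference, the values above came from logging along these lines (a hypothetical reconstruction; the exact statements are not part of the patch):
   ```
   // Hypothetical log statements placed after the copy in the loop sketched
   // earlier; `logger` is the class's SLF4J logger.
   logger.info("partitionWriterSegments[{}]: {}", i, partitionWriterSegments[i]);
   logger.info("File length: {}", file.length());
   logger.info("Copied stream length: {}", lengths[i]);
   ```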

