duanmeng commented on pull request #29907:
URL: https://github.com/apache/spark/pull/29907#issuecomment-741618107


   > In theory, with disk, driver, or kernel bugs, any write can silently have issues. Validating every write from Spark is not always practical.
   
   Yes, you're right, so I want to reuse the segment's length as a lightweight check. The byte count is already recorded naturally when writing, in both [LocalDiskShuffleMapOutputWriter used by SortShuffleWriter](https://github.com/apache/spark/blob/master/core/src/main/java/org/apache/spark/shuffle/sort/io/LocalDiskShuffleMapOutputWriter.java#L244) and [ShuffleExternalSorter used by UnsafeShuffleWriter](https://github.com/apache/spark/blob/master/core/src/main/java/org/apache/spark/shuffle/sort/ShuffleExternalSorter.java#L204).
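
   The lightweight check described above boils down to comparing the sum of the recorded per-partition segment lengths against the actual size of the shuffle data file. A minimal sketch of that idea (the class and method names here are hypothetical illustrations, not Spark's actual API):

   ```java
   public class ShuffleLengthCheck {
       /**
        * Returns true iff the recorded per-partition segment lengths are
        * non-negative and sum exactly to the size of the data file on disk.
        * A mismatch suggests a truncated or silently corrupted write.
        */
       public static boolean lengthsMatch(long[] partitionLengths, long dataFileSize) {
           long total = 0L;
           for (long len : partitionLengths) {
               if (len < 0) {
                   return false; // a negative segment length is always invalid
               }
               total += len;
           }
           return total == dataFileSize;
       }

       public static void main(String[] args) {
           // Lengths add up to the file size: the write looks consistent.
           System.out.println(lengthsMatch(new long[]{10L, 20L, 30L}, 60L));
           // A short file indicates a truncated or corrupted write.
           System.out.println(lengthsMatch(new long[]{10L, 20L, 30L}, 59L));
       }
   }
   ```

   This only catches length-level inconsistencies (truncation, lost writes), not bit flips within a segment, which is exactly why it is cheap enough to run on every map output.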
 
   
   > Is there any mitigating factors, env where it is more commonly triggered, how to reproduce it?
   
   We once had an issue caused by a disk-check script (using smartctl) that corrupted the data. However, the issue described in this PR seems related to some other environmental factor and is rarely reproduced (honestly, I don't know how to reproduce or trigger it); it disappeared after I applied this patch.
   
   Thanks @mridulm .


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


