duanmeng commented on pull request #29907: URL: https://github.com/apache/spark/pull/29907#issuecomment-741618107
> In theory, with disk, driver or kernel bugs, any write can silently have issues. Validating every write from spark is not always practical.

Yes, you're right. That is why I want to reuse the segment's length as a lightweight check: the byte count is already recorded naturally during the write, in both [LocalDiskShuffleMapOutputWriter used by SortShuffleWriter](https://github.com/apache/spark/blob/master/core/src/main/java/org/apache/spark/shuffle/sort/io/LocalDiskShuffleMapOutputWriter.java#L244) and [ShuffleExternalSorter used by UnsafeShuffleWriter](https://github.com/apache/spark/blob/master/core/src/main/java/org/apache/spark/shuffle/sort/ShuffleExternalSorter.java#L204), so comparing it against the actual on-disk length costs almost nothing (see the sketch below).

> Is there any mitigating factors, env where it is more commonly triggered, how to reproduce it ?

We once had an issue caused by a disk check script (using smartctl) that corrupted the data. However, the issue described in this PR seems to be related to some other environmental factor and is rarely reproduced (honestly, I don't know how to reproduce or trigger it); it disappeared after I applied this patch.

Thanks @mridulm.
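To make the idea concrete, here is a minimal sketch of the kind of lightweight check I mean; the class and method names are hypothetical and this is not the actual PR diff:

```java
import java.io.File;
import java.io.IOException;

// Hypothetical helper illustrating the check: after all partition writers
// finish, the sum of the recorded segment lengths should equal the size of
// the shuffle data file on disk. Names and structure are illustrative only.
final class SegmentLengthCheck {
  static void verify(File dataFile, long[] partitionLengths) throws IOException {
    long expected = 0L;
    for (long len : partitionLengths) {
      expected += len;
    }
    long actual = dataFile.length();
    if (actual != expected) {
      // A mismatch means some write was silently lost or corrupted
      // (e.g., by a disk, driver, or kernel bug); fail fast here
      // instead of committing a bad shuffle output.
      throw new IOException("Shuffle data file " + dataFile
          + " has length " + actual + " but expected " + expected);
    }
  }
}
```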
