zhouyejoe opened a new pull request, #36305: URL: https://github.com/apache/spark/pull/36305
### What changes were proposed in this pull request?

Adds corruption exception handling for merged shuffle chunks when spark.shuffle.detectCorrupt is set to true (the default value).

### Why are the changes needed?

Prior to Spark 3.0, spark.shuffle.detectCorrupt was set to true by default, and this configuration was one of the knobs for early corruption detection, so the fallback was triggered as expected. Since Spark 3.0, spark.shuffle.detectCorrupt is still true by default, but early corruption detection is controlled by a new configuration, spark.shuffle.detectCorrupt.useExtraMemory, which is set to false by default. As a result, the default behavior with only Magnet enabled after Spark 3.2.0 (internal li-3.1.1) disables early corruption detection, so no fallback is triggered. Instead, execution falls through and an exception is thrown when the reader starts to read the corrupted blocks.

We need to handle the corrupted stream for merged blocks, with or without fallback, in different scenarios:

1. If the user sets spark.shuffle.detectCorrupt.useExtraMemory to true, the fallback is triggered. However, this code path only buffers a small portion of the shuffle block and checks whether that portion is corrupted. Later parts of the shuffle block may still be corrupted; that case is then handled by spark.shuffle.detectCorrupt.
2. If spark.shuffle.detectCorrupt.useExtraMemory is set to false but spark.shuffle.detectCorrupt is set to true, the reader should not throw an exception saying that ShuffleChunk is not a shuffle block; it should trigger the retry if the shuffle block is a shuffle chunk.
3. If both spark.shuffle.detectCorrupt.useExtraMemory and spark.shuffle.detectCorrupt are set to false, the reader should just throw an exception on the client side and fail the task.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Tests are WIP. Unit tests to be added.
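
For readers who want the three scenarios under "Why are the changes needed?" in one place, here is a minimal sketch of the intended decision table. It is a standalone Python model, not Spark code: the names `Action`, `action_for_corrupt_chunk`, and `in_leading_portion` are hypothetical. It also makes two assumptions. Corruption found past the early-checked portion is assumed to take the same retry path as scenario 2, and the combination of spark.shuffle.detectCorrupt set to false with useExtraMemory set to true, which the description does not cover, is assumed to fail the task.

```python
from enum import Enum


class Action(Enum):
    """Hypothetical labels for the outcomes named in the description."""
    FALLBACK = "fall back from the merged chunk"
    RETRY = "retry fetching the shuffle chunk"
    FAIL_TASK = "throw on the client side and fail the task"


def action_for_corrupt_chunk(detect_corrupt: bool,
                             use_extra_memory: bool,
                             in_leading_portion: bool) -> Action:
    """Model the expected handling of a corrupted merged shuffle chunk.

    detect_corrupt     -- value of spark.shuffle.detectCorrupt
    use_extra_memory   -- value of spark.shuffle.detectCorrupt.useExtraMemory
    in_leading_portion -- True if the corruption sits in the small part of
                          the block that the early check buffers
    """
    if not detect_corrupt:
        # Scenario 3. Assumption: useExtraMemory has no effect here.
        return Action.FAIL_TASK
    if use_extra_memory and in_leading_portion:
        # Scenario 1: the early check catches it and triggers the fallback.
        return Action.FALLBACK
    # Scenario 2, and (by assumption) the late-corruption case of scenario 1.
    return Action.RETRY


if __name__ == "__main__":
    for flags in [(True, True, True), (True, True, False),
                  (True, False, True), (False, False, True)]:
        print(flags, "->", action_for_corrupt_chunk(*flags).name)
```

Running the script prints one line per flag combination, which makes it easy to compare against the behavior the unit tests for this patch are meant to cover.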
