zhouyejoe opened a new pull request, #36305:
URL: https://github.com/apache/spark/pull/36305

   
   ### What changes were proposed in this pull request?
   Adds corruption exception handling for merged shuffle chunks when 
spark.shuffle.detectCorrupt is set to true (the default value).
   
   ### Why are the changes needed?
   Prior to Spark 3.0, spark.shuffle.detectCorrupt was set to true by default, 
and this configuration was one of the knobs for early corruption detection, so 
the fallback was triggered as expected.
   
   Since Spark 3.0, spark.shuffle.detectCorrupt is still set to true by 
default, but early corruption detection is controlled by a new configuration, 
spark.shuffle.detectCorrupt.useExtraMemory, which is set to false by default. 
Thus the default behavior with only Magnet enabled, from Spark 3.2.0 (internal 
li-3.1.1) onward, disables early corruption detection, so no fallback is 
triggered. Execution instead falls through and throws an exception when it 
starts to read the corrupted blocks.
   
   We need to handle the corrupted stream for merged blocks, with or without 
fallback, in different scenarios:
   
   1. If the user sets spark.shuffle.detectCorrupt.useExtraMemory to true, this 
will trigger the fallback. However, that check only reads a small portion of 
the shuffle block to evaluate whether it has been corrupted. It is still 
possible for later parts of the shuffle block to be corrupted, in which case 
the corruption is handled by spark.shuffle.detectCorrupt.
   2. If spark.shuffle.detectCorrupt.useExtraMemory is set to false but 
spark.shuffle.detectCorrupt is set to true, it shouldn't throw an exception 
saying that a ShuffleChunk is not a shuffle block; it should trigger the retry 
if the shuffle block is a shuffle chunk.
   3. If spark.shuffle.detectCorrupt.useExtraMemory is set to false and 
spark.shuffle.detectCorrupt is set to false, it should just throw an exception 
on the client side and fail the task.
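   
   The three scenarios above can be summarized as a small decision function. 
This is only an illustrative sketch, not the code in this PR: the class, enum, 
and method names are hypothetical, and it assumes that "handled by 
spark.shuffle.detectCorrupt" means a fetch retry and that 
spark.shuffle.detectCorrupt.useExtraMemory has no effect when 
spark.shuffle.detectCorrupt is false.

```java
// Hypothetical sketch; none of these names exist in Spark.
final class MergedChunkCorruptionPolicy {

    // Possible reactions to a corrupted merged shuffle chunk.
    enum Action { FALLBACK_TO_ORIGINAL_BLOCKS, RETRY_FETCH, FAIL_TASK }

    /**
     * @param detectCorrupt      value of spark.shuffle.detectCorrupt
     * @param useExtraMemory     value of spark.shuffle.detectCorrupt.useExtraMemory
     * @param caughtByEarlyCheck true if the corruption was found while checking
     *                           the first portion of the chunk (only possible
     *                           when both settings are true)
     */
    static Action onCorruptMergedChunk(boolean detectCorrupt,
                                       boolean useExtraMemory,
                                       boolean caughtByEarlyCheck) {
        if (!detectCorrupt) {
            // Scenario 3: no detection, so the exception surfaces on the
            // client side and the task fails. Assumption: useExtraMemory is
            // ignored here.
            return Action.FAIL_TASK;
        }
        if (useExtraMemory && caughtByEarlyCheck) {
            // Scenario 1: the early check found the corruption, so fall back.
            return Action.FALLBACK_TO_ORIGINAL_BLOCKS;
        }
        // Scenario 2, and the tail of scenario 1 where corruption appears in a
        // later part of the chunk: retry instead of failing with
        // "ShuffleChunk is not a shuffle block".
        return Action.RETRY_FETCH;
    }
}
```

   With the default settings (detectCorrupt = true, useExtraMemory = false), 
this sketch always yields RETRY_FETCH, which matches scenario 2.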
   
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   ### How was this patch tested?
   Tests are WIP. Unit tests (UT) to be added.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

