Ngone51 commented on pull request #33451: URL: https://github.com/apache/spark/pull/33451#issuecomment-885707103
@otterc

> Though it avoids re-fetch of a corrupted block for which the cause of corruption is disk_issue, the act of finding the cause of corruption, which is by sending another message to the server, is as high as just retrying the corrupt block.

The main motivation behind the shuffle checksum project is to report the cause of data corruption to users/developers so they can debug the underlying root cause further. It isn't really intended as a performance improvement. Also note that diagnosis only happens on a corruption error, which is a corner case, so it won't have a significant impact on performance.

> I feel that this broad classification of corruption may not be that helpful to the user

These are the only causes we can identify under the current solution, and I think they are actually helpful. Without this change, people can only guess at the cause: even if we all suspect that disk issues are the most likely cause, no one can say so for sure.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
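For readers following the thread: the diagnosis message described above boils down to comparing three checksums, namely the one recorded at shuffle-write time, the one the reader computes over the fetched (corrupt) bytes, and one the server recomputes from the block as it currently sits on disk. The sketch below illustrates that classification idea in Java; all names (`CorruptionDiagnosis`, `diagnose`, the `Cause` values) are hypothetical and do not reflect Spark's actual API, and CRC32 is assumed as the checksum algorithm.

```java
import java.util.zip.CRC32;
import java.util.zip.Checksum;

// Hypothetical sketch of checksum-based corruption diagnosis; not Spark's real code.
public class CorruptionDiagnosis {
    enum Cause { DISK_ISSUE, NETWORK_ISSUE, CHECKSUM_VERIFY_PASS }

    // checksumStored: checksum recorded when the shuffle block was written
    // checksumClient: checksum the reader computed over the fetched bytes
    // serverData:     block bytes as re-read from the server's disk during diagnosis
    static Cause diagnose(long checksumStored, long checksumClient, byte[] serverData) {
        long checksumServer = crc32(serverData);
        if (checksumServer != checksumStored) {
            // Data on disk no longer matches what was originally written.
            return Cause.DISK_ISSUE;
        } else if (checksumClient != checksumStored) {
            // Disk copy is intact, so the bytes must have changed in transit.
            return Cause.NETWORK_ISSUE;
        } else {
            // All checksums agree; the corruption arose elsewhere (e.g. after fetch).
            return Cause.CHECKSUM_VERIFY_PASS;
        }
    }

    static long crc32(byte[] data) {
        Checksum c = new CRC32();
        c.update(data, 0, data.length);
        return c.getValue();
    }
}
```

The key point the comment makes still holds in this sketch: the extra round trip only happens on the already-rare corruption path, and its payoff is a definite cause rather than a guess.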
