[GitHub] ankuriitg opened a new pull request #23453: [SPARK-26089][CORE] Handle corruption in large shuffle blocks

GitBox Fri, 04 Jan 2019 09:35:33 -0800

ankuriitg opened a new pull request #23453: [SPARK-26089][CORE] Handle 
corruption in large shuffle blocks
URL: https://github.com/apache/spark/pull/23453
 
 
   
   ## What changes were proposed in this pull request?
   
   SPARK-4105 added corruption detection for shuffle blocks, but only for blocks
   smaller than maxBytesInFlight/3. This commit builds on that by adding a
   corruption check for large blocks. It makes two changes:
   
   1. Large blocks are checked up to the first maxBytesInFlight/3 bytes in the
   same way as smaller blocks, so if a large block is corrupt at the beginning,
   it is re-fetched, and if the re-fetch also fails, a FetchFailedException is
   thrown.
   2. If a large block is corrupt beyond the first maxBytesInFlight/3 bytes, any
   IOException thrown while reading the stream is converted to a
   FetchFailedException. This is slightly more aggressive than originally
   intended, but since the consumer of the stream may have already read and
   processed some records, we can't just re-fetch the block; we need to fail the
   whole task. We also considered adding a new type of TaskEndReason that would
   retry the task a couple of times before failing the previous stage, but given
   the complexity of that solution we decided not to pursue it.
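
   The two changes can be illustrated with a minimal Python sketch. It is not
   the Scala code in this pull request: `open_block`, `_guarded_tail`,
   `FetchFailedError`, `fetch`, `check_limit`, `max_fetches` and `chunk_size`
   are hypothetical names chosen for illustration, zlib stands in for the
   shuffle compression codec, and `zlib.error` plays the role of the
   IOException. `check_limit` corresponds to maxBytesInFlight/3.

```python
import zlib


class FetchFailedError(Exception):
    """Hypothetical stand-in for the fetch-failure exception described above."""


def open_block(fetch, check_limit, max_fetches=2, chunk_size=16):
    """Return an iterator over the decompressed chunks of one block.

    fetch:       zero-argument callable returning the compressed block bytes
    check_limit: size of the eagerly checked prefix (maxBytesInFlight/3 above)
    """
    for attempt in range(1, max_fetches + 1):
        raw = fetch()
        decomp = zlib.decompressobj()
        try:
            # Change 1: decompress the prefix eagerly, before the caller has
            # consumed anything, so a re-fetch is still safe.
            head = decomp.decompress(raw[:check_limit])
        except zlib.error as exc:
            if attempt == max_fetches:
                raise FetchFailedError("prefix corrupt after re-fetch") from exc
            continue  # corrupt prefix: fetch the block again
        return _guarded_tail(head, decomp, raw[check_limit:], chunk_size)
    raise ValueError("max_fetches must be at least 1")


def _guarded_tail(head, decomp, rest, chunk_size):
    """Yield the checked prefix, then stream the remainder of the block."""
    yield head
    try:
        for start in range(0, len(rest), chunk_size):
            yield decomp.decompress(rest[start:start + chunk_size])
        if not decomp.eof:
            raise zlib.error("stream ended before end of block")
    except zlib.error as exc:
        # Change 2: the caller may already have processed earlier chunks,
        # so a re-fetch is not safe here; report a fetch failure instead.
        raise FetchFailedError("corruption past the checked prefix") from exc
```

   In this sketch, a block whose first byte is damaged triggers a second call
   to `fetch`, while a block whose trailing checksum is damaged raises
   `FetchFailedError` only once the consumer iterates that far, with no
   re-fetch attempted.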
   
   Thanks to @squito for direction and support.
   
   ## How was this patch tested?
   
   Changed the JUnit test for big blocks to check for corruption.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
