GitHub user jinxing64 opened a pull request:
https://github.com/apache/spark/pull/20685
[SPARK-23524] Big local shuffle blocks should not be checked for corruption.
## What changes were proposed in this pull request?
In the current code, all local blocks are checked for corruption regardless of
their size. The reasons are as below:
1. The size in FetchResult for a local block is set to 0
(https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala#L327)
2. SPARK-4105 intended to check only the small blocks (size < maxBytesInFlight/3),
but because of reason 1, the check below is invalid for local blocks:
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala#L420
We can fix this and avoid the OOM.
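The interaction between the two points above can be sketched as follows. This is a hedged, self-contained illustration, not the actual ShuffleBlockFetcherIterator code; the helper name `shouldCheckCorruption` and the 48 MB default are assumptions for the example.

```scala
object CorruptionCheckSketch {
  // SPARK-4105 intended corruption checking (which buffers the whole
  // block in memory) only for blocks smaller than maxBytesInFlight / 3.
  // Hypothetical helper mirroring that size check:
  def shouldCheckCorruption(blockSize: Long, maxBytesInFlight: Long): Boolean =
    blockSize < maxBytesInFlight / 3

  def main(args: Array[String]): Unit = {
    // Assumed default for spark.reducer.maxSizeInFlight (48 MB).
    val maxBytesInFlight = 48L * 1024 * 1024

    // A genuinely small block: checking it is cheap and intended.
    assert(shouldCheckCorruption(1024L, maxBytesInFlight))

    // A big local block whose FetchResult size was recorded as 0:
    // 0 < maxBytesInFlight / 3 is always true, so the block is checked
    // and fully buffered in memory no matter how large it really is,
    // which is what can cause the OOM described above.
    val reportedLocalSize = 0L
    assert(shouldCheckCorruption(reportedLocalSize, maxBytesInFlight))
  }
}
```

Passing the real on-disk size for local blocks (instead of 0) makes the size guard take effect again, so big local blocks skip the in-memory corruption check.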
## How was this patch tested?
Unit test added.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/jinxing64/spark SPARK-23524
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/20685.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #20685
----
commit 535916c045b123e803c0f6dbf786076045036167
Author: jx158167 <jx158167@...>
Date: 2018-02-27T09:56:38Z
[SPARK-23524] Big local shuffle blocks should not be checked for corruption.
----