GitHub user zhzhan opened a pull request:
https://github.com/apache/spark/pull/20480
[SPARK-23306] Fix the OOM caused by contention
## What changes were proposed in this pull request?
There is a race condition in TaskMemoryManager that may cause an OOM.
Because there is a gap between releaseMemory and acquireMemory (e.g., in
UnifiedMemoryManager), memory released by a spill may be taken by another
task before the current task acquires it. If the current consumer is the
only one that can spill, the acquisition then fails with an OOM. This can
happen with BytesToBytesMap, as it only spills the required bytes.
The fix is to loop on the current consumer while it still has memory to release.
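The race and the retry loop described above can be modeled with a small deterministic simulation. This is only a sketch and not Spark code: `Pool`, `Consumer`, `make_intruder`, `acquire_with_spill`, and `run` are hypothetical names, and the "other task" is simulated by a callback that runs in the gap between release and acquire.

```python
class Pool:
    """Toy shared memory pool (hypothetical stand-in for a memory manager)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.used = 0

    def acquire(self, n):
        # Grant at most what is currently free.
        granted = min(n, self.capacity - self.used)
        self.used += granted
        return granted

    def release(self, n):
        self.used -= n


class Consumer:
    """Toy consumer that spills only the bytes it is asked for."""

    def __init__(self, pool, held):
        self.pool = pool
        self.held = held

    def spill(self, n):
        freed = min(n, self.held)
        self.held -= freed
        self.pool.release(freed)
        return freed


def make_intruder(budget):
    """Simulated competing task that grabs up to `budget` bytes in total."""
    remaining = [budget]

    def intruder(pool):
        remaining[0] -= pool.acquire(remaining[0])

    return intruder


def acquire_with_spill(pool, consumer, required, intruder, loop):
    got = pool.acquire(required)
    while got < required:
        released = consumer.spill(required - got)
        if released == 0:
            break  # the consumer has nothing left to release
        intruder(pool)  # the gap: another task may take the freed memory
        got += pool.acquire(required - got)
        if not loop:
            break  # single attempt, no retry on the same consumer
    return got


def run(loop):
    pool = Pool(100)
    pool.acquire(100)  # the consumer already holds the whole pool
    consumer = Consumer(pool, held=100)
    return acquire_with_spill(pool, consumer, 20, make_intruder(30), loop)


if __name__ == "__main__":
    print(run(loop=False))  # 0: the freed bytes were taken by the other task
    print(run(loop=True))   # 20: retrying the spill eventually satisfies the request
```

In this model the single attempt ends with nothing acquired even though the consumer still holds spillable memory, while the looping variant keeps spilling until the request is met or the consumer has nothing left to release.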
## How was this patch tested?
The race condition is hard to reproduce, but the current logic appears to be
the cause of the issue.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/zhzhan/spark oom
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/20480.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #20480
commit df96f0c126833b0e812cd715ae1538dbd38afac4
Author: Zhan Zhang
Date: 2018-01-12T19:51:19Z
fix the oom caused by contention