GitHub user MaxGekk opened a pull request:
https://github.com/apache/spark/pull/22283
[SPARK-25283][CORE] Fix for a deadlock in UnionRDD
## What changes were proposed in this pull request?
The commit
https://github.com/apache/spark/commit/131ca146ed390cd0109cd6e8c95b61e418507080
replaced Scala parallel collections in `UnionRDD` with `parmap`. That change
causes a deadlock in the `partitions` method when the method is called
recursively and the number of unions in the top-level union exceeds the size of
the fixed thread pool used in `UnionRDD`. In this PR, I propose reverting to
Scala parallel collections, since they support nested calls even on fixed
thread pools.
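
To illustrate the failure mode outside Spark, here is a minimal sketch that uses only `java.util.concurrent`. It is not the `UnionRDD` or `parmap` code; the object `NestedPoolSketch`, its method names, and the pool sizes are hypothetical and chosen for illustration. The first method lets every thread of a fixed pool block on an inner task queued on the same pool. The second uses fork/join, where a worker that joins a forked task can run that task itself.

```scala
import java.util.concurrent.{Callable, CountDownLatch, Executors, ForkJoinPool,
  ForkJoinTask, TimeUnit, TimeoutException, Future => JFuture}

// Hypothetical illustration only; none of these names exist in Spark.
object NestedPoolSketch {

  // Returns true if every future finishes within the timeout.
  private def allFinish(futures: Seq[JFuture[Int]], timeoutMs: Long): Boolean =
    try {
      futures.foreach(_.get(timeoutMs, TimeUnit.MILLISECONDS))
      true
    } catch {
      case _: TimeoutException => false
    }

  // Each outer task submits an inner task to the SAME fixed pool and blocks on it.
  def fixedPoolCompletes(poolSize: Int, width: Int, timeoutMs: Long): Boolean = {
    val pool = Executors.newFixedThreadPool(poolSize)
    // Hold outer tasks until as many as can run concurrently have started,
    // so the scheduling is deterministic for this demonstration.
    val started = new CountDownLatch(math.min(poolSize, width))
    try {
      val outer = (1 to width).map { i =>
        pool.submit(new Callable[Int] {
          def call(): Int = {
            started.countDown()
            started.await()
            val inner = pool.submit(new Callable[Int] { def call(): Int = i * 2 })
            inner.get() // occupies a pool thread while waiting for a free one
          }
        })
      }
      allFinish(outer, timeoutMs)
    } finally {
      pool.shutdownNow() // interrupt workers that are still blocked
    }
  }

  // Same nesting with fork/join: join() lets the worker execute the forked task.
  def forkJoinCompletes(parallelism: Int, width: Int, timeoutMs: Long): Boolean = {
    val pool = new ForkJoinPool(parallelism)
    try {
      val outer = (1 to width).map { i =>
        pool.submit(new Callable[Int] {
          def call(): Int = {
            val inner = ForkJoinTask.adapt(new Callable[Int] { def call(): Int = i * 2 })
            inner.fork().join()
          }
        })
      }
      allFinish(outer, timeoutMs)
    } finally {
      pool.shutdownNow()
    }
  }
}
```

Under these assumptions, `NestedPoolSketch.fixedPoolCompletes(2, 3, 500)` is expected to time out, because both pool threads wait on inner tasks that no thread is free to run, while `NestedPoolSketch.forkJoinCompletes(2, 3, 5000)` is expected to finish.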
## How was this patch tested?
I added a test that creates two levels of `UnionRDD`s that are wider than the
fixed thread pool.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/MaxGekk/spark-1 deadlock-unionrdd
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/22283.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #22283
----
commit ee8c91ac7de8c2aad66def843d8a9ee0a199d8d1
Author: Maxim Gekk <maxim.gekk@...>
Date: 2018-08-30T12:11:39Z
Test reproduces a deadlock
commit afc2f73101a331d80104ce140c4c8288f6a764b3
Author: Maxim Gekk <maxim.gekk@...>
Date: 2018-08-30T12:13:19Z
Revert "Porting UnionRDD on parmap"
This reverts commit 72cdfeb765cda13ab03ed8515a83fa24657894ac.
commit 8c38ab506ec5444f150ea6a46a5afe067c012f0f
Author: Maxim Gekk <maxim.gekk@...>
Date: 2018-08-30T12:21:24Z
Test reproduces the deadlock in UnionRDD
----