GitHub user MaxGekk opened a pull request:
https://github.com/apache/spark/pull/22233
[SPARK-25240][SQL] Fix for a deadlock in RECOVER PARTITIONS
## What changes were proposed in this pull request?
In this PR, I propose to stop listing files recursively in parallel
in the `scanPartitions` method because it can cause a deadlock. Instead,
I propose to run `scanPartitions` in parallel for top-level partitions only.
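
For readers unfamiliar with this failure mode: when every level of a recursive scan submits work to the same bounded thread pool and then blocks on the results, all threads can end up held by parent tasks waiting on children that can never be scheduled. The sketch below illustrates the proposed shape of the fix in Python rather than Spark's Scala. It is not the code from this PR; `scan_partitions`, `scan_sequential`, `depth`, and `max_workers` are hypothetical names chosen for illustration, and a local directory tree stands in for the table location.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def scan_sequential(path, depth):
    """List directories `depth` levels below `path` on the calling thread."""
    if depth == 0:
        return [path]
    found = []
    for entry in sorted(os.listdir(path)):
        child = os.path.join(path, entry)
        if os.path.isdir(child):
            found.extend(scan_sequential(child, depth - 1))
    return found


def scan_partitions(root, depth, max_workers=2):
    """Fan out over top-level partition directories only (hypothetical sketch)."""
    if depth == 0:
        return [root]
    top_level = [
        os.path.join(root, entry)
        for entry in sorted(os.listdir(root))
        if os.path.isdir(os.path.join(root, entry))
    ]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Workers recurse sequentially and never submit to `pool` themselves,
        # so no worker blocks waiting for a free thread in the same pool.
        per_child = pool.map(lambda p: scan_sequential(p, depth - 1), top_level)
        return [leaf for leaves in per_child for leaf in leaves]
```

Because only the caller submits tasks, the scan completes even with `max_workers=1`; the trade-off is that parallelism is limited by the number of top-level partition directories.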
## How was this patch tested?
I extended an existing test to trigger the deadlock.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/MaxGekk/spark-1 fix-recover-partitions
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/22233.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #22233
----
commit 189e14d98d9317acc41b46e5c6a4fe1f867174d3
Author: Maxim Gekk <maxim.gekk@...>
Date: 2018-08-25T12:36:57Z
Extending tests to catch a dead-lock
commit bc660a09faed9472c0492ac33c183f6bf5248048
Author: Maxim Gekk <maxim.gekk@...>
Date: 2018-08-25T12:43:35Z
Enable the tests which catches a dead-lock
commit 59a376d324a6cfbaffba40e7b14e48023513eeac
Author: Maxim Gekk <maxim.gekk@...>
Date: 2018-08-25T13:19:42Z
List file in parallel on top level only
----