GitHub user MaxGekk opened a pull request:

    https://github.com/apache/spark/pull/22233

    [SPARK-25240][SQL] Fix for a deadlock in RECOVER PARTITIONS

    ## What changes were proposed in this pull request?
    
    In this PR, I propose not to list files recursively in parallel in the
    `scanPartitions` method because it can cause a deadlock. Instead, I propose
    to run `scanPartitions` in parallel for top-level partitions only.
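
The idea can be illustrated outside Spark with a minimal Python sketch. It is not the code from this patch: the names `scan_sequential` and `scan_top_level_parallel`, the `depth` parameter, and the use of a local directory tree are all hypothetical. It also assumes (the PR text does not spell this out) that the deadlock comes from a bounded thread pool in which tasks block while waiting for child tasks submitted to the same pool. Fanning out only at the top level and recursing sequentially inside each task avoids that pattern, because no worker ever waits on the pool.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def scan_sequential(path, depth):
    """Recursively collect partition directories `depth` levels below `path`.

    This runs entirely in the calling thread and never touches a thread pool.
    """
    if depth == 0:
        return [path]
    found = []
    for name in sorted(os.listdir(path)):
        child = os.path.join(path, name)
        if os.path.isdir(child):
            found.extend(scan_sequential(child, depth - 1))
    return found


def scan_top_level_parallel(root, depth, max_workers=2):
    """Scan top-level partition directories in parallel, deeper levels sequentially.

    Worker tasks never submit further work to the pool or wait on it, so a
    small bounded pool cannot fill up with tasks blocked on their own children.
    """
    if depth == 0:
        return [root]
    top = [os.path.join(root, name) for name in sorted(os.listdir(root))]
    top = [path for path in top if os.path.isdir(path)]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves input order, so the result is deterministic.
        results = pool.map(lambda path: scan_sequential(path, depth - 1), top)
        return [leaf for part in results for leaf in part]


if __name__ == "__main__":
    # Hypothetical two-level layout: a=1/b=1, a=1/b=2, a=2/b=1, a=2/b=2.
    demo_root = tempfile.mkdtemp()
    for a in ("a=1", "a=2"):
        for b in ("b=1", "b=2"):
            os.makedirs(os.path.join(demo_root, a, b))
    for leaf in scan_top_level_parallel(demo_root, depth=2):
        print(os.path.relpath(leaf, demo_root))
```

The trade-off in this sketch is that parallelism is limited by the number of top-level partitions, which is the price paid for never blocking a pool thread on nested work.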
    
    ## How was this patch tested?
    
    I extended an existing test to trigger the deadlock.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/MaxGekk/spark-1 fix-recover-partitions

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22233.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22233
    
----
commit 189e14d98d9317acc41b46e5c6a4fe1f867174d3
Author: Maxim Gekk <maxim.gekk@...>
Date:   2018-08-25T12:36:57Z

    Extending tests to catch a dead-lock

commit bc660a09faed9472c0492ac33c183f6bf5248048
Author: Maxim Gekk <maxim.gekk@...>
Date:   2018-08-25T12:43:35Z

    Enable the tests which catches a dead-lock

commit 59a376d324a6cfbaffba40e7b14e48023513eeac
Author: Maxim Gekk <maxim.gekk@...>
Date:   2018-08-25T13:19:42Z

    List file in parallel on top level only

----

