GitHub user gaborgsomogyi opened a pull request:

    https://github.com/apache/spark/pull/21214

    [SPARK-23775][TEST] Make DataFrameRangeSuite not flaky

    ## What changes were proposed in this pull request?
    
    DataFrameRangeSuite.test("Cancelling stage in a query with Range.") sometimes gets stuck in an infinite loop and times out the build.
    
    There were multiple issues with the test:
    
    1. When the test is run on its own rather than as part of the suite, the first valid stage ID is zero, so the following code waits until the timeout:
    
    ```
    eventually(timeout(10.seconds), interval(1.millis)) {
      // A stage ID of 0 never satisfies "> 0", so this retries until the timeout
      assert(DataFrameRangeSuite.stageToKill > 0)
    }
    ```
    
    2. `DataFrameRangeSuite.stageToKill` was overwritten by the task's thread after it had been reset, so the same stage was cancelled twice. This caused the infinite wait.
    
    This PR fixes the flakiness by removing the shared `DataFrameRangeSuite.stageToKill` and using `onTaskStart`, where the stage ID is provided. To make sure `cancelStage` has been called for all stages, `waitUntilEmpty` is called on the `ListenerBus`.
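
    A minimal sketch of this approach, assuming Spark 2.3 or later on the classpath and a source file placed under the `org.apache.spark` package (needed because `SparkContext.listenerBus` is `private[spark]`, as in Spark's own test suites). The object `CancelStageOnTaskStartSketch`, the helper `runCancelledQuery` and the 20-second drain timeout are hypothetical illustrations, not the code in this PR:

    ```scala
    // Assumption: lives under org.apache.spark because SparkContext.listenerBus
    // is private[spark]; object and helper names here are hypothetical.
    package org.apache.spark.sql

    import org.apache.spark.SparkException
    import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskStart}
    import org.apache.spark.sql.functions.sum

    object CancelStageOnTaskStartSketch {

      /** Runs a long Range aggregation, cancels its stage from a listener and
       *  returns the SparkException raised by the cancelled job. */
      def runCancelledQuery(spark: SparkSession): SparkException = {
        val sc = spark.sparkContext
        // No shared mutable stageToKill: every event carries its own stage ID.
        val listener = new SparkListener {
          override def onTaskStart(taskStart: SparkListenerTaskStart): Unit =
            sc.cancelStage(taskStart.stageId)
        }
        sc.addSparkListener(listener)
        try {
          try {
            spark.range(0, 100000000000L, 1, 1).agg(sum("id")).collect()
            throw new IllegalStateException("query was expected to be cancelled")
          } catch {
            case e: SparkException => e
          }
        } finally {
          // Drain pending listener events so every cancelStage call has run
          // before the listener is removed and the next query starts.
          sc.listenerBus.waitUntilEmpty(20000L)
          sc.removeSparkListener(listener)
        }
      }

      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().master("local[1]").appName("sketch").getOrCreate()
        try println(runCancelledQuery(spark).getMessage)
        finally spark.stop()
      }
    }
    ```

    Because the stage ID is read from each `SparkListenerTaskStart` event, an event that is delivered late can only cancel the stage that produced it, never a stage belonging to the next query.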
    
    [PR20888](https://github.com/apache/spark/pull/20888) tried to solve this by:
    * Stopping the executor thread with `wait`
    * Waiting until `cancelStage` had been called for all stages
    * Killing the executor thread by setting `SparkContext.SPARK_JOB_INTERRUPT_ON_CANCEL`
    
    but killing the thread sometimes left the shared `SparkContext` in a state where no further tasks could be submitted. As a result, DataFrameRangeSuite.test("Cancelling stage in a query with Range.") itself passed, but the next test in the suite hung.
    
    ## How was this patch tested?
    
    Existing unit test executed 10k times.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gaborgsomogyi/spark SPARK-23775_1

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21214.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21214
    
----
commit 9781cbee95f338d5e1bcd61190c7a938155803bf
Author: Gabor Somogyi <gabor.g.somogyi@...>
Date:   2018-05-02T09:23:38Z

    [SPARK-23775][TEST] Make DataFrameRangeSuite not flaky

----

