GitHub user zsxwing opened a pull request:

    https://github.com/apache/spark/pull/8326

    [SPARK-10125][Streaming]Fix a potential deadlock in JobGenerator.stop

    Because a Scala `lazy val` initializes under the `this` lock, `JobGenerator.stop` and `JobGenerator.doCheckpoint` can deadlock if they run at the same time before `JobGenerator.shouldCheckpoint` has been initialized: `stop` holds the `JobGenerator` lock while joining the event loop thread, and that thread blocks on the same lock to initialize the lazy val.
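    
    To make the lock cycle concrete, here is a minimal Java sketch (hypothetical names, not Spark code) of what a Scala `lazy val` compiles to: the first access initializes the field while synchronized on `this`, so it contends with any other `synchronized` method on the same object.
    
    ```java
    // Minimal sketch (hypothetical names) of the lock pattern behind the
    // deadlock: a Scala `lazy val` compiles to roughly this kind of
    // synchronized one-time initializer on `this`.
    public class LazyValDeadlockSketch {

        static class Generator {
            private Object checkpointWriter;

            // Rough Java equivalent of `lazy val shouldCheckpoint`: the first
            // access initializes the field under the object's own monitor.
            synchronized Object shouldCheckpoint() {
                if (checkpointWriter == null) {
                    checkpointWriter = new Object();
                }
                return checkpointWriter;
            }
        }

        // Hold the generator's monitor (as a synchronized stop() would) while
        // another thread touches the lazy field; report that thread's state.
        static Thread.State observeBlockedState() throws InterruptedException {
            Generator gen = new Generator();
            Thread checkpointer = new Thread(gen::shouldCheckpoint);
            Thread.State state;
            synchronized (gen) {          // stop() holds this lock...
                checkpointer.start();     // ...while this thread needs it
                Thread.sleep(200);
                state = checkpointer.getState();
            }
            checkpointer.join();
            return state;
        }

        public static void main(String[] args) throws Exception {
            System.out.println(observeBlockedState());  // prints BLOCKED
        }
    }
    ```
    
    The sketch releases the monitor after 200 ms so it terminates; in the reported hang, `stop` keeps the monitor while `join`ing the event loop thread, which is exactly the thread blocked above, so neither side can make progress.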
    
    Here are the stack traces for the deadlock:
    
    ```java
    "pool-1-thread-1-ScalaTest-running-StreamingListenerSuite" #11 prio=5 os_prio=31 tid=0x00007fd35d094800 nid=0x5703 in Object.wait() [0x000000012ecaf000]
       java.lang.Thread.State: WAITING (on object monitor)
            at java.lang.Object.wait(Native Method)
            at java.lang.Thread.join(Thread.java:1245)
            - locked <0x00000007b5d8d7f8> (a org.apache.spark.util.EventLoop$$anon$1)
            at java.lang.Thread.join(Thread.java:1319)
            at org.apache.spark.util.EventLoop.stop(EventLoop.scala:81)
            at org.apache.spark.streaming.scheduler.JobGenerator.stop(JobGenerator.scala:155)
            - locked <0x00000007b5d8cea0> (a org.apache.spark.streaming.scheduler.JobGenerator)
            at org.apache.spark.streaming.scheduler.JobScheduler.stop(JobScheduler.scala:95)
            - locked <0x00000007b5d8ced8> (a org.apache.spark.streaming.scheduler.JobScheduler)
            at org.apache.spark.streaming.StreamingContext.stop(StreamingContext.scala:687)
    
    "JobGenerator" #67 daemon prio=5 os_prio=31 tid=0x00007fd35c3b9800 nid=0x9f03 waiting for monitor entry [0x0000000139e4a000]
       java.lang.Thread.State: BLOCKED (on object monitor)
            at org.apache.spark.streaming.scheduler.JobGenerator.shouldCheckpoint$lzycompute(JobGenerator.scala:63)
            - waiting to lock <0x00000007b5d8cea0> (a org.apache.spark.streaming.scheduler.JobGenerator)
            at org.apache.spark.streaming.scheduler.JobGenerator.shouldCheckpoint(JobGenerator.scala:63)
            at org.apache.spark.streaming.scheduler.JobGenerator.doCheckpoint(JobGenerator.scala:290)
            at org.apache.spark.streaming.scheduler.JobGenerator.org$apache$spark$streaming$scheduler$JobGenerator$$processEvent(JobGenerator.scala:182)
            at org.apache.spark.streaming.scheduler.JobGenerator$$anon$1.onReceive(JobGenerator.scala:83)
            at org.apache.spark.streaming.scheduler.JobGenerator$$anon$1.onReceive(JobGenerator.scala:82)
            at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
    ```
    
    I can use this patch to reproduce the deadlock: https://github.com/zsxwing/spark/commit/8a88f28d1331003a65fabef48ae3d22a7c21f05f
    
    And here is a Jenkins build that timed out because of this deadlock: https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/1654/
    
    To avoid this deadlock, this PR initializes `checkpointWriter` before `eventLoop` uses it.
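    
    Sketched in the same hypothetical Java terms as above, the fix pattern is to force the one-time initialization before the event loop thread can race for it:
    
    ```java
    // Sketch (hypothetical names) of the fix: initialize the would-be lazy
    // field eagerly, before the event loop starts, so its one-time
    // initialization can never contend with a synchronized stop().
    public class EagerInitSketch {

        static class Generator {
            private final Object checkpointWriter = new Object(); // eager now
            private Thread eventLoop;

            void start() {
                eventLoop = new Thread(() -> { /* generate jobs, checkpoint */ });
                eventLoop.start();
            }

            synchronized void stop() throws InterruptedException {
                // Safe to join while holding the monitor: no first-touch of
                // checkpointWriter can block on `this` anymore.
                eventLoop.join();
            }
        }

        static boolean startAndStop() throws InterruptedException {
            Generator gen = new Generator();
            gen.start();
            gen.stop();   // returns promptly; no lock cycle remains
            return true;
        }

        public static void main(String[] args) throws Exception {
            System.out.println("stopped cleanly: " + startAndStop());
        }
    }
    ```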

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/zsxwing/spark SPARK-10125

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/8326.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #8326
    
----
commit 7a3799b52737baf2a062a52b76e2552a8fb989c9
Author: zsxwing <[email protected]>
Date:   2015-08-20T01:18:34Z

    Fix a potential deadlock in JobGenerator.stop

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
