GitHub user KaiXinXiaoLei opened a pull request:

    https://github.com/apache/spark/pull/19951

    [SPARK-22760][CORE][YARN] When sc.stop() is called, set stopped to true 
before removing executors

    ## What changes were proposed in this pull request?
    
    When the number of executors is large and YarnSchedulerBackend.stop() is 
running, an executor may shut down before YarnSchedulerBackend.stopped is set 
to true. In that case YarnSchedulerBackend.onDisconnected() is still invoked 
and fails as follows: 
    {noformat}
    17/12/12 15:34:45 INFO YarnClientSchedulerBackend: Asking each executor to 
shut down
    17/12/12 15:34:45 INFO YarnClientSchedulerBackend: Disabling executor 63.
    17/12/12 15:34:45 ERROR Inbox: Ignoring error
    org.apache.spark.SparkException: Could not find CoarseGrainedScheduler or 
it has been stopped.
        at 
org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:163)
        at 
org.apache.spark.rpc.netty.Dispatcher.postOneWayMessage(Dispatcher.scala:133)
        at org.apache.spark.rpc.netty.NettyRpcEnv.send(NettyRpcEnv.scala:192)
        at 
org.apache.spark.rpc.netty.NettyRpcEndpointRef.send(NettyRpcEnv.scala:516)
        at 
org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.reviveOffers(CoarseGrainedSchedulerBackend.scala:356)
        at 
org.apache.spark.scheduler.TaskSchedulerImpl.executorLost(TaskSchedulerImpl.scala:497)
        at 
org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$DriverEndpoint.disableExecutor(CoarseGrainedSchedulerBackend.scala:301)
        at 
org.apache.spark.scheduler.cluster.YarnSchedulerBackend$YarnDriverEndpoint$$anonfun$onDisconnected$1.apply(YarnSchedulerBackend.scala:121)
        at 
org.apache.spark.scheduler.cluster.YarnSchedulerBackend$YarnDriverEndpoint$$anonfun$onDisconnected$1.apply(YarnSchedulerBackend.scala:120)
        at scala.Option.foreach(Option.scala:236)
        at 
org.apache.spark.scheduler.cluster.YarnSchedulerBackend$YarnDriverEndpoint.onDisconnected(YarnSchedulerBackend.scala:120)
        at 
org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:142)
        at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:204)
        at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
        at 
org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:217)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
    {noformat}
    
    So this change checks sc.isStopped in YarnSchedulerBackend.onDisconnected() 
when removing an executor: if sc.isStopped is true, the message is not sent.
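    A minimal, self-contained sketch of the proposed guard (the class and 
field names here are illustrative, not the real Spark internals): the stopped 
flag is flipped before executors are torn down, and the disconnect handler 
drops the event once the flag is set instead of sending to an RPC endpoint 
that may already be gone.

    ```scala
    import java.util.concurrent.atomic.AtomicBoolean

    // Hypothetical model of the guard; not the actual Spark classes.
    object StopGuardSketch {
      class SketchBackend {
        private val stopped = new AtomicBoolean(false)
        var disabled: List[String] = Nil

        // stop() flips the flag first, then executors are shut down,
        // so late onDisconnected events observe stopped == true.
        def stop(): Unit = stopped.set(true)

        def onDisconnected(executorId: String): Unit = {
          if (!stopped.get()) {
            // would normally send a DisableExecutor message over RPC
            disabled = executorId :: disabled
          }
          // otherwise: the backend is stopping; drop the event rather
          // than message a CoarseGrainedScheduler that is already stopped
        }
      }

      def main(args: Array[String]): Unit = {
        val b = new SketchBackend
        b.onDisconnected("63") // normal path: executor gets disabled
        b.stop()
        b.onDisconnected("64") // during shutdown: silently ignored
        println(b.disabled)    // prints List(63)
      }
    }
    ```

    The key ordering is that the flag is set before any executor teardown 
begins, so every disconnect event raced against stop() takes the quiet path.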
    
    ## How was this patch tested?
    Ran "spark-sql --master yarn -f query.sql" many times; without this patch 
the problem reproduces.
    
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/KaiXinXiaoLei/spark pendingAdd11

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19951.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #19951
    
----
commit c4dcc19ce8af02f99be18db8ddfe9b704086dd43
Author: hanghang <584620...@qq.com>
Date:   2017-12-11T23:53:52Z

    change code

----


---
