GitHub user KaiXinXiaoLei opened a pull request: https://github.com/apache/spark/pull/19951
[SPARK-22760][CORE][YARN] When sc.stop() is called, set stopped to true before removing executors

## What changes were proposed in this pull request?

When the number of executors is large and YarnSchedulerBackend.stop() is running, an executor may shut down before YarnSchedulerBackend.stopped is set to true. YarnSchedulerBackend.onDisconnected() is then invoked while the scheduler is already being torn down, which produces the following error:

{noformat}
17/12/12 15:34:45 INFO YarnClientSchedulerBackend: Asking each executor to shut down
17/12/12 15:34:45 INFO YarnClientSchedulerBackend: Disabling executor 63.
17/12/12 15:34:45 ERROR Inbox: Ignoring error
org.apache.spark.SparkException: Could not find CoarseGrainedScheduler or it has been stopped.
	at org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:163)
	at org.apache.spark.rpc.netty.Dispatcher.postOneWayMessage(Dispatcher.scala:133)
	at org.apache.spark.rpc.netty.NettyRpcEnv.send(NettyRpcEnv.scala:192)
	at org.apache.spark.rpc.netty.NettyRpcEndpointRef.send(NettyRpcEnv.scala:516)
	at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.reviveOffers(CoarseGrainedSchedulerBackend.scala:356)
	at org.apache.spark.scheduler.TaskSchedulerImpl.executorLost(TaskSchedulerImpl.scala:497)
	at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$DriverEndpoint.disableExecutor(CoarseGrainedSchedulerBackend.scala:301)
	at org.apache.spark.scheduler.cluster.YarnSchedulerBackend$YarnDriverEndpoint$$anonfun$onDisconnected$1.apply(YarnSchedulerBackend.scala:121)
	at org.apache.spark.scheduler.cluster.YarnSchedulerBackend$YarnDriverEndpoint$$anonfun$onDisconnected$1.apply(YarnSchedulerBackend.scala:120)
	at scala.Option.foreach(Option.scala:236)
	at org.apache.spark.scheduler.cluster.YarnSchedulerBackend$YarnDriverEndpoint.onDisconnected(YarnSchedulerBackend.scala:120)
	at org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:142)
	at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:204)
	at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
	at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:217)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
{noformat}

This change makes YarnSchedulerBackend.onDisconnected() check sc.isStopped before removing an executor; if sc.isStopped is true, the message is not sent.

## How was this patch tested?

Manually: running "spark-sql --master yarn -f query.sql" repeatedly reproduces the problem.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/KaiXinXiaoLei/spark pendingAdd11

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19951.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #19951

----
commit c4dcc19ce8af02f99be18db8ddfe9b704086dd43
Author: hanghang <584620...@qq.com>
Date:   2017-12-11T23:53:52Z

    change code
----

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
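The race this PR describes can be sketched with a toy model in plain Scala (this is not Spark's actual code; `ToyBackend`, `messagesSent`, and `Demo` are illustrative names): a `stop()` that flips a stopped flag before executors are torn down, and an `onDisconnected()` handler that consults the flag, mirroring the proposed `sc.isStopped` guard, before messaging the scheduler.

```scala
import java.util.concurrent.atomic.AtomicBoolean

// Simplified stand-in for YarnSchedulerBackend: late disconnect callbacks
// must not message a scheduler that is already shutting down.
class ToyBackend {
  private val stopped = new AtomicBoolean(false)
  var messagesSent = 0 // stands in for the disableExecutor/reviveOffers traffic

  def stop(): Unit = {
    // Set the flag FIRST, so any onDisconnected callback that races with
    // executor shutdown observes it.
    stopped.set(true)
    // ... executors would be asked to shut down here ...
  }

  def onDisconnected(executorId: String): Unit = {
    // The guard proposed in the PR: once shutdown has begun, skip the
    // scheduler message instead of triggering
    // "Could not find CoarseGrainedScheduler or it has been stopped."
    if (!stopped.get()) {
      messagesSent += 1
    }
  }
}

object Demo extends App {
  val b = new ToyBackend
  b.onDisconnected("1")   // normal operation: message sent
  b.stop()
  b.onDisconnected("2")   // during shutdown: suppressed by the guard
  println(b.messagesSent) // prints 1
}
```

An `AtomicBoolean` is used here so the flag write in `stop()` is visible to disconnect callbacks arriving on other threads, which is the essence of the ordering fix.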