Ngone51 commented on a change in pull request #28746:
URL: https://github.com/apache/spark/pull/28746#discussion_r437100175
##########
File path: core/src/main/scala/org/apache/spark/deploy/LocalSparkCluster.scala
##########
@@ -74,6 +74,10 @@ class LocalSparkCluster(
def stop(): Unit = {
logInfo("Shutting down local Spark cluster.")
+ // SPARK-31922: wait one more second before shutting down rpcEnvs of master and worker,
+ // in order to let the cluster have time to handle the `UnregisterApplication` message.
+ // Otherwise, we could hit "RpcEnv already stopped" error.
+ Thread.sleep(1000)
Review comment:
Hi @gerashegalov, thank you for the PR. But I think the PR might not work.
Syncing from AppClient to Master is not enough. We should also sync the events
from Master to Worker after the Master receives the `UnregisterApplication`
message. In `Master.finishApplication`, you can see that the Master not only
replies to the AppClient, but also sends messages to the Worker asynchronously.
And the error happens exactly on the Worker. But if we do it this way, I'm
afraid we would add too much special code logic for the local cluster only,
which I personally think is unnecessary overhead.
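The two-hop argument can be made concrete with a deterministic toy model. This is a sketch under stated assumptions, not Spark code. `ToyEnv`, `finishApplication` and `shutdown` are hypothetical names. `ToyEnv.post` only queues a message, and the message is handled later, which mimics an asynchronous send. `"ApplicationFinished"` is just a label for the Master-to-Worker notification. The client's synchronous call returns as soon as hop 1 (Master to AppClient) completes. Hop 2 (Master to Worker) is still pending at that point, so stopping the Worker right away produces the error on the Worker side.

```scala
import scala.collection.mutable

/** Hypothetical stand-in for an RpcEnv (NOT Spark's API): `post` only enqueues
  * a message; it is handled later, when `runDispatcher` is called. */
final class ToyEnv(val name: String) {
  private val inbox = mutable.Queue.empty[String]
  private var stopped = false
  val handled = mutable.ArrayBuffer.empty[String]
  val errors = mutable.ArrayBuffer.empty[String]

  /** Fire-and-forget send: returns before the message is handled. */
  def post(msg: String): Unit = inbox.enqueue(msg)

  /** Models the dispatcher thread getting around to queued messages. */
  def runDispatcher(): Unit =
    while (inbox.nonEmpty) {
      val msg = inbox.dequeue()
      if (stopped) errors += s"$name: RpcEnv already stopped, cannot handle $msg"
      else handled += msg
    }

  def stop(): Unit = stopped = true
}

object ShutdownRaceSketch {
  /** Hypothetical master-side handler: answers the client synchronously,
    * but only queues the notification for the worker. */
  def finishApplication(worker: ToyEnv): String = {
    worker.post("ApplicationFinished") // hop 2: Master -> Worker, asynchronous
    "UnregisterApplication handled"    // hop 1: Master -> AppClient, what the client waits on
  }

  /** Runs one shutdown ordering and returns the errors seen on the worker. */
  def shutdown(waitForWorker: Boolean): List[String] = {
    val worker = new ToyEnv("worker")
    finishApplication(worker)                 // the client's synchronous call returns here
    if (waitForWorker) worker.runDispatcher() // also wait for hop 2 before stopping
    worker.stop()                             // the local cluster shuts the worker down
    worker.runDispatcher()                    // any still-queued message is now handled too late
    worker.errors.toList
  }

  def main(args: Array[String]): Unit = {
    println(shutdown(waitForWorker = false)) // prints List(worker: RpcEnv already stopped, cannot handle ApplicationFinished)
    println(shutdown(waitForWorker = true))  // prints List()
  }
}
```

In the model, the error disappears only when the shutdown also waits for hop 2. That extra wait is the additional Master-to-Worker synchronization the comment argues a local-cluster-only fix would need.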
----------------------------------------------------------------
This is an automated message from the Apache Git Service.