GitHub user gaborgsomogyi opened a pull request:
https://github.com/apache/spark/pull/20807
SPARK-23660: Fix exception in yarn cluster mode when application ended fast
## What changes were proposed in this pull request?
In cluster mode, the YARN `ApplicationMaster` fails with the following
exception when the application finishes very quickly:
```
18/03/07 23:34:22 WARN netty.NettyRpcEnv: Ignored failure: java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@7c974942 rejected from java.util.concurrent.ScheduledThreadPoolExecutor@1eea9d2d[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 0]
18/03/07 23:34:22 ERROR yarn.ApplicationMaster: Uncaught exception:
org.apache.spark.SparkException: Exception thrown in awaitResult:
    at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:205)
    at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
    at org.apache.spark.rpc.RpcEndpointRef.askSync(RpcEndpointRef.scala:92)
    at org.apache.spark.rpc.RpcEndpointRef.askSync(RpcEndpointRef.scala:76)
    at org.apache.spark.deploy.yarn.YarnAllocator.<init>(YarnAllocator.scala:102)
    at org.apache.spark.deploy.yarn.YarnRMClient.register(YarnRMClient.scala:77)
    at org.apache.spark.deploy.yarn.ApplicationMaster.registerAM(ApplicationMaster.scala:450)
    at org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:493)
    at org.apache.spark.deploy.yarn.ApplicationMaster.org$apache$spark$deploy$yarn$ApplicationMaster$$runImpl(ApplicationMaster.scala:345)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$2.apply$mcV$sp(ApplicationMaster.scala:260)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$2.apply(ApplicationMaster.scala:260)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$2.apply(ApplicationMaster.scala:260)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$5.run(ApplicationMaster.scala:810)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920)
    at org.apache.spark.deploy.yarn.ApplicationMaster.doAsUser(ApplicationMaster.scala:809)
    at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:259)
    at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:834)
    at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
Caused by: org.apache.spark.rpc.RpcEnvStoppedException: RpcEnv already stopped.
    at org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:158)
    at org.apache.spark.rpc.netty.Dispatcher.postLocalMessage(Dispatcher.scala:135)
    at org.apache.spark.rpc.netty.NettyRpcEnv.ask(NettyRpcEnv.scala:229)
    at org.apache.spark.rpc.netty.NettyRpcEndpointRef.ask(NettyRpcEnv.scala:523)
    at org.apache.spark.rpc.RpcEndpointRef.askSync(RpcEndpointRef.scala:91)
    ... 17 more
18/03/07 23:34:22 INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 13, (reason: Uncaught exception: org.apache.spark.SparkException: Exception thrown in awaitResult: )
```
Example application:
```
import org.apache.spark.{SparkConf, SparkContext}

object ExampleApp {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("ExampleApp")
    val sc = new SparkContext(conf)
    try {
      // Do nothing, so the driver finishes almost immediately
    } finally {
      sc.stop()
    }
  }
}
```
This PR makes `initialExecutorIdCounter` lazy. This way `YarnAllocator`
can be instantiated even if the driver has already ended.
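
For reviewers, here is a minimal standalone sketch of why a `lazy val` helps. It does not use Spark's real classes: `FakeDriverRef`, `StoppedException`, `EagerAllocator` and `LazyAllocator` are hypothetical stand-ins, and only the field name `initialExecutorIdCounter` comes from this PR. The assumption, based on the stack trace above, is that the field is initialized by a synchronous ask inside the `YarnAllocator` constructor.

```scala
// Hypothetical stand-ins; none of these are Spark's real classes.
class StoppedException(msg: String) extends RuntimeException(msg)

// Mimics a driver endpoint that rejects requests once it has stopped.
class FakeDriverRef(var stopped: Boolean) {
  def askLastExecutorId(): Int = {
    if (stopped) throw new StoppedException("RpcEnv already stopped.")
    0
  }
}

// Eager field: the ask runs inside the constructor, so construction
// fails when the driver is already gone.
class EagerAllocator(driver: FakeDriverRef) {
  val initialExecutorIdCounter: Int = driver.askLastExecutorId()
}

// Lazy field: the ask is deferred until the value is first read.
class LazyAllocator(driver: FakeDriverRef) {
  lazy val initialExecutorIdCounter: Int = driver.askLastExecutorId()
}

object LazyInitDemo {
  def main(args: Array[String]): Unit = {
    val stoppedDriver = new FakeDriverRef(stopped = true)

    // Constructing the eager variant throws because the ask fails.
    val eagerFailed =
      try { new EagerAllocator(stoppedDriver); false } catch {
        case _: StoppedException => true
      }

    // Constructing the lazy variant does not touch the driver at all.
    val lazyAllocator = new LazyAllocator(stoppedDriver)

    println(s"eager construction failed: $eagerFailed")
    println(s"lazy construction succeeded: ${lazyAllocator != null}")
  }
}
```

Note that the lazy variant only postpones the ask: reading `initialExecutorIdCounter` after the driver has stopped would still throw in this sketch. The point is that construction itself no longer depends on the driver being alive.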
## How was this patch tested?
Automated: Additional unit test added
Manual: Application submitted into small cluster
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/gaborgsomogyi/spark SPARK-23660
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/20807.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #20807
----
commit 114ac05102c9d563c922447423ec8445bb37e9ef
Author: Gabor Somogyi <gabor.g.somogyi@...>
Date: 2018-03-13T04:23:59Z
SPARK-23660: Fix exception in yarn cluster mode when application ended fast
----