Hi All,

When a Spark job (Spark 1.5.2) is submitted with a single executor and the
user passes incorrect JVM arguments via spark.executor.extraJavaOptions,
the first executor fails. But the job keeps retrying, creating a new
executor that fails every time, until CTRL-C is pressed. Is there a
configuration to limit the number of retry attempts?

*Example:*

./spark-submit --class SimpleApp --master "spark://10.10.72.145:7077" \
  --conf "spark.executor.extraJavaOptions=-XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+UseG1GC -XX:InitiatingHeapOccupancyPercent=35 -XX:ConcGCThreads=16" \
  /SPARK/SimpleApp.jar

The executor fails with:

Error occurred during initialization of VM
Can't have more ConcGCThreads than ParallelGCThreads.
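The JVM refuses to start because -XX:ConcGCThreads=16 is larger than the ParallelGCThreads value it ends up with; when ParallelGCThreads is not set explicitly, HotSpot derives it from the host's CPU count. Until the retries can be capped, one possible guard is to check the options before submitting. The Bash function below is only a hypothetical sketch (check_gc_threads is my own name, not part of Spark). It catches just this one constraint, and it can only know the default ParallelGCThreads if you pass it in, e.g. a value read on a worker host with `java -XX:+PrintFlagsFinal -version | grep ParallelGCThreads`.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of Spark): fail if the explicit
# -XX:ConcGCThreads value exceeds ParallelGCThreads.
# Usage: check_gc_threads "<extraJavaOptions>" [default_parallel_gc_threads]
# An explicit -XX:ParallelGCThreads in the options overrides the default arg;
# if neither is known, the check is skipped (returns 0).
check_gc_threads() {
  local conc="" par="${2:-}" opt
  local -a opts
  # Split the options string on whitespace without glob expansion.
  read -r -a opts <<< "$1"
  for opt in "${opts[@]}"; do
    case "$opt" in
      -XX:ConcGCThreads=*)     conc="${opt#*=}" ;;
      -XX:ParallelGCThreads=*) par="${opt#*=}" ;;
    esac
  done
  if [[ -n "$conc" && -n "$par" ]] && (( conc > par )); then
    echo "ConcGCThreads ($conc) exceeds ParallelGCThreads ($par)" >&2
    return 1
  fi
  return 0
}

# Example (10 is just an illustrative default for the worker host):
#   EXEC_OPTS="-XX:+UseG1GC -XX:ConcGCThreads=16"
#   check_gc_threads "$EXEC_OPTS" 10 && ./spark-submit ... \
#     --conf "spark.executor.extraJavaOptions=$EXEC_OPTS" ...
```

This only avoids this particular misconfiguration; it does not limit how often the executor is relaunched.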

But the job does not exit; it keeps creating new executors and retrying:
..........
16/02/01 06:54:28 INFO SparkDeploySchedulerBackend: Granted executor ID app-20160201065319-0014/2846 on hostPort 10.10.72.145:36558 with 12 cores, 2.0 GB RAM
16/02/01 06:54:28 INFO AppClient$ClientEndpoint: Executor updated: app-20160201065319-0014/2846 is now LOADING
16/02/01 06:54:28 INFO AppClient$ClientEndpoint: Executor updated: app-20160201065319-0014/2846 is now RUNNING
16/02/01 06:54:28 INFO AppClient$ClientEndpoint: Executor updated: app-20160201065319-0014/2846 is now EXITED (Command exited with code 1)
16/02/01 06:54:28 INFO SparkDeploySchedulerBackend: Executor app-20160201065319-0014/2846 removed: Command exited with code 1
16/02/01 06:54:28 INFO SparkDeploySchedulerBackend: Asked to remove non-existent executor 2846
16/02/01 06:54:28 INFO AppClient$ClientEndpoint: Executor added: app-20160201065319-0014/2847 on worker-20160131230345-10.10.72.145-36558 (10.10.72.145:36558) with 12 cores
16/02/01 06:54:28 INFO SparkDeploySchedulerBackend: Granted executor ID app-20160201065319-0014/2847 on hostPort 10.10.72.145:36558 with 12 cores, 2.0 GB RAM
16/02/01 06:54:28 INFO AppClient$ClientEndpoint: Executor updated: app-20160201065319-0014/2847 is now LOADING
16/02/01 06:54:28 INFO AppClient$ClientEndpoint: Executor updated: app-20160201065319-0014/2847 is now EXITED (Command exited with code 1)
16/02/01 06:54:28 INFO SparkDeploySchedulerBackend: Executor app-20160201065319-0014/2847 removed: Command exited with code 1
16/02/01 06:54:28 INFO SparkDeploySchedulerBackend: Asked to remove non-existent executor 2847
16/02/01 06:54:28 INFO AppClient$ClientEndpoint: Executor added: app-20160201065319-0014/2848 on worker-20160131230345-10.10.72.145-36558 (10.10.72.145:36558) with 12 cores
16/02/01 06:54:28 INFO SparkDeploySchedulerBackend: Granted executor ID app-20160201065319-0014/2848 on hostPort 10.10.72.145:36558 with 12 cores, 2.0 GB RAM
16/02/01 06:54:28 INFO AppClient$ClientEndpoint: Executor updated: app-20160201065319-0014/2848 is now LOADING
16/02/01 06:54:28 INFO AppClient$ClientEndpoint: Executor updated: app-20160201065319-0014/2848 is now RUNNING
............



Thanks,
Prabhu Joseph
