We are trying to submit a Spark job through YARN with the following command:

spark-submit --conf spark.yarn.stagingDir=/path/to/stage --verbose \
    --class com.my.class --jars /path/to/jar1,/path/to/jar2 \
    /path/to/main/jar/application.jar

The application shows up in the YARN scheduler, but it never appears to launch the Application Master container. The trace is:

17/12/04 11:21:32 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
17/12/04 11:27:16 ERROR SparkContext: Error initializing SparkContext.
org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master.
    at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApplication(YarnClientSchedulerBackend.scala:85)
    at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:62)
    at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:156)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:509)
    at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
    at org.apache.beam.runners.spark.translation.SparkContextFactory.createSparkContext(SparkContextFactory.java:98)
    at org.apache.beam.runners.spark.translation.SparkContextFactory.getSparkContext(SparkContextFactory.java:68)
    at org.apache.beam.runners.spark.SparkRunner.run(SparkRunner.java:197)
    at org.apache.beam.runners.spark.SparkRunner.run(SparkRunner.java:86)
    at org.apache.beam.sdk.Pipeline.run(Pipeline.java:297)
    at com.my.class.main(myclass.java:202)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:738)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
17/12/04 11:27:16 WARN YarnSchedulerBackend$YarnSchedulerEndpoint: Attempted to request executors before the AM has registered!
17/12/04 11:27:16 WARN MetricsSystem: Stopping a MetricsSystem that is not running
Exception in thread "main" org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master.
    at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApplication(YarnClientSchedulerBackend.scala:85)
    at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:62)
    at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:156)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:509)
    at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
    at org.apache.beam.runners.spark.translation.SparkContextFactory.createSparkContext(SparkContextFactory.java:98)
    at org.apache.beam.runners.spark.translation.SparkContextFactory.getSparkContext(SparkContextFactory.java:68)
    at org.apache.beam.runners.spark.SparkRunner.run(SparkRunner.java:197)
    at org.apache.beam.runners.spark.SparkRunner.run(SparkRunner.java:86)
    at org.apache.beam.sdk.Pipeline.run(Pipeline.java:297)
    at com.my.class.main(myclass.java:202)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:738)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

We are able to run the sample Spark Pi job to completion without errors.

Version:

$ spark-submit --version
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.1.0-mapr-1703
      /_/

Using Scala version 2.11.8, Java HotSpot(TM) 64-Bit Server VM, 1.8.0_101


We think the issue may be with how we are setting up our pipeline:

    PipelineOptions options = PipelineOptionsFactory.create();
    options.setRunner(SparkRunner.class);
    Pipeline p = Pipeline.create(options);
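
For reference, here is a minimal sketch of an alternative way to create the options, using the SparkPipelineOptions interface from beam-runners-spark. This is only an illustration, not our actual code, and the argument values below are placeholders rather than our real ones:

    import org.apache.beam.runners.spark.SparkPipelineOptions;
    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;

    // Parse the runner (and any other pipeline options) from program arguments.
    // "--runner=SparkRunner" is a placeholder; normally these come from main(String[] args).
    String[] args = new String[] {"--runner=SparkRunner"};
    SparkPipelineOptions sparkOptions = PipelineOptionsFactory
            .fromArgs(args)
            .withValidation()
            .as(SparkPipelineOptions.class);
    Pipeline pipeline = Pipeline.create(sparkOptions);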

We run our Pipeline with:

    p.run(options);
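
For completeness, the Beam quickstart examples we based this on call run() with no arguments and block on the result. A sketch of that form, assuming PipelineResult.waitUntilFinish() is available in our Beam version:

    import org.apache.beam.sdk.PipelineResult;

    // Run the pipeline and wait for it to finish so the driver does not exit early.
    PipelineResult result = p.run();
    result.waitUntilFinish();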

The same pipeline runs successfully with the DirectRunner.

We made sure to include beam-runners-spark as a Maven dependency.

Any ideas?

