Hi,
I am running a 3-node cluster (HDP 2.3, installed using Ambari 2.1.1). I have been trying to run a Spark word count job through the Oozie spark action. The program runs fine when the master is set to local, but it runs into errors when the master is set to yarn-cluster or yarn-client.

My workflow is as follows:
<?xml version="1.0" encoding="UTF-8"?>
<workflow-app xmlns='uri:oozie:workflow:0.4' name='sparkjob'>
    <start to='spark-process' />
    <action name='spark-process'>
        <spark xmlns='uri:oozie:spark-action:0.1'>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>oozie.service.SparkConfigurationService.spark.configurations</name>
                    <value>spark.eventLog.dir=hdfs://node1.analytics.subex:8020/user/spark/applicationHistory,spark.yarn.historyServer.address=http://node1.analytics.subex:18088,spark.eventLog.enabled=true</value>
                </property>
                <!--
                <property>
                    <name>oozie.use.system.libpath</name>
                    <value>true</value>
                </property>
                <property>
                    <name>oozie.service.WorkflowAppService.system.libpath</name>
                    <value>/user/oozie/share/lib/lib_20150831190253/spark</value>
                </property>
                -->
            </configuration>
            <master>yarn-client</master>
            <mode>client</mode>
            <name>Word Count</name>
            <jar>/usr/hdp/current/spark-client/AnalyticsJar/wordcount.py</jar>
            <spark-opts>--executor-memory 1G --driver-memory 1G --executor-cores 4 --num-executors 2 --jars /usr/hdp/current/spark-client/lib/spark-assembly-1.3.1.2.3.0.0-2557-hadoop2.7.1.2.3.0.0-2557.jar</spark-opts>
        </spark>
        <ok to='end'/>
        <error to='spark-fail'/>
    </action>
    <kill name='spark-fail'>
        <message>Spark job failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name='end' />
</workflow-app>
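For completeness, my understanding is that a workflow like this is normally submitted with a job.properties along the following lines (a rough sketch; the hostnames, ports and application path below are placeholders, not my exact settings), with oozie.use.system.libpath set at submission time rather than inside the action configuration:

# job.properties sketch -- placeholder hostnames/ports/paths
nameNode=hdfs://node1.analytics.subex:8020
jobTracker=node1.analytics.subex:8050
oozie.use.system.libpath=true
oozie.wf.application.path=${nameNode}/user/oozie/sparkjob

and then launched with something like:

oozie job -oozie http://node1.analytics.subex:11000/oozie -config job.properties -run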
I get the following error:
Traceback (most recent call last):
  File "/usr/hdp/current/spark-client/AnalyticsJar/wordcount.py", line 26, in <module>
    sc = SparkContext(conf=conf)
  File "/hadoop/yarn/local/filecache/251/spark-core_2.10-1.1.0.jar/pyspark/context.py", line 107, in __init__
  File "/hadoop/yarn/local/filecache/251/spark-core_2.10-1.1.0.jar/pyspark/context.py", line 155, in _do_init
  File "/hadoop/yarn/local/filecache/251/spark-core_2.10-1.1.0.jar/pyspark/context.py", line 201, in _initialize_context
  File "/hadoop/yarn/local/filecache/251/spark-core_2.10-1.1.0.jar/py4j/java_gateway.py", line 701, in __call__
  File "/hadoop/yarn/local/filecache/251/spark-core_2.10-1.1.0.jar/py4j/protocol.py", line 300, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: org.apache.spark.SparkException: YARN mode not available ?
        at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:1586)
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:310)
        at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:53)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:234)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
        at py4j.Gateway.invoke(Gateway.java:214)
        at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:79)
        at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:68)
        at py4j.GatewayConnection.run(GatewayConnection.java:207)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.scheduler.cluster.YarnClientClusterScheduler
        at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:264)
        at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:1580)
        ... 13 more
The steps I have taken so far:
1. Copied the jars from the spark-client/lib directory to /user/oozie/share/lib/spark, followed by a restart of the Spark service (see the sketch after this list).
2. Passed the assembly jar within <spark-opts></spark-opts> (see the workflow above).
3. Tried setting oozie.service.WorkflowAppService.system.libpath to point at the jars in the sharelib directory.
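Roughly, step 1 amounted to something like the following, and listing the sharelib is how I have been checking what Oozie sees afterwards (a sketch; the Oozie URL is a placeholder and the exact sharelib path may differ on my cluster):

# copy the HDP Spark client jars into the Oozie Spark sharelib
hdfs dfs -put -f /usr/hdp/current/spark-client/lib/*.jar /user/oozie/share/lib/spark/

# refresh the sharelib and list what Oozie now has for the spark action
oozie admin -oozie http://node1.analytics.subex:11000/oozie -sharelibupdate
oozie admin -oozie http://node1.analytics.subex:11000/oozie -shareliblist spark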
It seems that Spark is not picking up the right jars when deploying the job to YARN, even though I have tried to make them available to the workflow. While scanning through the detailed logs, I also noticed that the assembly jar is present in the YARN application folder and on the Oozie classpath (see the log check sketch below).
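By "scanning through the detailed logs" I mean roughly the following (the application id is a placeholder):

yarn logs -applicationId <application_id> | grep -i spark-assembly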
Is there some configuration that I'm missing? I would appreciate any help.
Regards,
Nitin