Hi:
SparkR can be run in an Oozie spark action. I tried running the simple SparkR
script from the Spark examples folder, and it ran successfully.
After setting up the R environment on your cluster, you only need to put
spark-assembly.jar and $SPARK_HOME/R/lib/sparkr.zip in the workflow lib folder.
Below is the workflow I use for yarn-cluster mode.
<action name="sparkAction">
    <spark xmlns="uri:oozie:spark-action:0.1">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <master>${master}</master>
        <name>sparkRtest</name>
        <jar>${nameNode}/user/oozie/sparkR/dataframe.R</jar>
        <spark-opts>--conf spark.driver.extraJavaOptions=XXXX</spark-opts>
    </spark>
    <ok to="end"/>
    <error to="fail"/>
</action>
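For reference, the staging steps could look roughly like the sketch below. The application path, assembly jar location, and Oozie server URL are assumptions for illustration; adjust them for your cluster.

```shell
# Assumed HDFS application directory for the workflow (adjust to yours).
APP_PATH=/user/oozie/sparkR

# Stage the Spark assembly and the SparkR package into the workflow lib folder.
hdfs dfs -mkdir -p "$APP_PATH/lib"
hdfs dfs -put -f "$SPARK_HOME"/lib/spark-assembly*.jar "$APP_PATH/lib/"
hdfs dfs -put -f "$SPARK_HOME/R/lib/sparkr.zip" "$APP_PATH/lib/"

# Copy the R script referenced by the <jar> element next to the workflow.
hdfs dfs -put -f dataframe.R "$APP_PATH/"

# Submit the workflow (Oozie server URL is a placeholder).
oozie job -oozie http://oozie-host:11000/oozie -config job.properties -run
```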
Thanks
2016-11-15 13:59 GMT+08:00 Dongying Jiao <[email protected]>:
> Hi Peter:
> Thank you very much for your reply.
> I will have a try and tell you the result.
>
> 2016-11-12 5:02 GMT+08:00 Peter Cseh <[email protected]>:
>
>> Hi,
>>
>> This exception is caused by a missing jar on the classpath.
>> The needed jars should be added to the classpath in the Oozie action. This
>> blogpost
>> <http://blog.cloudera.com/blog/2014/05/how-to-use-the-sharelib-in-apache-oozie-cdh-5/>
>> describes several ways to do it.
>>
>> I've never tried to run a SparkR application from Oozie. I guess it can
>> be done, but in its current state it needs some manual work:
>>
>> According to Spark <https://github.com/apache/spark/tree/master/R>, the
>> SparkR libraries should be under $SPARK_HOME/R/lib, and $R_HOME should be
>> also set for the job.
>> $SPARK_HOME is set to the current directory in Oozie after OOZIE-2482, and
>> you could add the SparkR stuff to Spark sharelib to make it available in
>> the action.
>> It's not guaranteed that it will work after these steps, but there's a
>> chance. I'd be delighted to hear about the result if you have the time
>> to try to make this work.
>>
>> Thanks,
>> gp
>>
>>
>> On Tue, Nov 8, 2016 at 10:55 AM, Dongying Jiao <[email protected]>
>> wrote:
>>
>> > Hi:
>> > I have an issue running SparkR through Oozie; could you please help me?
>> > I'm trying to run a SparkR job through Oozie in yarn-client mode, and I
>> > have installed the R packages on all my nodes.
>> >
>> > job.properties is like:
>> > nameNode=hdfs://XXX:8020
>> > jobTracker=XXX:8050
>> > master=yarn-client
>> > queueName=default
>> > oozie.use.system.libpath=true
>> > oozie.wf.application.path=/user/oozie/measurecountWF
>> >
>> > The workflow is like:
>> > <workflow-app xmlns='uri:oozie:workflow:0.5' name='measurecountWF'>
>> >     <global>
>> >         <configuration>
>> >             <property>
>> >                 <name>oozie.launcher.yarn.app.mapreduce.am.env</name>
>> >                 <value>SPARK_HOME=XXXX</value>
>> >             </property>
>> >         </configuration>
>> >     </global>
>> >     <start to="sparkAction"/>
>> >     <action name="sparkAction">
>> >         <spark xmlns="uri:oozie:spark-action:0.1">
>> >             <job-tracker>${jobTracker}</job-tracker>
>> >             <name-node>${nameNode}</name-node>
>> >             <master>${master}</master>
>> >             <name>measurecountWF</name>
>> >             <jar>measurecount.R</jar>
>> >             <spark-opts>--conf spark.driver.extraJavaOptions=XXXX</spark-opts>
>> >         </spark>
>> >         <ok to="end"/>
>> >         <error to="fail"/>
>> >     </action>
>> >     <kill name="fail">
>> >         <message>Workflow failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
>> >     </kill>
>> >     <end name="end"/>
>> > </workflow-app>
>> >
>> > It failed with class not found exception.
>> >
>> > org.apache.spark.SparkException: Job aborted due to stage failure:
>> > Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3
>> > in stage 0.0 (TID 3, XXXX): java.lang.ClassNotFoundException:
>> > com.cloudant.spark.common.JsonStoreRDDPartition
>> >     at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>> >     at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>> >     at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>> >     at java.lang.Class.forName0(Native Method)
>> >     at java.lang.Class.forName(Class.java:348)
>> >     at org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:68)
>> >     at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1613)
>> >     at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1518)
>> >     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1774)
>> >     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
>> >     at java.io.ObjectInputStream.defaultReadFields(ObjectInpu
>> > Calls: sql -> callJMethod -> invokeJava
>> > Execution halted
>> > Intercepting System.exit(1)
>> >
>> > Does Oozie support running SparkR in a spark action, or should we
>> > only wrap it in an ssh action?
>> >
>> > Thanks a lot
>> >
>>
>>
>>
>> --
>> Peter Cseh
>> Software Engineer
>> <http://www.cloudera.com>
>>
>
>