Hi,

I'm glad that you could make SparkR work.
Thank you for sharing the solution with us!

gp


On Tue, Nov 29, 2016 at 2:15 AM, Dongying Jiao <pineapple...@gmail.com>
wrote:

> Hi:
> SparkR can be run in an Oozie spark action. I tried running the simple SparkR
> script from the Spark examples folder, and it succeeded.
> After setting up the R environment on your cluster, you only need to put
> spark-assembly.jar and $SPARK_HOME/R/lib/sparkr.zip in the workflow lib folder.
> Below is the workflow I use for yarn cluster mode.
> <action name="sparkAction">
>         <spark xmlns="uri:oozie:spark-action:0.1">
>                 <job-tracker>${jobTracker}</job-tracker>
>                 <name-node>${nameNode}</name-node>
>                 <master>${master}</master>
>                 <name>sparkRtest</name>
>                 <jar>${nameNode}/user/oozie/sparkR/dataframe.R</jar>
>                 <spark-opts>--conf spark.driver.extraJavaOptions=XXXX</spark-opts>
>         </spark>
>         <ok to="end"/>
>         <error to="fail"/>
>     </action>
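> A minimal sketch of the matching HDFS layout (paths, the spark-assembly jar
> location, and file names here are illustrative; adjust to your cluster and
> distribution):
>
> ```shell
> # Upload the workflow, the R script, and the required jar and zip to the
> # workflow application directory, with the jars under lib/.
> hdfs dfs -mkdir -p /user/oozie/sparkR/lib
> hdfs dfs -put workflow.xml /user/oozie/sparkR/
> hdfs dfs -put dataframe.R /user/oozie/sparkR/
> hdfs dfs -put $SPARK_HOME/lib/spark-assembly.jar /user/oozie/sparkR/lib/
> hdfs dfs -put $SPARK_HOME/R/lib/sparkr.zip /user/oozie/sparkR/lib/
> ```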
>
> Thanks
>
>
> 2016-11-15 13:59 GMT+08:00 Dongying Jiao <pineapple...@gmail.com>:
>
> > Hi Peter:
> > Thank you very much for your reply.
> > I will have a try and tell you the result.
> >
> > 2016-11-12 5:02 GMT+08:00 Peter Cseh <gezap...@cloudera.com>:
> >
> >> Hi,
> >>
> >> This exception is caused by a missing jar on the classpath.
> >> The needed jars should be added to the classpath in the Oozie action.
> >> This blog post
> >> <http://blog.cloudera.com/blog/2014/05/how-to-use-the-sharelib-in-apache-oozie-cdh-5/>
> >> describes several ways to do it.
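> >> One of those options, sketched as a job.properties fragment (the extra
> >> libpath directory here is hypothetical; it would hold the missing
> >> connector jar):
> >>
> >> ```properties
> >> # Illustrative extra HDFS lib directory added to the action classpath
> >> oozie.libpath=${nameNode}/user/oozie/extra-libs
> >> oozie.use.system.libpath=true
> >> ```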
> >>
> >> I've never tried to run a SparkR application from Oozie. I guess it can
> >> be done, but in its current state it needs some manual work:
> >>
> >> According to Spark <https://github.com/apache/spark/tree/master/R>, the
> >> SparkR libraries should be under $SPARK_HOME/R/lib, and $R_HOME should
> >> also be set for the job.
> >> $SPARK_HOME is set to the current directory in Oozie after OOZIE-2482,
> >> and you could add the SparkR libraries to the Spark sharelib to make them
> >> available in the action.
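> >> For example, a rough sketch of that sharelib step (the timestamped
> >> directory name is an assumption; list the real one first with
> >> `oozie admin -shareliblist spark`):
> >>
> >> ```shell
> >> # Copy the SparkR zip next to the other Spark sharelib jars, then
> >> # tell Oozie to pick up the change without a restart.
> >> hdfs dfs -put $SPARK_HOME/R/lib/sparkr.zip \
> >>     /user/oozie/share/lib/lib_20161108/spark/
> >> oozie admin -oozie http://oozie-host:11000/oozie -sharelibupdate
> >> ```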
> >> It's not guaranteed that it will work after these steps, but there's a
> >> chance. I would be delighted to hear about the result if you have the
> >> time to try to make this work.
> >>
> >> Thanks,
> >> gp
> >>
> >>
> >> On Tue, Nov 8, 2016 at 10:55 AM, Dongying Jiao <pineapple...@gmail.com>
> >> wrote:
> >>
> >> > Hi:
> >> > I have an issue running SparkR through Oozie; could you please help me?
> >> > I am trying to run a SparkR job through Oozie in yarn-client mode, and I
> >> > have installed the R package on all my nodes.
> >> >
> >> > job.properties is like:
> >> > nameNode=hdfs://XXX:8020
> >> > jobTracker=XXX:8050
> >> > master=yarn-client
> >> > queueName=default
> >> > oozie.use.system.libpath=true
> >> > oozie.wf.application.path=/user/oozie/measurecountWF
> >> >
> >> > The workflow is like:
> >> > <workflow-app xmlns='uri:oozie:workflow:0.5' name='measurecountWF'>
> >> > <global>
> >> >             <configuration>
> >> >                 <property>
> >> >                     <name>oozie.launcher.yarn.app.mapreduce.am.env</name>
> >> >                     <value>SPARK_HOME=XXXX</value>
> >> >                 </property>
> >> >             </configuration>
> >> > </global>
> >> > <start to="sparkAction"/>
> >> >     <action name="sparkAction">
> >> >         <spark xmlns="uri:oozie:spark-action:0.1">
> >> >                 <job-tracker>${jobTracker}</job-tracker>
> >> >                 <name-node>${nameNode}</name-node>
> >> >                 <master>${master}</master>
> >> >                 <name>measurecountWF</name>
> >> >                 <jar>measurecount.R</jar>
> >> >                 <spark-opts>--conf spark.driver.extraJavaOptions=XXXX</spark-opts>
> >> >         </spark>
> >> >         <ok to="end"/>
> >> >         <error to="fail"/>
> >> >     </action>
> >> >   <kill name="fail">
> >> >         <message>Workflow failed, error
> >> >         message[${wf:errorMessage(wf:lastErrorNode())}]
> >> >         </message>
> >> >   </kill>
> >> >   <end name="end"/>
> >> > </workflow-app>
> >> >
> >> > It failed with class not found exception.
> >> >
> >> > org.apache.spark.SparkException: Job aborted due to stage failure:
> >> > Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3
> >> > in stage 0.0 (TID 3, XXXX): java.lang.ClassNotFoundException:
> >> > com.cloudant.spark.common.JsonStoreRDDPartition
> >> >         at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
> >> >         at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> >> >         at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> >> >         at java.lang.Class.forName0(Native Method)
> >> >         at java.lang.Class.forName(Class.java:348)
> >> >         at org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:68)
> >> >         at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1613)
> >> >         at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1518)
> >> >         at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1774)
> >> >         at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
> >> >         at java.io.ObjectInputStream.defaultReadFields(ObjectInpu
> >> > Calls: sql -> callJMethod -> invokeJava
> >> > Execution halted
> >> > Intercepting System.exit(1)
> >> >
> >> > Does Oozie support running SparkR in a spark action? Or should we only
> >> > wrap it in an ssh action?
> >> >
> >> > Thanks a lot
> >> >
> >>
> >>
> >>
> >> --
> >> Peter Cseh
> >> Software Engineer
> >> <http://www.cloudera.com>
> >>
> >
> >
>



-- 
Peter Cseh
Software Engineer
<http://www.cloudera.com>
