Hi, I'm glad that you could make Spark R work. Thank you for sharing the solution with us!
gp

On Tue, Nov 29, 2016 at 2:15 AM, Dongying Jiao <pineapple...@gmail.com> wrote:

> Hi:
> SparkR can be run in the Oozie Spark action. I tried to run the simple
> SparkR script under the Spark examples folder, and it ran successfully.
> After setting up the R environment on your cluster, you only need to put
> spark-assembly.jar and $SPARK_HOME/R/lib/sparkr.zip in the workflow lib
> folder. Below is the workflow I use for yarn-cluster mode.
>
> <action name="sparkAction">
>     <spark xmlns="uri:oozie:spark-action:0.1">
>         <job-tracker>${jobTracker}</job-tracker>
>         <name-node>${nameNode}</name-node>
>         <master>${master}</master>
>         <name>sparkRtest</name>
>         <jar>${nameNode}/user/oozie/sparkR/dataframe.R</jar>
>         <spark-opts>--conf spark.driver.extraJavaOptions=XXXX</spark-opts>
>     </spark>
>     <ok to="end"/>
>     <error to="fail"/>
> </action>
>
> Thanks
>
>
> 2016-11-15 13:59 GMT+08:00 Dongying Jiao <pineapple...@gmail.com>:
>
> > Hi Peter:
> > Thank you very much for your reply.
> > I will have a try and tell you the result.
> >
> > 2016-11-12 5:02 GMT+08:00 Peter Cseh <gezap...@cloudera.com>:
> >
> >> Hi,
> >>
> >> This exception is caused by a missing jar on the classpath.
> >> The needed jars should be added to the classpath in the Oozie action.
> >> This blog post
> >> <http://blog.cloudera.com/blog/2014/05/how-to-use-the-sharelib-in-apache-oozie-cdh-5/>
> >> describes several ways to do it.
> >>
> >> I've never tried to run a SparkR application from Oozie. I guess it
> >> can be done, but in its current state it needs some manual work:
> >>
> >> According to Spark <https://github.com/apache/spark/tree/master/R>,
> >> the SparkR libraries should be under $SPARK_HOME/R/lib, and $R_HOME
> >> should also be set for the job.
> >> $SPARK_HOME is set to the current directory in Oozie after OOZIE-2482,
> >> and you could add the SparkR files to the Spark sharelib to make them
> >> available in the action.
> >> It's not guaranteed that it will work after these steps, but there's a
> >> chance.
> >> I would be delighted to hear about the result if you have the time
> >> to try to make this work.
> >>
> >> Thanks,
> >> gp
> >>
> >>
> >> On Tue, Nov 8, 2016 at 10:55 AM, Dongying Jiao <pineapple...@gmail.com>
> >> wrote:
> >>
> >> > Hi:
> >> > I have an issue with running SparkR through Oozie; could you please
> >> > help me?
> >> > I am trying to run a SparkR job through Oozie in yarn-client mode,
> >> > and I have installed the R package on all of my nodes.
> >> >
> >> > job.properties is like:
> >> > nameNode=hdfs://XXX:8020
> >> > jobTracker=XXX:8050
> >> > master=yarn-client
> >> > queueName=default
> >> > oozie.use.system.libpath=true
> >> > oozie.wf.application.path=/user/oozie/measurecountWF
> >> >
> >> > The workflow is like:
> >> > <workflow-app xmlns='uri:oozie:workflow:0.5' name='measurecountWF'>
> >> >     <global>
> >> >         <configuration>
> >> >             <property>
> >> >                 <name>oozie.launcher.yarn.app.mapreduce.am.env</name>
> >> >                 <value>SPARK_HOME=XXXX</value>
> >> >             </property>
> >> >         </configuration>
> >> >     </global>
> >> >     <start to="sparkAction"/>
> >> >     <action name="sparkAction">
> >> >         <spark xmlns="uri:oozie:spark-action:0.1">
> >> >             <job-tracker>${jobTracker}</job-tracker>
> >> >             <name-node>${nameNode}</name-node>
> >> >             <master>${master}</master>
> >> >             <name>measurecountWF</name>
> >> >             <jar>measurecount.R</jar>
> >> >             <spark-opts>--conf spark.driver.extraJavaOptions=XXXX</spark-opts>
> >> >         </spark>
> >> >         <ok to="end"/>
> >> >         <error to="fail"/>
> >> >     </action>
> >> >     <kill name="fail">
> >> >         <message>Workflow failed, error
> >> >             message[${wf:errorMessage(wf:lastErrorNode())}]
> >> >         </message>
> >> >     </kill>
> >> >     <end name="end"/>
> >> > </workflow-app>
> >> >
> >> > It failed with a ClassNotFoundException.
> >> >
> >> > org.apache.spark.SparkException: Job aborted due to stage failure:
> >> > Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task
> >> > 0.3 in stage 0.0 (TID 3, XXXX): java.lang.ClassNotFoundException:
> >> > com.cloudant.spark.common.JsonStoreRDDPartition
> >> >     at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
> >> >     at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> >> >     at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> >> >     at java.lang.Class.forName0(Native Method)
> >> >     at java.lang.Class.forName(Class.java:348)
> >> >     at org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:68)
> >> >     at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1613)
> >> >     at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1518)
> >> >     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1774)
> >> >     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
> >> >     at java.io.ObjectInputStream.defaultReadFields(ObjectInpu
> >> > Calls: sql -> callJMethod -> invokeJava
> >> > Execution halted
> >> > Intercepting System.exit(1)
> >> >
> >> > Does Oozie support running SparkR in the Spark action, or should we
> >> > only wrap it in an ssh action?
> >> >
> >> > Thanks a lot
> >>
> >>
> >> --
> >> Peter Cseh
> >> Software Engineer
> >> <http://www.cloudera.com>

--
Peter Cseh
Software Engineer
<http://www.cloudera.com>
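For readers of the archive: the staging step Dongying describes (putting spark-assembly.jar and sparkr.zip into the workflow's lib folder on HDFS) can be sketched as the commands below. The concrete paths are illustrative assumptions, not taken from the thread, and the `run` wrapper only echoes each command, so the sketch is a dry run.

```shell
#!/bin/sh
# Dry-run sketch: prints the hdfs commands that stage SparkR files into
# the Oozie workflow's lib/ folder. All paths are illustrative assumptions.
WF_DIR=/user/oozie/sparkR    # hypothetical oozie.wf.application.path
SPARK=/usr/lib/spark         # hypothetical Spark install on the gateway host

run() { echo "$@"; }         # change 'echo "$@"' to "$@" to actually execute

# Oozie adds everything under the workflow's lib/ to the action classpath.
run hdfs dfs -mkdir -p "$WF_DIR/lib"
run hdfs dfs -put -f "$SPARK/lib/spark-assembly.jar" "$WF_DIR/lib/"
run hdfs dfs -put -f "$SPARK/R/lib/sparkr.zip" "$WF_DIR/lib/"

# The R driver script referenced in <jar> sits next to workflow.xml.
run hdfs dfs -put -f dataframe.R "$WF_DIR/"
```

The same dry-run trick makes it easy to review the commands before pointing them at a real cluster.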