Re: Does Oozie support run sparkR with spark action?
Hi,

I'm glad that you could make SparkR work. Thank you for sharing the solution with us!

gp

On Tue, Nov 29, 2016 at 2:15 AM, Dongying Jiao <pineapple...@gmail.com> wrote:
> Hi:
> SparkR can be run in the Oozie spark action. I tried the simple SparkR
> script from the Spark examples folder and it ran successfully.
Re: Does Oozie support run sparkR with spark action?
Hi:

SparkR can be run in the Oozie spark action. I tried the simple SparkR script from the Spark examples folder and it ran successfully. After setting up the R environment on your cluster, you only need to put spark-assembly.jar and $SPARK_HOME/R/lib/sparkr.zip in the workflow lib folder. Below is the spark action I use for yarn-cluster mode:

    <spark xmlns="uri:oozie:spark-action:0.1">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <master>${master}</master>
        <name>sparkRtest</name>
        <jar>${nameNode}/user/oozie/sparkR/dataframe.R</jar>
        <spark-opts>--conf spark.driver.extraJavaOptions=</spark-opts>
    </spark>

Thanks

2016-11-15 13:59 GMT+08:00 Dongying Jiao <pineapple...@gmail.com>:
> Hi Peter:
> Thank you very much for your reply.
> I will have a try and tell you the result.
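The layout Dongying describes (workflow definition plus a lib folder holding spark-assembly.jar and sparkr.zip) can be sketched as below. This uses local placeholder files only; the directory name, file names, and HDFS path are illustrative, and on a real cluster you would copy the actual jars from your Spark installation before uploading.

```shell
# Assemble the workflow application directory locally (placeholders only),
# then upload the whole tree to HDFS.
set -e
APP=sparkR-app
mkdir -p "$APP/lib"
touch "$APP/workflow.xml"            # the workflow definition
touch "$APP/lib/spark-assembly.jar"  # placeholder for the real assembly jar
touch "$APP/lib/sparkr.zip"          # placeholder for $SPARK_HOME/R/lib/sparkr.zip
find "$APP" -type f
# Upload (illustrative target path):
#   hadoop fs -put sparkR-app /user/oozie/sparkR
```

Anything placed under the application's lib/ directory is added to the action's classpath automatically, which is why this avoids the ClassNotFoundException discussed below.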
Re: Does Oozie support run sparkR with spark action?
Hi,

This exception is caused by a missing jar on the classpath. The needed jars should be added to the classpath in the Oozie action. This blog post <http://blog.cloudera.com/blog/2014/05/how-to-use-the-sharelib-in-apache-oozie-cdh-5/> describes several ways to do it.

I've never tried to run a SparkR application from Oozie. I guess it can be done, but in the current state it needs some manual work:

According to Spark <https://github.com/apache/spark/tree/master/R>, the SparkR libraries should be under $SPARK_HOME/R/lib, and $R_HOME should also be set for the job. $SPARK_HOME is set to the current directory in Oozie after OOZIE-2482, and you could add the SparkR files to the Spark sharelib to make them available in the action. It's not guaranteed that it will work after these steps, but there's a chance. I would be delighted to hear about the result if you have the time to try to make this work.

Thanks,
gp

On Tue, Nov 8, 2016 at 10:55 AM, Dongying Jiao <pineapple...@gmail.com> wrote:
> Hi:
> I have an issue with oozie running sparkR, could you please help me?
> I try to run a sparkR job through oozie in yarn-client mode, and I have
> installed the R package on all my nodes.

--
Peter Cseh
Software Engineer
<http://www.cloudera.com>
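Peter's sharelib suggestion amounts to placing the SparkR files next to the Spark sharelib jars in HDFS. The sketch below mocks that layout with local placeholder files; the timestamped lib_<ts> directory name and all paths are illustrative, and on a real cluster the commands in the trailing comments would be used instead.

```shell
# Local mock of the HDFS sharelib layout used by the Oozie spark action.
set -e
SHARELIB=share/lib/lib_20161108/spark   # lib_<timestamp> name is illustrative
mkdir -p "$SHARELIB"
touch "$SHARELIB/sparkr.zip"            # copied from $SPARK_HOME/R/lib on a real cluster
ls "$SHARELIB"
# On the cluster (illustrative paths and URL):
#   hadoop fs -put $SPARK_HOME/R/lib/sparkr.zip /user/oozie/share/lib/lib_<ts>/spark/
#   oozie admin -oozie http://localhost:11000/oozie -sharelibupdate
```

After the sharelib update, every workflow with oozie.use.system.libpath=true picks up the added files without changing its own lib folder.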
Does Oozie support run sparkR with spark action?
Hi:

I have an issue with oozie running sparkR, could you please help me? I try to run a sparkR job through oozie in yarn-client mode, and I have installed the R package on all my nodes.

job.properties is like:

    nameNode=hdfs://XXX:8020
    jobTracker=XXX:8050
    master=yarn-client
    queueName=default
    oozie.use.system.libpath=true
    oozie.wf.application.path=/user/oozie/measurecountWF

The workflow is like:

    <workflow-app name="measurecountWF" xmlns="uri:oozie:workflow:0.5">
        <global>
            <configuration>
                <property>
                    <name>oozie.launcher.yarn.app.mapreduce.am.env</name>
                    <value>SPARK_HOME=</value>
                </property>
            </configuration>
        </global>
        <start to="spark-node"/>
        <action name="spark-node">
            <spark xmlns="uri:oozie:spark-action:0.1">
                <job-tracker>${jobTracker}</job-tracker>
                <name-node>${nameNode}</name-node>
                <master>${master}</master>
                <name>measurecountWF</name>
                <jar>measurecount.R</jar>
                <spark-opts>--conf spark.driver.extraJavaOptions=</spark-opts>
            </spark>
            <ok to="end"/>
            <error to="fail"/>
        </action>
        <kill name="fail">
            <message>Workflow failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
        </kill>
        <end name="end"/>
    </workflow-app>

It failed with a class not found exception:

    org.apache.spark.SparkException: Job aborted due to stage failure:
    Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3
    in stage 0.0 (TID 3, ): java.lang.ClassNotFoundException:
    com.cloudant.spark.common.JsonStoreRDDPartition
        at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:348)
        at org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:68)
        at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1613)
        at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1518)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1774)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInpu
    Calls: sql -> callJMethod -> invokeJava
    Execution halted
    Intercepting System.exit(1)

Does oozie support running sparkR in the spark action? Or should we only wrap it in an ssh action?

Thanks a lot
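For reference, a workflow with a job.properties like the one above is submitted with the standard Oozie CLI. The sketch writes the properties file and leaves the submission line commented out, since it needs a live Oozie server; the server URL is illustrative.

```shell
# Write the job.properties from this thread, then submit with the Oozie CLI.
cat > job.properties <<'EOF'
nameNode=hdfs://XXX:8020
jobTracker=XXX:8050
master=yarn-client
queueName=default
oozie.use.system.libpath=true
oozie.wf.application.path=/user/oozie/measurecountWF
EOF
grep 'master=' job.properties
# Submit (needs a running Oozie server; URL is illustrative):
#   oozie job -oozie http://localhost:11000/oozie -config job.properties -run
```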