Re: Does Oozie support run sparkR with spark action?

2016-11-29 Thread Peter Cseh
Hi,

I'm glad that you could make Spark R work.
Thank you for sharing the solution with us!

gp


Re: Does Oozie support run sparkR with spark action?

2016-11-28 Thread Dongying Jiao
Hi:
SparkR can be run in an Oozie spark action. I tried running the simple SparkR
script from the Spark examples folder, and it was successful.
After setting up the R environment on your cluster, you only need to put
spark-assembly.jar and $SPARK_HOME/R/lib/sparkr.zip in the workflow lib folder.
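
For illustration, the workflow application directory on HDFS can be laid out
roughly like this (a sketch only; the /user/oozie/sparkR path is taken from the
script path in the workflow below, while the spark-assembly location and exact
file names are assumptions):

# assumed HDFS layout for the workflow application directory
hdfs dfs -mkdir -p /user/oozie/sparkR/lib
hdfs dfs -put workflow.xml /user/oozie/sparkR/
hdfs dfs -put dataframe.R /user/oozie/sparkR/
hdfs dfs -put $SPARK_HOME/lib/spark-assembly*.jar /user/oozie/sparkR/lib/
hdfs dfs -put $SPARK_HOME/R/lib/sparkr.zip /user/oozie/sparkR/lib/
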
Below is the workflow I use for yarn-cluster mode.

<!-- XML tags below are reconstructed; the mail archive stripped the original markup.
     Action/node names and namespace versions are assumptions, and the elided
     extraJavaOptions value is left as posted. -->
<action name="spark-node">
    <spark xmlns="uri:oozie:spark-action:0.1">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <master>${master}</master>
        <name>sparkRtest</name>
        <jar>${nameNode}/user/oozie/sparkR/dataframe.R</jar>
        <spark-opts>--conf spark.driver.extraJavaOptions=</spark-opts>
    </spark>
    <ok to="end"/>
    <error to="fail"/>
</action>
Thanks


Re: Does Oozie support run sparkR with spark action?

2016-11-11 Thread Peter Cseh
Hi,

This exception is caused by a missing jar on the classpath.
The needed jars should be added to the classpath of the Oozie action. This
blogpost
<http://blog.cloudera.com/blog/2014/05/how-to-use-the-sharelib-in-apache-oozie-cdh-5/>
describes several ways to do it.
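
For example, a common way to do this (a sketch only; the connector jar name and
the HDFS paths below are assumptions, not from this thread) is to ship the jar
that provides the missing class with the workflow application, or to point
oozie.libpath at an HDFS directory that contains it:

# ship the jar that provides the missing class with the workflow application
hdfs dfs -put cloudant-spark-connector.jar /user/oozie/measurecountWF/lib/

# or reference an extra jar directory from job.properties:
#   oozie.libpath=hdfs://XXX:8020/user/oozie/extra-jars
#   oozie.use.system.libpath=true
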

I've never tried to run a SparkR application from Oozie. I guess it can be
done, but in its current state it needs some manual work:

According to Spark <https://github.com/apache/spark/tree/master/R>, the
SparkR libraries should be under $SPARK_HOME/R/lib, and $R_HOME should also
be set for the job.
$SPARK_HOME is set to the current directory in Oozie after OOZIE-2482, and
you could add the SparkR libraries to the Spark sharelib to make them
available in the action.
It's not guaranteed that it will work after these steps, but there's a
chance. I would be delighted to hear about the result if you have the time
to try to make this work.
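
A minimal sketch of that sharelib approach, assuming a default sharelib layout
under /user/oozie/share/lib (the lib_<timestamp> directory and the Oozie URL
below are placeholders, not from this thread):

# list the current Spark sharelib to find the active lib_<timestamp> directory
oozie admin -oozie http://OOZIE_HOST:11000/oozie -shareliblist spark

# copy the SparkR package next to the other Spark sharelib jars
hdfs dfs -put $SPARK_HOME/R/lib/sparkr.zip /user/oozie/share/lib/lib_<timestamp>/spark/

# make Oozie pick up the new sharelib contents
oozie admin -oozie http://OOZIE_HOST:11000/oozie -sharelibupdate
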

Thanks,
gp


-- 
Peter Cseh
Software Engineer
<http://www.cloudera.com>


Does Oozie support run sparkR with spark action?

2016-11-08 Thread Dongying Jiao
Hi:
I have an issue running SparkR through Oozie; could you please help me?
I am trying to run a SparkR job through Oozie in yarn-client mode, and I have
installed the R packages on all my nodes.

job.properties is like:
nameNode=hdfs://XXX:8020
jobTracker=XXX:8050
master=yarn-client
queueName=default
oozie.use.system.libpath=true
oozie.wf.application.path=/user/oozie/measurecountWF

The workflow is like:

<!-- XML tags below are reconstructed; the mail archive stripped the original markup.
     Namespace versions, node names and the <global> placement of the launcher
     property are assumptions, and the elided values (SPARK_HOME=... and the
     extraJavaOptions setting) are left as posted. -->
<workflow-app name="measurecountWF" xmlns="uri:oozie:workflow:0.5">
    <global>
        <configuration>
            <property>
                <name>oozie.launcher.yarn.app.mapreduce.am.env</name>
                <value>SPARK_HOME=</value>
            </property>
        </configuration>
    </global>
    <start to="spark-node"/>
    <action name="spark-node">
        <spark xmlns="uri:oozie:spark-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <master>${master}</master>
            <name>measurecountWF</name>
            <jar>measurecount.R</jar>
            <spark-opts>--conf spark.driver.extraJavaOptions=</spark-opts>
        </spark>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Workflow failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>

It failed with a ClassNotFoundException:

org.apache.spark.SparkException: Job aborted due to stage failure:
Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3
in stage 0.0 (TID 3, ): java.lang.ClassNotFoundException:
com.cloudant.spark.common.JsonStoreRDDPartition
        at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:348)
        at org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:68)
        at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1613)
        at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1518)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1774)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInpu
Calls: sql -> callJMethod -> invokeJava
Execution halted
Intercepting System.exit(1)

Does Oozie support running SparkR in the spark action, or should we only wrap
it in an ssh action?

Thanks a lot