I don't know if MapR makes any changes to how the sharelib works, so you might try asking in their mailing list or forums to see if anyone can help you there. The information I shared about the sharelib in my previous email was with my "Apache Hat" on, and should apply to Oozie 4.2 assuming MapR didn't change anything.
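In case it helps, a quick way to check what the server actually sees is sketched below. This is not MapR-specific guidance; the HDFS path and server URL are assumptions taken from the paths mentioned elsewhere in this thread:

```
# List the sharelib directories on HDFS; with Oozie 4.x you should see
# a timestamped lib_<timestamp> directory rather than bare action dirs.
hadoop fs -ls /oozie/share/lib/

# Ask the Oozie server which sharelibs it has loaded; "spark" should
# appear in the output if the Spark action can resolve SparkMain.
oozie admin -oozie http://localhost:11000/oozie -shareliblist
```

If `spark` is missing from the `-shareliblist` output, the server never picked up the Spark sharelib, regardless of what sits on HDFS.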
As for CDH, while the version number doesn't say 4.2 (it's currently 4.1), we do backport a number of patches on top of 4.1, including a lot of 4.2 patches and patches not in an Apache release yet. This is true for most, if not all, of the components we ship.

- Robert

On Thu, Nov 12, 2015 at 5:46 AM, Oehmichen, Axel <[email protected]> wrote:

> Hello Robert,
>
> I used MapR to install my Hadoop stack (I prefer Cloudera, but CDH
> doesn't yet support Oozie 4.2), so I believed that the sharelib
> installation was correct.
>
> Before I change it manually: I got the Java example running, so I guess
> that the sharelib is properly built?
>
> Best,
> --
> Axel Oehmichen
> Research Assistant
> Data Science Institute
> +44 (0) 7 842 734 702
> [email protected]
>
>
> -----Original Message-----
> From: Robert Kanter [mailto:[email protected]]
> Sent: 11 November 2015 18:03
> To: [email protected]
> Subject: Re: Spark action using python file as JAR
>
> Hi Axel,
>
> The sharelib is not properly installed. With Oozie 4.x, there's now an
> extra directory. Instead of /oozie/share/lib/spark it should be
> /oozie/share/lib/lib_<timestamp>/spark.
>
> If you use the oozie-setup script, it will properly install the sharelib
> for you.
> http://oozie.apache.org/docs/4.2.0/AG_Install.html#Oozie_Server_Setup
>
> - Robert
>
>
> On Wed, Nov 11, 2015 at 6:51 AM, Oehmichen, Axel <[email protected]> wrote:
>
> > Tried again and it yielded the same error.
> >
> > Many thanks.
> >
> > Best,
> > --
> > Axel Oehmichen
> > Research Assistant
> > Data Science Institute
> > +44 (0) 7 842 734 702
> > [email protected]
> >
> >
> > -----Original Message-----
> > From: Oussama Chougna [mailto:[email protected]]
> > Sent: 11 November 2015 14:23
> > To: [email protected]
> > Subject: RE: Spark action using python file as JAR
> >
> > OK,
> > Now in your job.properties include:
> >
> > oozie.use.system.libpath=true
> >
> > This tells Oozie to use that sharelib.
> > Cheers,
> > Oussama Chougna
> >
> > > From: [email protected]
> > > To: [email protected]
> > > Subject: RE: Spark action using python file as JAR
> > > Date: Wed, 11 Nov 2015 14:18:45 +0000
> > >
> > > Hello Oussama,
> > >
> > > Thanks for the response. The sharelib folder does exist on HDFS under
> > > /oozie/share/lib/spark
> > >
> > > Best,
> > > Axel
> > >
> > > -----Original Message-----
> > > From: Oussama Chougna [mailto:[email protected]]
> > > Sent: 11 November 2015 13:30
> > > To: [email protected]
> > > Subject: RE: Spark action using python file as JAR
> > >
> > > Hi Axel,
> > > Did you also install the Oozie sharelib? It sounds like you're missing
> > > it; it is installed on HDFS. See the Oozie docs/MapR for a how-to.
> > > Cheers,
> > >
> > > Oussama Chougna
> > >
> > > > From: [email protected]
> > > > To: [email protected]
> > > > Subject: Spark action using python file as JAR
> > > > Date: Wed, 11 Nov 2015 11:09:32 +0000
> > > >
> > > > Hello,
> > > >
> > > > I am trying to use Oozie to get some Python workflows running. I have
> > > > installed Oozie and Spark using MapR 5.0, which comes with Oozie 4.2
> > > > and Spark 1.4.1.
> > > > No matter what I do, I get this error message:
> > > > "java.lang.ClassNotFoundException: Class
> > > > org.apache.oozie.action.hadoop.SparkMain not found", error code JA018.
> > > >
> > > > I was able to reproduce it using the wordcount.py example:
> > > > https://github.com/apache/spark/blob/master/examples/src/main/python/wordcount.py
> > > > (The idea of running wordcount comes from Nitin Kumar's message.)
> > > >
> > > > The command I run is:
> > > > $ /opt/mapr/oozie/oozie-4.2.0/bin/oozie job -oozie="http://localhost:11000/oozie" -config job.properties -run
> > > >
> > > > I have tried through the Java API as well and I end up with the same
> > > > result.
> > > >
> > > > My job.properties contains:
> > > >
> > > > nameNode=maprfs:///
> > > > jobTracker=spark-master:8032
> > > > oozie.wf.application.path=maprfs:/user/mapr/
> > > >
> > > > My workflow.xml:
> > > >
> > > > <workflow-app xmlns='uri:oozie:workflow:0.5' name='Test'>
> > > >   <start to='spark-node' />
> > > >
> > > >   <action name='spark-node'>
> > > >     <spark xmlns="uri:oozie:spark-action:0.1">
> > > >       <job-tracker>${jobTracker}</job-tracker>
> > > >       <name-node>${nameNode}</name-node>
> > > >       <master>yarn-client</master>
> > > >       <mode>client</mode>
> > > >       <name>wordcount</name>
> > > >       <jar>wordcount.py</jar>
> > > >       <spark-opts>--num-executors 2 --driver-memory 1024m --executor-memory 512m --executor-cores 1</spark-opts>
> > > >     </spark>
> > > >     <ok to="end" />
> > > >     <error to="fail" />
> > > >   </action>
> > > >
> > > >   <kill name="fail">
> > > >     <message>Workflow failed, error
> > > >     message[${wf:errorMessage(wf:lastErrorNode())}]
> > > >     </message>
> > > >   </kill>
> > > >   <end name='end' />
> > > > </workflow-app>
> > > >
> > > > I have tried changing oozie.wf.application.path, specifying the jar
> > > > path explicitly, removing or adding different fields in the XML,
> > > > putting the wordcount file a little bit everywhere, and some other
> > > > things, but nothing changed...
> > > >
> > > > I welcome any suggestions, or please point out any error I made.
> > > >
> > > > Many thanks.
> > > >
> > > > Axel
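
Putting the advice in this thread together, the fix would look roughly like the sketch below. The oozie-setup path and filesystem URI are assumptions based on the MapR 5.0 layout quoted above, so adjust them to your install:

```
# Recreate the sharelib with the Oozie 4.x layout, i.e. under
# /oozie/share/lib/lib_<timestamp>/ rather than /oozie/share/lib/spark.
/opt/mapr/oozie/oozie-4.2.0/bin/oozie-setup.sh sharelib create -fs maprfs:///

# Tell the running Oozie server to pick up the new sharelib
# without a restart.
oozie admin -oozie http://localhost:11000/oozie -sharelibupdate
```

Then, per Oussama's suggestion, add `oozie.use.system.libpath=true` to job.properties so the workflow actually pulls in the system sharelib, and resubmit the job.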
