I have been running crunch/cascading jobs as oozie java actions, no
problems so far.


On Tue, Oct 8, 2013 at 4:47 PM, Alejandro Abdelnur <[email protected]> wrote:

> If you want a custom <cascading> action, I would suggest looking at how the
> Pig/Hive/Sqoop/Distcp actions work. Such an action, BTW, would be a great
> contribution to Oozie.
>
> If you are going down this path, you'll have to write a CascadingActionExecutor
> class that runs in the Oozie server and a CascadingMain class that runs in
> the Launcher job, plus an XSD defining the cascading XML syntax.
>
> If you want to start simpler, you could do it via the Java action. You
> will only need the CascadingMain for this, and you can cannibalize the
> Pig/Hive/Sqoop/Distcp Oozie main classes for it. The most important thing
> here is to ensure the tokens are propagated to the Cascading MR jobs.
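A minimal sketch of the token-propagation step, assuming (as the Oozie documentation describes for java actions that launch MapReduce jobs) that the launcher exposes the delegation-token file through the HADOOP_TOKEN_FILE_LOCATION environment variable and that submitted jobs read it from the mapreduce.job.credentials.binary property. TokenPropagation and withLauncherTokens are hypothetical names. In a real flow the resulting Properties would be handed to Cascading (for example new HadoopFlowConnector(props) in Cascading 2.x, assumed here); that call is left out so the sketch compiles without Hadoop or Cascading on the classpath.

```java
import java.util.Map;
import java.util.Properties;

// Hypothetical helper for a CascadingMain-style class run from an Oozie java action.
final class TokenPropagation {
    // Assumed: set by the Oozie launcher to the path of the delegation-token file.
    static final String TOKEN_ENV = "HADOOP_TOKEN_FILE_LOCATION";
    // Hadoop job property pointing a submitted job at a binary credentials file.
    static final String CREDENTIALS_PROP = "mapreduce.job.credentials.binary";

    private TokenPropagation() {}

    // Returns a copy of base, plus the credentials property if the launcher supplied a token file.
    static Properties withLauncherTokens(Properties base, Map<String, String> env) {
        Properties props = new Properties();
        props.putAll(base);
        String tokenFile = env.get(TOKEN_ENV);
        if (tokenFile != null) {
            props.setProperty(CREDENTIALS_PROP, tokenFile);
        }
        return props;
    }

    public static void main(String[] args) {
        Properties props = withLauncherTokens(new Properties(), System.getenv());
        // In the real flow these would go to Cascading, e.g. new HadoopFlowConnector(props)
        // (Cascading 2.x, assumed; not called here to keep the sketch dependency-free).
        System.out.println(CREDENTIALS_PROP + " = "
                + props.getProperty(CREDENTIALS_PROP, "(unset: no launcher token file)"));
    }
}
```

Taking the environment as a parameter instead of calling System.getenv() inside the helper keeps it testable outside a launcher job.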
>
> hope this helps.
>
>
> On Tue, Oct 8, 2013 at 3:19 PM, <[email protected]> wrote:
>
> > Follow up.  I've tried to run a Cascading job in oozie a couple of ways,
> > but they all fail for various reasons.
> >
> > I tried to put it in a map-reduce action with
> > oozie.launcher.action.main.class pointing to my Cascading class,
> > but I can't see any way to pass it all the arguments it needs.
> >
> > I also tried to use a shell action using oozie.launcher.action.main.class.
> > That launches my class but doesn't pass any arguments to it, even though
> > I specified arguments in the shell action.
> >
> > Finally, I tried a shell action where I don't specify
> > oozie.launcher.action.main.class and instead put '/usr/bin/hadoop' as the
> > exec and then put the rest of the invocation in as arguments. This
> > invokes my Cascading class with the right arguments, but then it dies for
> > no reason that I can find in the Hadoop logs (it never launches the
> > Cascading MR jobs).
> >
> > If anyone has an example of a working oozie workflow where they wrap a
> > Cascading job, I'd love to see it.
> >
> > -Michael
> >
> >
> >
> > On Tue, Oct 8, 2013 at 4:08 PM, <[email protected]> wrote:
> >
> > > Apologies if this has been asked before, but I can't figure out how to
> > > search the archives of this mailing list, and 20 minutes of googling
> > > yielded no useful results.
> > >
> > > I'm on a team that uses Cascading to do our MapReduce flows. However, we
> > > are investigating using Oozie to run additional types of actions (Hive,
> > > shell, etc.) and to use its scheduler. For this to work, we'll need to be
> > > able to run a Cascading job as an Oozie action, which is what I can't
> > > figure out how to do.
> > >
> > > Typically, to run a Cascading job we do this:
> > >
> > > hadoop jar mycascading_uberjar.jar com.company.MyCascadingFlow arg1 arg2 arg3 argN
> > >
> > > My first thought was to use an Oozie map-reduce action, since I run this
> > > with "hadoop jar" and Cascading creates MR jobs under the hood, but the
> > > Oozie map-reduce action wants things like mapred.mapper.class and
> > > mapred.reducer.class. Well, MyCascadingFlow runs two dozen different
> > > mappers and a few different reducers!
> > >
> > > What is the best way to do this? The java action seems wrong since it
> > > won't run the class with "hadoop jar", which leaves me with just a shell
> > > action: putting the "hadoop jar ..." line in a shell script and invoking it.
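On the concern that the java action won't run the class with "hadoop jar": as far as I know, "hadoop jar" mostly puts the jar on a classpath alongside the Hadoop configuration and then calls the named class's main(String[]) with the remaining command-line arguments, and an Oozie java action likewise calls its configured <main-class> with its <arg> values, from inside a launcher job. The sketch below is a simplified illustration of that shared core step, not Hadoop's RunJar or Oozie's launcher code; MainInvoker and DemoFlow (a stand-in for com.company.MyCascadingFlow) are hypothetical names.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.Arrays;

// Simplified illustration: load a class by name and call its public static main(String[]).
final class MainInvoker {
    private MainInvoker() {}

    static void invokeMain(String className, String[] args) {
        try {
            Method main = Class.forName(className).getMethod("main", String[].class);
            if (!Modifier.isStatic(main.getModifiers())) {
                throw new IllegalArgumentException(className + ".main(String[]) is not static");
            }
            // Cast to Object so the array is passed as main's single argument, not spread as varargs.
            main.invoke(null, (Object) args);
        } catch (InvocationTargetException e) {
            // The invoked main itself threw: report its exception, not the reflection wrapper.
            throw new IllegalStateException(className + " failed", e.getCause());
        } catch (ReflectiveOperationException e) {
            // Class not found, no public main(String[]), or not accessible.
            throw new IllegalArgumentException("Cannot invoke " + className + ".main", e);
        }
    }

    // Hypothetical stand-in for com.company.MyCascadingFlow.
    static final class DemoFlow {
        static String[] lastArgs;

        public static void main(String[] args) {
            lastArgs = args.clone();
            System.out.println("DemoFlow.main called with " + Arrays.toString(args));
        }
    }

    public static void main(String[] args) {
        // Roughly what "hadoop jar <jar> <class> arg1 arg2 arg3" does once the jar is on the classpath.
        invokeMain(DemoFlow.class.getName(), new String[] {"arg1", "arg2", "arg3"});
    }
}
```

Running MainInvoker prints "DemoFlow.main called with [arg1, arg2, arg3]".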
> > >
> > > Other ideas?
> > >
> > > -Michael
> > >
> >
>
>
>
> --
> Alejandro
>
