I have been running crunch/cascading jobs as oozie java actions, no problems so far.
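
For reference, here is a minimal sketch of the token step mentioned in the advice quoted below, assuming the flow is launched from an Oozie Java action. The class name CascadingMain comes from that advice. The withTokens helper, and the idea of handing the resulting Properties to the Cascading flow, are illustrative assumptions rather than code from Oozie or Cascading. It copies the path in the HADOOP_TOKEN_FILE_LOCATION environment variable into mapreduce.job.credentials.binary, which as far as I know is the pattern the Oozie documentation describes for Java actions that launch MR jobs on a secure cluster.

```java
import java.util.Properties;

// Sketch only: a launcher-side main for a Java action (name taken from the thread).
class CascadingMain {
    // Hadoop property that tells child MR jobs where the delegation tokens live.
    static final String CREDENTIALS_KEY = "mapreduce.job.credentials.binary";
    // Environment variable set for the launcher when security is enabled.
    static final String TOKEN_ENV = "HADOOP_TOKEN_FILE_LOCATION";

    // Hypothetical helper: returns a copy of base with the token file path added
    // when one is available; base itself is left untouched.
    static Properties withTokens(Properties base, String tokenFile) {
        Properties props = new Properties();
        props.putAll(base);
        if (tokenFile != null && !tokenFile.isEmpty()) {
            props.setProperty(CREDENTIALS_KEY, tokenFile);
        }
        return props;
    }

    public static void main(String[] args) {
        Properties props = withTokens(new Properties(), System.getenv(TOKEN_ENV));
        // Assumption: the real flow would receive these properties (for example
        // through its flow connector) before running; here we only print them.
        System.out.println("Properties for the Cascading flow: " + props);
    }
}
```

On an unsecured cluster the environment variable is normally absent, so the helper leaves the properties unchanged and the same main class can be used in both setups.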

On Tue, Oct 8, 2013 at 4:47 PM, Alejandro Abdelnur <[email protected]> wrote:

> I would suggest looking at how Pig/Hive/Sqoop/Distcp actions work if you
> want to have a custom <cascading> action. Which, BTW, would be a great
> contribution to Oozie.
>
> If you are going this path, you'll have to write a CascadingActionExecutor
> class that runs in the Oozie server and you'll have to write a
> CascadingMain class that runs in the Launcher job. Plus an XSD defining the
> cascading XML syntax.
>
> If you want to start simpler, you could do it via the Java action. You
> will only need the CascadingMain for this. You can cannibalize a
> Pig/Hive/Sqoop/Distcp Oozie main class for this. The most important thing
> here is to ensure the tokens are propagated to the cascading MR jobs.
>
> hope this helps.
>
>
> On Tue, Oct 8, 2013 at 3:19 PM, <[email protected]> wrote:
>
> > Follow up. I've tried to run a Cascading job in oozie a couple of ways,
> > but they all fail for various reasons.
> >
> > I tried to put it in a map-reduce action with
> > oozie.launcher.action.main.class defined pointing to my Cascading class,
> > but I can't see any way to pass all the arguments to it that it needs.
> >
> > I also tried to use a shell action using
> > oozie.launcher.action.main.class. That launches my class but doesn't
> > pass any arguments to it even though I specified arguments in the shell
> > action.
> >
> > Finally, I tried to do it with a shell command where I don't specify
> > oozie.launcher.action.main.class and instead put '/usr/bin/hadoop' as the
> > exec action and then put all the rest of the invocation as commands. This
> > invokes my Cascading class with the right arguments, but then dies for no
> > apparent reason that I can tell from the Hadoop logs (it never launches
> > the Cascading MR jobs).
> >
> > If anyone has an example of a working oozie workflow where they wrap a
> > Cascading job, I'd love to see it.
> >
> > -Michael
> >
> >
> > On Tue, Oct 8, 2013 at 4:08 PM, <[email protected]> wrote:
> >
> > > Apologies if this has been asked before, but I can't figure out how to
> > > search the archives of this mailing list and 20 minutes of googling
> > > yielded no useful results.
> > >
> > > I'm on a team that uses Cascading to do our MapReduce flows. However,
> > > we are investigating using Oozie to do additional types of actions
> > > (hive, shell, etc.) and use its scheduler. For this to work, we'll
> > > need to be able to run a Cascading job as an oozie action. Which is
> > > what I can't figure out how to do.
> > >
> > > Typically to run a Cascading job, we'll do this:
> > >
> > > hadoop jar mycascading_uberjar.jar com.company.MyCascadingFlow arg1 arg2 arg3 argN
> > >
> > > My first thought was to use an oozie map-reduce action, since I run
> > > this with "hadoop jar" and Cascading creates MRs under the hood, but
> > > the oozie map-reduce action wants things like mapred.mapper.class
> > > and mapred.reducer.class. Well MyCascadingFlow runs two dozen
> > > different mappers and a few different reducers!
> > >
> > > What is the best way to do this? The java action seems wrong since it
> > > won't run it with "hadoop jar". Which leaves me with just a shell
> > > action and putting the "hadoop jar ...." line in a shell script and
> > > invoking it.
> > >
> > > Other ideas?
> > >
> > > -Michael
> > >
> >
>
>
> --
> Alejandro
>
