The other purpose is to have a common launch mechanism for everything.

My typical workflow brings up to 50 MB of additional jars:
jars with third-party Pig libraries for the Pig action,
jars to interact with the metastore via JDBC inside a custom Java action.
It's a good approach to put all the workflow files into HDFS and then run
the workflow using Oozie. Easy to install, easy to manage.

Our typical analytical job runs from 20 minutes (day-window analysis) up to
3 hours (3-week window). So nobody cares about an extra 20 seconds spent on
deploying the launcher jar.
Try to solve real-life problems and you'll see the real problems :)
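
To put that overhead in proportion, here is a rough back-of-the-envelope
sketch in Python. It assumes the 20 seconds is the total launcher cost for
a run (with several actions it would grow with the number of launchers);
the function and constant names are made up for illustration.

```python
# Launcher overhead relative to job runtime, using the figures quoted above.
LAUNCHER_OVERHEAD_S = 20  # assumed total launcher cost per run, in seconds


def overhead_percent(job_runtime_s, overhead_s=LAUNCHER_OVERHEAD_S):
    """Return the launcher overhead as a percentage of the job's own runtime."""
    return 100.0 * overhead_s / job_runtime_s


short_job = overhead_percent(20 * 60)     # 20-minute day-window job
long_job = overhead_percent(3 * 60 * 60)  # 3-hour 3-week-window job

print(f"{short_job:.2f}% {long_job:.2f}%")  # prints "1.67% 0.19%"
```

So even for the shortest job the launcher adds under 2% to the runtime.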


2013/10/22 Nam Pham <[email protected]>

> The purpose is to distribute jobs to different machines instead of running
> all of them on the master node, IMHO.
>
> Nam
>
> > On Oct 22, 2013, at 9:24 PM, Praveen Sripati <[email protected]> wrote:
> >
> > Thanks Serega.
> >
> > It might be by design, but I don't see any purpose without someone telling
> > me why. It's more of an overhead. As I mentioned, I ran a workflow with
> > three actions and three more launcher MR jobs ran.
> >
> > Praveen
> >
> >
> > On Tue, Oct 22, 2013 at 9:56 AM, Serega Sheypak <[email protected]> wrote:
> >
> >> It's by design. An action is presented as a map-only job with fake input.
> >> Oozie packages a jar and sends it to HDFS. Then this jar is launched.
> >>
> >>
> >> 2013/10/22 Praveen Sripati <[email protected]>
> >>
> >>> Hi,
> >>>
> >>> I created a simple Oozie workflow with Sqoop, Hive and Pig actions. For
> >>> each of these actions, Oozie launches an MR launcher, which in turn
> >>> launches the action (Sqoop/Hive/Pig). So there are a total of 6 MR jobs
> >>> for 3 actions in the workflow.
> >>>
> >>> Why does Oozie start an MR launcher to start the action and not directly
> >>> start the action?
> >>> Thanks,
> >>> Praveen
> >>>
> >>
>
