Thanks Robert for sharing.

2013/10/22 Robert Kanter <[email protected]>

> It's also to keep the Oozie server from being bogged down or becoming
> unstable.  For example, if you have a bunch of workflows running Pig jobs,
> then you'd have the Oozie server running multiple copies of the Pig client
> (which is a relatively "heavy" program) directly.  By moving all of the
> user code and external clients to map tasks in the launcher job, the Oozie
> server remains more light-weight and less prone to errors.  It's also
> much more scalable this way because the launcher jobs distribute the
> job launching/monitoring to other machines in the cluster; otherwise, with
> the Oozie server doing everything, we'd have to limit the number of
> concurrent workflows based on your Oozie server's machine specs (RAM, CPU,
> etc).  And finally, from an architectural standpoint, the Oozie server
> itself is stateless; that is, everything is stored in the database and the
> Oozie server can be taken down at any point without losing anything.  If we
> were to launch jobs directly from the Oozie server, then we'd now have some
> state (e.g. the Pig client cannot be restarted and resumed).
>
>
> - Robert
>
>
> On Tue, Oct 22, 2013 at 7:40 AM, Serega Sheypak <[email protected]> wrote:
>
> > The other purpose is to have a common launch mechanism for all stuff.
> >
> > My typical workflow brings up to 50MB of additional jars:
> > jars with third-party Pig libs for the Pig action,
> > jars to interact with the metastore via JDBC inside a custom Java action.
> > It's a good approach to put all workflow stuff into HDFS and then run it
> > using Oozie. Easy to install, easy to manage.
> >
> > Our typical analytical job runs for 20 minutes (day-window analysis) up to
> > 3 hours (3-week window). So nobody cares about +20 seconds spent on jar
> > launcher deployment.
> > Try to solve real-life problems and you'll see real problems :)
> >
> >
> > 2013/10/22 Nam Pham <[email protected]>
> >
> > > The purpose is to distribute jobs to different machines instead of running
> > > all of them on the master node, IMHO.
> > >
> > > Nam
> > >
> > > > On Oct 22, 2013, at 9:24 PM, Praveen Sripati <[email protected]> wrote:
> > > >
> > > > Thanks Serega.
> > > >
> > > > It might be by design, but I don't see any purpose without someone telling
> > > > me why. It's more of an overhead. As I mentioned, I ran a workflow with
> > > > three actions and three more launcher MR jobs ran.
> > > >
> > > > Praveen
> > > >
> > > >
> > > > On Tue, Oct 22, 2013 at 9:56 AM, Serega Sheypak <[email protected]> wrote:
> > > >
> > > >> It's by design. An action is represented as a map-only job with fake input.
> > > >> Oozie packages the jar and sends it to HDFS. Then this jar is launched.
> > > >>
> > > >>
> > > >> 2013/10/22 Praveen Sripati <[email protected]>
> > > >>
> > > >>> Hi,
> > > >>>
> > > >>> I created a simple Oozie workflow with Sqoop, Hive and Pig actions. For
> > > >>> each of these actions, Oozie launches an MR launcher, which in turn
> > > >>> launches the action (Sqoop/Hive/Pig). So, there are a total of 6 MR jobs
> > > >>> for 3 actions in the workflow.
> > > >>>
> > > >>> Why does Oozie start an MR launcher to start the action and not directly
> > > >>> start the action?
> > > >>> Thanks,
> > > >>> Praveen
> > > >>>
> > > >>
> > >
> >
>
