Thanks Robert for sharing.
2013/10/22 Robert Kanter <[email protected]>
> It's also to keep the Oozie server from being bogged down or becoming
> unstable. For example, if you have a bunch of workflows running Pig jobs,
> then you'd have the Oozie server running multiple copies of the Pig client
> (which is a relatively "heavy" program) directly. By moving all of the
> user code and external clients to map tasks in the launcher job, the Oozie
> server remains more lightweight and less prone to errors. It can also be
> much more scalable this way because the launcher jobs distribute the
> job launching/monitoring to other machines in the cluster; otherwise, with
> the Oozie server doing everything, we'd have to limit the number of
> concurrent workflows based on your Oozie server's machine specs (RAM, CPU,
> etc). And finally, from an architectural standpoint, the Oozie server
> itself is stateless; that is, everything is stored in the database and the
> Oozie server can be taken down at any point without losing anything. If we
> were to launch jobs directly from the Oozie server, then we'd now have some
> state (e.g. the Pig client cannot be restarted and resumed).
>
> - Robert
>
> On Tue, Oct 22, 2013 at 7:40 AM, Serega Sheypak <[email protected]> wrote:
>
> > The other purpose is to have a common launch mechanism for all stuff.
> >
> > My typical workflow brings up to 50MB of additional jars:
> > jars with 3rd-party Pig libs for the Pig action,
> > jars to interact with the metastore via JDBC inside a custom Java action.
> > It's a good approach to put all workflow stuff into HDFS and then run it
> > using Oozie. Easy to install, easy to manage.
> >
> > Our typical analytical job runs for 20 minutes (day-window analysis) up to
> > 3 hours (3-week window). So nobody cares about +20 seconds spent on jar
> > launcher deployment.
> > Try to solve real-life problems and you'll see real problems :)
> >
> > 2013/10/22 Nam Pham <[email protected]>
> >
> > > The purpose is to distribute jobs to different machines instead of
> > > running all of them on the master node, IMHO.
> > >
> > > Nam
> > >
> > > > On Oct 22, 2013, at 9:24 PM, Praveen Sripati <[email protected]> wrote:
> > > >
> > > > Thanks Serega.
> > > >
> > > > It might be by design, but I don't see any purpose without someone
> > > > telling me why. It's more of an overhead. As I mentioned, I ran a
> > > > workflow with three actions and three more launcher MR jobs ran.
> > > >
> > > > Praveen
> > > >
> > > > On Tue, Oct 22, 2013 at 9:56 AM, Serega Sheypak <[email protected]> wrote:
> > > >
> > > >> It's by design. An action is presented as a map-only job with fake
> > > >> input. Oozie packages a jar and sends it to HDFS. Then this jar is
> > > >> launched.
> > > >>
> > > >> 2013/10/22 Praveen Sripati <[email protected]>
> > > >>
> > > >>> Hi,
> > > >>>
> > > >>> I created a simple Oozie workflow with Sqoop, Hive and Pig actions.
> > > >>> For each of these actions, Oozie launches an MR launcher, which in
> > > >>> turn launches the action (Sqoop/Hive/Pig). So, there are a total of
> > > >>> 6 MR jobs for 3 actions in the workflow.
> > > >>>
> > > >>> Why does Oozie start an MR launcher to start the action and not
> > > >>> directly start the action?
> > > >>> Thanks,
> > > >>> Praveen
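
For anyone who wants to check the numbers quoted in this thread, here is a rough sketch in Python. It is only a toy model and does not use any Oozie API: the `Action` class and both functions are hypothetical names made up for illustration, and the figure of one child MR job per action is an assumption chosen to match Praveen's observation of 6 jobs for 3 actions (a real Hive or Pig script may well submit more).

```python
from dataclasses import dataclass


@dataclass
class Action:
    name: str
    child_mr_jobs: int  # assumption: MR jobs submitted by the action's own client


def total_mr_jobs(actions, launchers_per_action=1):
    """Count MR jobs when every action is started by its own launcher job."""
    return sum(launchers_per_action + a.child_mr_jobs for a in actions)


def launcher_overhead(launcher_seconds, job_seconds):
    """Launcher start-up time as a fraction of the job's own runtime."""
    return launcher_seconds / job_seconds


# Hypothetical workflow: three actions, each assumed to submit one MR job.
workflow = [Action("sqoop", 1), Action("hive", 1), Action("pig", 1)]
print(total_mr_jobs(workflow))  # 3 launchers + 3 child jobs = 6

# The "+20 seconds" from Serega's message against his 20-minute and 3-hour jobs.
for label, runtime in [("20-minute job", 20 * 60), ("3-hour job", 3 * 60 * 60)]:
    print(f"{label}: {launcher_overhead(20, runtime):.2%} launcher overhead")
```

Under these assumptions the launcher adds one extra MR job per action, but its start-up cost stays small relative to jobs that run for minutes or hours, which is the trade-off Serega describes.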
