Thanks Dmitriy, exactly the information I was looking for.
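
For anyone who finds this thread later, the "job table + daemon" idea
Dmitriy describes below could be sketched roughly as follows. This is
only a minimal sketch under assumptions of my own: the pig_jobs table
layout, the function names, the use of SQLite, and running the `pig`
command line client as a subprocess are all hypothetical choices, not
something anyone in the thread prescribed.

```python
import json
import sqlite3
import subprocess
import time

# Hypothetical table for the "pig job" model: script, parameters,
# status, and a few timestamps.
SCHEMA = """
CREATE TABLE IF NOT EXISTS pig_jobs (
    id           INTEGER PRIMARY KEY AUTOINCREMENT,
    script       TEXT NOT NULL,
    params       TEXT NOT NULL,
    status       TEXT NOT NULL DEFAULT 'submitted',
    submitted_at REAL NOT NULL,
    started_at   REAL,
    finished_at  REAL
)
"""


def submit(conn, script, params=None):
    """What the web app would do: record a new job and return its id."""
    cur = conn.execute(
        "INSERT INTO pig_jobs (script, params, submitted_at) VALUES (?, ?, ?)",
        (script, json.dumps(params or {}), time.time()),
    )
    conn.commit()
    return cur.lastrowid


def build_command(script, params_json):
    """Build a pig command line; assumes the `pig` client is on PATH."""
    cmd = ["pig"]
    for name, value in json.loads(params_json).items():
        cmd += ["-param", f"{name}={value}"]
    return cmd + ["-f", script]


def run_with_pig(script, params_json):
    """Run the script and return the exit code of the pig process."""
    return subprocess.call(build_command(script, params_json))


def run_pending(conn, runner=run_with_pig):
    """One daemon pass: run each submitted job, oldest first.

    Returns the number of jobs picked up. Resource constraints and
    access checks would go here, before the runner is called.
    """
    rows = conn.execute(
        "SELECT id, script, params FROM pig_jobs "
        "WHERE status = 'submitted' ORDER BY id"
    ).fetchall()
    for job_id, script, params_json in rows:
        conn.execute(
            "UPDATE pig_jobs SET status = 'running', started_at = ? WHERE id = ?",
            (time.time(), job_id),
        )
        conn.commit()
        try:
            ok = runner(script, params_json) == 0
        except OSError:  # e.g. the pig executable could not be started
            ok = False
        conn.execute(
            "UPDATE pig_jobs SET status = ?, finished_at = ? WHERE id = ?",
            ("done" if ok else "died", time.time(), job_id),
        )
        conn.commit()
    return len(rows)
```

A real daemon would call run_pending() in a loop with a sleep between
passes, run jobs concurrently rather than one at a time, and handle the
"killed" status, none of which this sketch attempts.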

On Tue, Jan 11, 2011 at 1:40 PM, Dmitriy Ryaboy <[email protected]> wrote:

> Soren,
> The "real" answer is probably to use Oozie under the covers in order to
> handle all kinds of edge conditions w.r.t cluster availability, job
> configuration, etc.
>
> If you don't want to deal with Oozie or Azkaban, you could do something
> like the following:
>
> 1) A web app that works with a simple "pig job" model, stored in a
> database table. The pig job model specifies the script, parameters,
> status (submitted / running / done / killed / died), and a few
> timestamps as needed.
>
> 2) A daemon process that monitors the table for new jobs and starts them
> on the cluster, updating the table appropriately. You can add all the
> resource constraints, access restrictions, etc. here.
>
> The more you develop the daemon and the web app (how about monitoring the
> Pig job through the new PigStats?...), the more you will realize you are
> rebuilding Oozie and start thinking about how to integrate it. But if you
> need something to work by the end of the week, a quickly rolled daemon +
> rails app is probably faster to set up in the short term.
>
> D
>
> On Tue, Jan 11, 2011 at 1:34 PM, [email protected] <[email protected]> wrote:
>
> > Thanks Jeff. I am aware of the Java API, I was hoping to hear from
> > people who might already be doing this and learn from their own
> > experiences before I go down any one particular path.
> >
> > On Mon, Jan 10, 2011 at 8:39 PM, Jeff Zhang <[email protected]> wrote:
> >
> > > You can use the Java API of Pig. Regarding the priority, you can let
> > > the user choose the priority on the web page. And you can use another
> > > scheduler rather than the default FIFO scheduler of Hadoop.
> > >
> > > On Tue, Jan 11, 2011 at 8:37 AM, Charles Gonçalves <[email protected]> wrote:
> > >
> > > > I reinforce the interest in this topic.
> > > > I'll soon need to create a web interface for my marketing
> > > > colleagues ...
> > > >
> > > > On Mon, Jan 10, 2011 at 10:06 PM, [email protected] <[email protected]> wrote:
> > > >
> > > > > I'd be interested to hear people's experience / best practices
> > > > > for running pig scripts on demand from a web app. What do you
> > > > > use as the calling mechanism? How do you handle priority /
> > > > > scheduling for ad-hoc or user generated tasks?
> > > > >
> > > > > Best,
> > > > > Soren
> > > > >
> > > > > --
> > > > > http://about.me/soren/bio
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > *Charles Ferreira Gonçalves *
> > > > http://homepages.dcc.ufmg.br/~charles/
> > > > UFMG - ICEx - Dcc
> > > > Cel.: 55 31 87741485
> > > > Tel.:  55 31 34741485
> > > > Lab.: 55 31 34095840
> > > >
> > >
> > >
> > >
> > > --
> > > Best Regards
> > >
> > > Jeff Zhang
> > >
> >
> >
> >
> > --
> > http://about.me/soren/bio
> >
>



-- 
http://about.me/soren/bio
