Thanks Dmitriy, exactly the information I was looking for.

On Tue, Jan 11, 2011 at 1:40 PM, Dmitriy Ryaboy <[email protected]> wrote:
> Soren,
>
> The "real" answer is probably to use Oozie under the covers in order to
> handle all kinds of edge conditions w.r.t. cluster availability, job
> configuration, etc.
>
> If you don't want to deal with Oozie or Azkaban, you could do something
> like the following:
>
> 1) web app that works with a simple "pig job" model. The pig job model
> specifies the script, parameters, status (submitted / running / done /
> killed / died), and a few timestamps as needed.
>
> 2) a daemon process that monitors the table for new jobs and starts them on
> the cluster, updating the table appropriately. You can add all the resource
> constraints, access restrictions, etc. here.
>
> The more you develop the daemon and the web app (how about monitoring the
> Pig job through the new PigStats?...), the more you will realize you are
> rebuilding Oozie and start thinking about how to integrate it. But if you
> need something to work by the end of the week, a quickly rolled daemon +
> rails app is probably faster to set up in the short term.
>
> D
>
> On Tue, Jan 11, 2011 at 1:34 PM, [email protected] <[email protected]> wrote:
>
> > Thanks Jeff. I am aware of the Java API, I was hoping to hear from people
> > who might already be doing this and learn from their own experiences
> > before I go down any one particular path.
> >
> > On Mon, Jan 10, 2011 at 8:39 PM, Jeff Zhang <[email protected]> wrote:
> >
> > > You can use the Java API of Pig. Regarding the priority, you can let
> > > the user choose the priority on the web page. And you can use another
> > > scheduler rather than the default FIFO of Hadoop.
> > >
> > > On Tue, Jan 11, 2011 at 8:37 AM, Charles Gonçalves <[email protected]> wrote:
> > >
> > > > I reinforce the interest in this topic.
> > > > I'll soon need to create a web interface for my marketing colleagues
> > ...
> > > >
> > > > On Mon, Jan 10, 2011 at 10:06 PM, [email protected] <[email protected]> wrote:
> > > >
> > > > > I'd be interested to hear people's experience / best practices for
> > > > > running pig scripts on demand from a web app. What do you use as the
> > > > > calling mechanism? How do you handle priority / scheduling for ad-hoc
> > > > > or user generated tasks?
> > > > >
> > > > > Best,
> > > > > Soren
> > > > >
> > > > > --
> > > > > http://about.me/soren/bio
> > > > >
> > > >
> > > > --
> > > > *Charles Ferreira Gonçalves *
> > > > http://homepages.dcc.ufmg.br/~charles/
> > > > UFMG - ICEx - Dcc
> > > > Cel.: 55 31 87741485
> > > > Tel.: 55 31 34741485
> > > > Lab.: 55 31 34095840
> > > >
> > >
> > > --
> > > Best Regards
> > >
> > > Jeff Zhang
> > >
> >
> > --
> > http://about.me/soren/bio
> >
>

--
http://about.me/soren/bio
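
P.S. For anyone finding this thread later, here is a rough sketch of the "job table + daemon" idea from Dmitriy's message, written in Python with SQLite purely for illustration. The table name `pig_jobs`, its columns, and the functions `init_db`, `submit_job`, `poll_once` and `run_forever` are all made up for this example; the only thing assumed about Pig itself is the usual `pig -param name=value script.pig` command line. It ignores killing jobs, concurrency between several daemons, and cluster-side scheduling, which is exactly where Oozie starts to look attractive.

```python
import sqlite3
import subprocess
import time

# Hypothetical "pig job" model: script, parameters, status, a few timestamps.
SCHEMA = """
CREATE TABLE IF NOT EXISTS pig_jobs (
    id           INTEGER PRIMARY KEY AUTOINCREMENT,
    script       TEXT NOT NULL,
    params       TEXT NOT NULL DEFAULT '',
    status       TEXT NOT NULL DEFAULT 'submitted',
    submitted_at REAL NOT NULL,
    started_at   REAL,
    finished_at  REAL
)
"""


def init_db(conn):
    conn.execute(SCHEMA)
    conn.commit()


def submit_job(conn, script, params=None):
    """What the web app would do: insert a row and return its id."""
    encoded = "\n".join(f"{k}={v}" for k, v in (params or {}).items())
    cur = conn.execute(
        "INSERT INTO pig_jobs (script, params, submitted_at) VALUES (?, ?, ?)",
        (script, encoded, time.time()),
    )
    conn.commit()
    return cur.lastrowid


def run_with_pig_cli(script, params):
    """Default runner: shell out to `pig` and return its exit code."""
    cmd = ["pig"]
    for p in params:
        cmd += ["-param", p]  # each p looks like "name=value"
    cmd.append(script)
    return subprocess.call(cmd)


def poll_once(conn, runner=run_with_pig_cli):
    """Daemon step: take the oldest submitted job, run it, record the outcome.

    Returns the job id that was processed, or None if nothing was waiting.
    """
    row = conn.execute(
        "SELECT id, script, params FROM pig_jobs "
        "WHERE status = 'submitted' ORDER BY id LIMIT 1"
    ).fetchone()
    if row is None:
        return None
    job_id, script, params = row
    conn.execute(
        "UPDATE pig_jobs SET status = 'running', started_at = ? WHERE id = ?",
        (time.time(), job_id),
    )
    conn.commit()
    try:
        ok = runner(script, params.split("\n") if params else []) == 0
    except Exception:
        ok = False  # e.g. the pig binary is missing
    conn.execute(
        "UPDATE pig_jobs SET status = ?, finished_at = ? WHERE id = ?",
        ("done" if ok else "died", time.time(), job_id),
    )
    conn.commit()
    return job_id


def run_forever(conn, interval=5.0):
    """Poll the table until the process is stopped."""
    while True:
        if poll_once(conn) is None:
            time.sleep(interval)
```

The runner is passed in as an argument so the polling logic can be exercised without a cluster; resource limits or access checks would presumably go in `poll_once` before the job is marked as running.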
