There is another much more active fork of Azkaban. See https://github.com/rbpark/azkaban
On Sun, Aug 19, 2012 at 6:57 PM, Lance Norskog <[email protected]> wrote:
> Cool. I'm on the sidelines of a project trying to use Oozie in a large
> Hadoop-ecology app. Oozie is the one thing marked 'to be replaced'.
>
> On Sun, Aug 19, 2012 at 6:31 PM, Russell Jurney
> <[email protected]> wrote:
> > Glad to hear about Hamake. FWIW, I've had good success with Azkaban in the
> > past for very complex, lengthy Hadoop/Pig/Streaming pipelines. It even has a
> > DAG GUI.
> >
> > On Sun, Aug 19, 2012 at 5:43 PM, Lance Norskog <[email protected]> wrote:
> >>
> >> Last checkin on Azkaban was 11 months ago:
> >>
> >> https://github.com/azkaban/azkaban/commit/b105570625bcb2002de1acf4012c8d0e4388470a
> >>
> >> But, the last checkin for Hamake was June 2010. And it's still a cool
> >> little Hadoop/Pig scheduler.
> >> http://hamake.googlecode.com/
> >>
> >> On Sun, Aug 19, 2012 at 2:49 PM, Michael Segel
> >> <[email protected]> wrote:
> >> > There has been some work to replace the use of queues with HBase.
> >> > This would be used to feed processes off the queue to help balance out
> >> > the load on the cluster.
> >> >
> >> > In one specific use case, this was effective because the time spent
> >> > processing each mapper.map() iteration is a couple of orders of magnitude as
> >> > the time it takes to pull the data from the 'queue' and to each node for
> >> > processing.
> >> >
> >> > Again, YMMV, it is an interesting hack though....
> >> >
> >> > On Aug 19, 2012, at 11:46 AM, Robert Nicholson
> >> > <[email protected]> wrote:
> >> >
> >> >> We have an application or a series of applications that listen to
> >> >> incoming feeds they then distribute this data in XML form to a number of
> >> >> queues. Another set of processes listen to these queues and process the
> >> >> messages. Order of processing is important in so far as related messages
> >> >> need to be processed in sequence hence today all related messages go to the
> >> >> same queue and are processed by the same queue consumer.
> >> >>
> >> >> The idea would be replace the use of MQ with some kind of reliable
> >> >> distributed dispatch. Does Hadoop provide that?
> >> >>
> >> >
> >>
> >> --
> >> Lance Norskog
> >> [email protected]
> >
> > --
> > Russell Jurney twitter.com/rjurney [email protected] datasyndrome.com
>
> --
> Lance Norskog
> [email protected]
