Glad to hear about Hamake. FWIW, I've had good success with Azkaban in the past for very complex, lengthy Hadoop/Pig/Streaming pipelines. It even has a DAG GUI.
On Sun, Aug 19, 2012 at 5:43 PM, Lance Norskog <[email protected]> wrote:
> Last checkin on Azkaban was 11 months ago:
>
> https://github.com/azkaban/azkaban/commit/b105570625bcb2002de1acf4012c8d0e4388470a
>
> But the last checkin for Hamake was June 2010, and it's still a cool
> little Hadoop/Pig scheduler.
> http://hamake.googlecode.com/
>
> On Sun, Aug 19, 2012 at 2:49 PM, Michael Segel
> <[email protected]> wrote:
> > There has been some work to replace the use of queues with HBase.
> >
> > This would be used to feed processes off the queue to help balance out
> > the load on the cluster.
> >
> > In one specific use case, this was effective because the time spent
> > processing each mapper.map() iteration is a couple of orders of magnitude
> > greater than the time it takes to pull the data from the 'queue' and to
> > each node for processing.
> >
> > Again, YMMV, but it is an interesting hack...
> >
> > On Aug 19, 2012, at 11:46 AM, Robert Nicholson
> > <[email protected]> wrote:
> >
> >> We have an application, or a series of applications, that listens to
> >> incoming feeds and then distributes this data in XML form to a number
> >> of queues. Another set of processes listens to these queues and
> >> processes the messages. Order of processing is important insofar as
> >> related messages need to be processed in sequence; hence, today all
> >> related messages go to the same queue and are processed by the same
> >> queue consumer.
> >>
> >> The idea would be to replace the use of MQ with some kind of reliable
> >> distributed dispatch. Does Hadoop provide that?
>
> --
> Lance Norskog
> [email protected]

--
Russell Jurney twitter.com/rjurney [email protected] datasyndrome.com
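
The ordering constraint in Robert's question (related messages must be handled in sequence, so today they all go to the same queue and the same consumer) is commonly preserved in a distributed dispatcher by partitioning on a correlation key. Below is a minimal Python sketch, assuming every message carries such a key; `partition_for` and `dispatch` are hypothetical names for illustration, not part of Hadoop, HBase, or any MQ product. Messages sharing a key always land in the same partition, in arrival order, so one consumer per partition sees related messages in sequence.

```python
import hashlib
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Message = Tuple[str, str]  # hypothetical shape: (correlation key, XML payload)


def partition_for(key: str, num_partitions: int) -> int:
    """Deterministically map a correlation key to a partition index.

    MD5 is used instead of Python's built-in hash() because str hashes are
    randomized per process, which would route the same key differently on
    different dispatcher machines.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def dispatch(messages: Iterable[Message], num_partitions: int) -> Dict[int, List[Message]]:
    """Append each message to its key's partition, keeping arrival order."""
    partitions: Dict[int, List[Message]] = defaultdict(list)
    for key, payload in messages:
        partitions[partition_for(key, num_partitions)].append((key, payload))
    return dict(partitions)


if __name__ == "__main__":
    feed = [
        ("order-1", "<new/>"),
        ("order-2", "<new/>"),
        ("order-1", "<amend/>"),
        ("order-1", "<cancel/>"),
    ]
    # Each partition would be drained by exactly one consumer.
    for pid, msgs in sorted(dispatch(feed, num_partitions=4).items()):
        print(pid, msgs)
```

Per-key order holds only while a single writer appends in arrival order and each partition has a single consumer; changing `num_partitions` remaps keys, so in-flight messages would need draining first.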
