Hi Robert,

To add to Russell's answer:
If real-time processing of events is required, you might want to use a
stream-processing system like Apache S4 or Twitter's Storm (rough sketches
follow below the quoted thread).

Karthik

On Sun, Aug 19, 2012 at 10:27 AM, Russell Jurney <[email protected]> wrote:

> The model with Hadoop would be to aggregate and write your events to
> the Hadoop Distributed FileSystem (HDFS), and then process them with
> scheduled batch jobs via Hadoop MapReduce. If your requirements can
> tolerate some latency, then Hadoop can work for you. Depending on your
> processing, you can schedule jobs down to, say, every hour, half hour,
> or fifteen minutes. I'm not aware of anyone scheduling jobs more
> frequently than that, but they may be. Chime in if you are.
>
> For getting events to HDFS, look at Flume, Kafka, and Scribe. For
> processing events, look at Pig, Hive, and Cascading. For scheduling
> jobs, look at Oozie and Azkaban.
>
> Russell Jurney http://datasyndrome.com
>
> On Aug 19, 2012, at 9:47 AM, Robert Nicholson <[email protected]> wrote:
>
> > We have an application, or a series of applications, that listen to
> > incoming feeds; they then distribute this data in XML form to a number
> > of queues. Another set of processes listen to these queues and process
> > the messages. Order of processing is important insofar as related
> > messages need to be processed in sequence; hence, today all related
> > messages go to the same queue and are processed by the same queue
> > consumer.
> >
> > The idea would be to replace the use of MQ with some kind of reliable
> > distributed dispatch. Does Hadoop provide that?
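
To make the Storm suggestion a bit more concrete, here is a rough, untested
sketch using Storm's Java API (the 0.8-era backtype.storm packages). The
FeedSpout and ProcessBolt classes, the "correlationId" field, and the topology
layout are all invented for illustration, not taken from your system; the
point is that fieldsGrouping on a correlation key gives you the same "related
messages go to the same consumer, in order" behaviour you get today by routing
them to the same queue.

// Rough, untested sketch of a Storm topology (0.8-era backtype.storm API).
// FeedSpout, ProcessBolt and the "correlationId" field are invented for
// illustration; in a real deployment the spout would read the incoming feed.
import java.util.Map;

import backtype.storm.Config;
import backtype.storm.LocalCluster;
import backtype.storm.spout.SpoutOutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.BasicOutputCollector;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.TopologyBuilder;
import backtype.storm.topology.base.BaseBasicBolt;
import backtype.storm.topology.base.BaseRichSpout;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import backtype.storm.tuple.Values;
import backtype.storm.utils.Utils;

public class FeedTopology {

    // Stand-in for the process that listens to the incoming feed.
    public static class FeedSpout extends BaseRichSpout {
        private SpoutOutputCollector collector;
        private int seq = 0;

        @Override
        public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
            this.collector = collector;
        }

        @Override
        public void nextTuple() {
            Utils.sleep(100);
            // correlationId plays the role of "which queue this message
            // would have gone to"; xml is the message payload.
            String correlationId = "order-" + (seq % 10);
            String xml = "<event seq='" + seq + "'/>";
            seq++;
            collector.emit(new Values(correlationId, xml));
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("correlationId", "xml"));
        }
    }

    // Stand-in for today's queue consumers.
    public static class ProcessBolt extends BaseBasicBolt {
        @Override
        public void execute(Tuple tuple, BasicOutputCollector collector) {
            // All tuples with the same correlationId are routed to the same
            // task, in emit order, so related messages stay in sequence.
            System.out.println(tuple.getStringByField("correlationId")
                    + " -> " + tuple.getStringByField("xml"));
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            // Terminal bolt: nothing emitted downstream.
        }
    }

    public static void main(String[] args) {
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("feed", new FeedSpout(), 1);
        // fieldsGrouping is the Storm equivalent of "same queue, same
        // consumer": tuples are partitioned across tasks by correlationId.
        builder.setBolt("process", new ProcessBolt(), 4)
               .fieldsGrouping("feed", new Fields("correlationId"));

        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology("feed-topology", new Config(), builder.createTopology());
    }
}

Storm's acking mechanism is what gives you the "reliable" part: if a tuple
fails downstream, the spout can replay it.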
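
And for the batch side Russell describes, here is an equally rough sketch of
writing events into hourly HDFS buckets with the plain Hadoop FileSystem API,
so that an Oozie- or Azkaban-scheduled job (Pig, Hive, or MapReduce) can sweep
each completed bucket. The path layout and event format are made up for
illustration; in practice a collector like Flume, Kafka, or Scribe would own
this step for you.

// Rough sketch of the batch model: events land in time-bucketed HDFS
// directories, and a scheduled job processes each completed bucket.
// Paths and event format are invented for illustration.
import java.text.SimpleDateFormat;
import java.util.Date;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HourlyEventWriter {

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();   // picks up core-site.xml / hdfs-site.xml
        FileSystem fs = FileSystem.get(conf);

        // One directory per hour, e.g. /events/2012/08/19/10/part-0001.xml.
        // A job scheduled every hour reads only buckets that are already
        // closed, which is where the batch latency comes from.
        String bucket = new SimpleDateFormat("yyyy/MM/dd/HH").format(new Date());
        Path out = new Path("/events/" + bucket + "/part-0001.xml");

        FSDataOutputStream stream = fs.create(out);
        stream.writeBytes("<event id='42' type='example'/>\n");
        stream.close();

        fs.close();
    }
}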
