Using normal Storm, any bolt can output to anything at any time, since each
bolt runs arbitrary code. So a bolt in the middle of a topology can write to
a database, a file, or anything else you need. In practice it will usually be
the last bolt in the topology, but it doesn't have to be.
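
As a minimal sketch (assuming Storm 0.9.x package names, with a toy
in-memory CounterStore standing in for whatever database client you actually
use), such a bolt might look like this:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

import backtype.storm.task.OutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichBolt;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import backtype.storm.tuple.Values;

public class CountAndPersistBolt extends BaseRichBolt {

    // Toy stand-in for a real database client (HBase, JDBC, ...).
    private static class CounterStore {
        private final ConcurrentHashMap<String, AtomicLong> counts =
                new ConcurrentHashMap<String, AtomicLong>();
        long increment(String key) {
            counts.putIfAbsent(key, new AtomicLong());
            return counts.get(key).incrementAndGet();
        }
    }

    private OutputCollector collector;
    private transient CounterStore store;

    @Override
    public void prepare(Map conf, TopologyContext context,
                        OutputCollector collector) {
        this.collector = collector;
        // Open connections here, on the worker, not in the constructor.
        this.store = new CounterStore();
    }

    @Override
    public void execute(Tuple input) {
        String key = input.getStringByField("key");
        long count = store.increment(key);  // the side effect: write out
        // Still free to emit downstream, anchored for reliability.
        collector.emit(input, new Values(key, count));
        collector.ack(input);
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("key", "count"));
    }
}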

If you use Trident, then you use specific abstractions to read and write
data: to read, you use a StateFactory with a QueryFunction, and to write, you
use a StateFactory with a StateUpdater.
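
For example, here is a rough sketch of the write path using the in-memory
MemoryMapState that ships with Trident (again assuming 0.9.x package names);
for a real store you'd swap in a StateFactory backed by HBase, memcached,
etc.:

import backtype.storm.generated.StormTopology;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Values;
import storm.trident.TridentState;
import storm.trident.TridentTopology;
import storm.trident.operation.builtin.Count;
import storm.trident.testing.FixedBatchSpout;
import storm.trident.testing.MemoryMapState;

public class WordCountState {
    public static StormTopology build() {
        // Test spout emitting single-word tuples in batches of three.
        FixedBatchSpout spout = new FixedBatchSpout(new Fields("word"), 3,
                new Values("storm"), new Values("trident"),
                new Values("storm"));
        spout.setCycle(true);

        TridentTopology topology = new TridentTopology();
        // persistentAggregate pairs the StateFactory with an aggregator;
        // Trident drives the StateUpdater for you behind the scenes.
        TridentState counts = topology.newStream("words", spout)
                .groupBy(new Fields("word"))
                .persistentAggregate(new MemoryMapState.Factory(),
                        new Count(), new Fields("count"));
        return topology.build();
    }
}

To read that state back out of the topology (e.g. from a DRPC stream),
you'd hand the TridentState to stateQuery along with a QueryFunction such as
the built-in MapGet.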

If you want to read data from Flume, you'll have to write a spout to pull
data from Flume and emit it into a topology. Start with the IRichSpout
interface for normal Storm, or ITridentSpout for Trident.
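
A skeleton for the normal-Storm case might look like the following; the
BlockingQueue handoff from Flume is hypothetical (you might fill it from a
custom Flume sink or an Avro client), but the Storm API calls are the
standard ones:

import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

import backtype.storm.spout.SpoutOutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichSpout;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Values;
import backtype.storm.utils.Utils;

public class FlumeSpout extends BaseRichSpout {

    private SpoutOutputCollector collector;
    // Hypothetical handoff: something (e.g. a custom Flume sink or an
    // Avro client) must put event bodies onto this queue.
    private transient BlockingQueue<String> flumeEvents;

    @Override
    public void open(Map conf, TopologyContext context,
                     SpoutOutputCollector collector) {
        this.collector = collector;
        this.flumeEvents = new LinkedBlockingQueue<String>();
        // Start your Flume receiver here and have it feed flumeEvents.
    }

    @Override
    public void nextTuple() {
        // nextTuple must not block, so poll rather than take.
        String event = flumeEvents.poll();
        if (event == null) {
            Utils.sleep(1); // back off briefly when there is no data
            return;
        }
        // Use the event itself as the message id so ack/fail can be
        // tied back to it for replay.
        collector.emit(new Values(event), event);
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("event"));
    }
}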

SimonC

From: P lva [mailto:[email protected]]
Sent: 26 February 2014 02:44
To: [email protected]
Subject: Storm Applications

Hello Everyone,

I came across Storm recently and I'm trying to understand it better.

Storm, unlike Flume, doesn't really have any code for a sink. I read
somewhere that Storm is a real-time stream processing engine where you don't
expect data to land anywhere. What kind of situation would that be?

One example I envision is a situation where you only want to maintain
counters without the actual data itself. Is this right? If yes, I'm assuming
that these counters have to be updated in a database. How does this affect
performance?

Can I route Flume streams through a Storm cluster to compute the counters and
store the counters in HBase (instead of going Flume ---> Hive ---> top 10
query), effectively decreasing the number of MapReduce jobs on the Hadoop
cluster?




