Do I need Kafka to have a reliable Storm spout?

Douglas Alan Tue, 24 Feb 2015 10:01:37 -0800

As I understand things, ZooKeeper will persist tuples emitted by bolts so
if a bolt crashes (or a computer with the bolt crashes, or the entire
cluster crashes), the tuple emitted by the bolt will not be lost. Once
everything is restarted, the tuples will be fetched from ZooKeeper, and
everything will continue on as if nothing bad ever happened.


What I don't yet understand is whether the same thing is true for spouts. If a
spout emits a tuple (i.e., the emit() function within a spout is executed),
and the computer the spout is running on crashes shortly thereafter, will
that tuple be resurrected by ZooKeeper? Or do we need Kafka in order to
guarantee this?

|>oug

P.S. I understand that the tuple emitted by the spout must be assigned a
unique ID in the call to emit().

P.P.S. I see sample code in books that uses something like
ConcurrentHashMap<UUID, Values> to track which spouted tuples have not yet
been acked. Is this somehow automatically persisted with ZooKeeper? I
suspect not and if not, then I shouldn't really be doing that, should I?
What should I be doing instead? Using Kafka?
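
For concreteness, here is a rough sketch of the in-memory tracking pattern I
mean. It is plain Java with no Storm dependency: the class PendingTracker and
its methods are hypothetical names made up for illustration, and String stands
in for Values. It only mimics the emit/ack/fail bookkeeping, not Storm's actual
spout interface.

```java
import java.util.ArrayDeque;
import java.util.Map;
import java.util.Queue;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the bookkeeping inside a spout; not Storm's API.
class PendingTracker {
    // Tuples that were emitted but not yet acked, keyed by their unique ID.
    private final Map<UUID, String> pending = new ConcurrentHashMap<>();
    // Payloads waiting to be emitted (new ones and ones queued for replay).
    private final Queue<String> toEmit = new ArrayDeque<>();

    void offer(String payload) {
        toEmit.add(payload);
    }

    // Emit the next payload under a fresh ID and remember it until acked.
    // Returns null when there is nothing to emit.
    UUID emitNext() {
        String payload = toEmit.poll();
        if (payload == null) {
            return null;
        }
        UUID id = UUID.randomUUID();
        pending.put(id, payload);
        return id;
    }

    // The tuple was fully processed: forget it.
    void ack(UUID id) {
        pending.remove(id);
    }

    // The tuple failed or timed out: queue its payload for replay.
    void fail(UUID id) {
        String payload = pending.remove(id);
        if (payload != null) {
            toEmit.add(payload);
        }
    }

    int pendingCount() {
        return pending.size();
    }
}
```

The map in this sketch lives only in the memory of the JVM that holds it, so
if that process dies, the record of which tuples were still pending dies with
it. That is exactly the case I am worried about.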
