Ah, so Storm is the hospital and Kafka is the waiting room where everybody 
queues up to be seen in turn, yes?

Adaryl "Bob" Wakefield, MBA
Principal
Mass Street Analytics
913.938.6685
www.linkedin.com/in/bobwakefieldmba
Twitter: @BobLovesData

From: Justin Workman 
Sent: Thursday, August 14, 2014 7:47 PM
To: [email protected] 
Subject: Re: Kafka + Storm

If you are familiar with WebLogic or ActiveMQ, it is similar. Let's see if I 
can explain; I am definitely not a subject matter expert on this. 

Within Kafka you can create "queues" (Kafka calls them topics), e.g. a 
webclicks queue. Your web servers can then send click events to this queue in 
Kafka. The web servers, or whatever agent writes the events to this queue, are 
referred to as the "producer". Each event, or message, in Kafka is assigned an 
id. 

On the other side there are "consumers"; in Storm's case this would be the 
Storm Kafka spout. A consumer can subscribe to this webclicks queue to consume 
the messages that are in the queue. The consumer can consume a single message 
from the queue, or a batch of messages, as Storm does. The consumer keeps track 
of the latest offset (the Kafka message id) that it has consumed. This way, the 
next time the consumer checks to see if there are more messages to consume, it 
will ask for messages with a message id greater than its last offset. 

This helps with the reliability of the event stream and helps guarantee that 
your events/messages make it from start to finish through your stream, assuming 
the events get to Kafka ;)
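
To make the producer/consumer/offset idea concrete, here is a rough sketch in 
plain Python. It does not use the real Kafka client at all; the Topic and 
Consumer classes and their method names are made up for illustration, and the 
"queue" is just an in-memory list. It only shows how a consumer can resume from 
its last offset, one message or one batch at a time.

```python
class Topic:
    """Toy stand-in for one Kafka queue: an append-only list of messages."""

    def __init__(self):
        self._log = []

    def send(self, message):
        """Producer side: append a message and return the id (offset) assigned to it."""
        self._log.append(message)
        return len(self._log) - 1

    def fetch(self, after_offset, max_messages=1):
        """Return up to max_messages (offset, message) pairs with offset > after_offset."""
        start = after_offset + 1
        return list(enumerate(self._log[start:start + max_messages], start=start))


class Consumer:
    """Toy consumer that remembers the last offset it has consumed."""

    def __init__(self, topic):
        self.topic = topic
        self.last_offset = -1  # nothing consumed yet

    def poll(self, max_messages=1):
        """Ask only for messages newer than last_offset, then advance it."""
        batch = self.topic.fetch(self.last_offset, max_messages)
        if batch:
            self.last_offset = batch[-1][0]
        return [message for _, message in batch]


# The web servers act as the producer.
webclicks = Topic()
for page in ["/home", "/cart", "/checkout"]:
    webclicks.send({"page": page})

# The consumer (the spout, in Storm's case) reads in batches of two.
spout = Consumer(webclicks)
first = spout.poll(max_messages=2)
second = spout.poll(max_messages=2)
third = spout.poll(max_messages=2)
```

Each poll picks up where the previous one stopped, so nothing is read twice and 
an empty result simply means there is nothing new yet.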

Hope this helps and makes some sort of sense. Again, sent from my iPhone ;)

Justin

Sent from my iPhone

On Aug 14, 2014, at 6:28 PM, "Adaryl \"Bob\" Wakefield, MBA" 
<[email protected]> wrote:


  I get your reasoning at a high level. I should have specified that I wasn’t 
sure what Kafka does. I don’t have a hard software engineering background. I 
know that Kafka is “a message queuing” system, but I don’t really know what 
that means.

  (I can’t believe you wrote all that from your iPhone....)
  B.


  From: Justin Workman 
  Sent: Thursday, August 14, 2014 7:22 PM
  To: [email protected] 
  Subject: Re: Kafka + Storm

  Personally, we looked at several options, including writing our own Storm 
source. There are limited Storm sources with community support out there. For 
us, it boiled down to the following:

  1) Community support and what appeared to be a standard method. Storm now 
includes the Kafka source as a bundled component. This made the 
implementation much faster, because the code was already written. 
  2) The durability (replication and clustering) of Kafka. We have a three-hour 
retention period on our queues, so if we need to do maintenance on Storm or 
deploy an updated topology, we don't need to stop or replay any sources.
  3) The ability to have other tools attach to the Kafka queues to consume the 
same events for other purposes. 
  4) To complement point #1, it's easy to write to Kafka, so it took little 
effort to start sending our desired data to Kafka. 

  These are our main reasons (I'm sure there were more). Each use case is 
going to be different and Kafka might not be the best choice for everyone. For 
us it made sense. 
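
Points 2 and 3 can be sketched with a toy model in plain Python. This is not 
the Kafka API: RetainedQueue, consume, and the consumer names are invented for 
illustration, and real Kafka applies retention on the broker in its own way. 
The sketch only shows why a consumer that was offline can catch up inside the 
retention window, and why several consumers can read the same events without 
interfering with each other.

```python
HOUR = 3600


class RetainedQueue:
    """Toy queue that keeps messages for a fixed retention period."""

    def __init__(self, retention_seconds):
        self.retention_seconds = retention_seconds
        self._log = []  # entries are (offset, timestamp, message)
        self._next_offset = 0

    def send(self, message, now):
        """Append a message stamped with the (simulated) current time."""
        self._log.append((self._next_offset, now, message))
        self._next_offset += 1

    def expire(self, now):
        """Drop messages older than the retention period."""
        cutoff = now - self.retention_seconds
        self._log = [entry for entry in self._log if entry[1] >= cutoff]

    def fetch(self, after_offset):
        """Return (offset, message) pairs newer than after_offset."""
        return [(off, msg) for off, _, msg in self._log if off > after_offset]


queue = RetainedQueue(retention_seconds=3 * HOUR)
offsets = {"storm": -1, "reporting": -1}  # each consumer tracks its own offset


def consume(name):
    """Read everything new for one consumer and advance only its offset."""
    batch = queue.fetch(offsets[name])
    if batch:
        offsets[name] = batch[-1][0]
    return [msg for _, msg in batch]


queue.send("click-a", now=0)
queue.send("click-b", now=1 * HOUR)
reporting_seen = consume("reporting")   # another tool reads the same events

# Storm is down for maintenance while a new event arrives.
queue.send("click-c", now=2 * HOUR)
queue.expire(now=2 * HOUR)
storm_seen = consume("storm")           # Storm catches up after maintenance
reporting_later = consume("reporting")  # the other tool only sees what is new

# Later, the first two events age out of the three-hour window.
queue.expire(now=5 * HOUR)
still_retained = queue.fetch(-1)
```

Reading a message does not remove it from the queue; only the retention period 
does, which is what lets a second tool consume the same events.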

  Justin 

  Sent from my iPhone

  On Aug 14, 2014, at 6:08 PM, "Adaryl \"Bob\" Wakefield, MBA" 
<[email protected]> wrote:


    Can someone tell me why people put Kafka in front of Storm? Can’t Storm 
ingest messages without having Kafka in the middle?

    B.
