We are in final testing of Kafka, and so far the fail-over tests have been 
pretty encouraging.  If we kill (-9) one of our two Kafka brokers (replication 
factor = 2), we see a flurry of activity as the producer fails and retries its 
writes (we use a bulk, synchronous send of 1000 messages at a time, each 
message ~1K long).  Sometimes the library finds the newly elected leader before 
returning to the application, and sometimes it doesn't.  We added retry/backoff 
logic to our code, and we don't seem to be losing content.
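For reference, the retry/backoff loop we wrap around the synchronous bulk send looks roughly like the sketch below.  (This is a simplified illustration, not our production code; sendBatch stands in for whatever synchronous bulk-send call the client library exposes and is assumed to return true once the write is acknowledged.)

```java
import java.util.List;
import java.util.function.Predicate;

public class RetryingSender {

    // Retry a synchronous bulk send with exponential backoff.
    // sendBatch is a hypothetical stand-in for the library's bulk send;
    // it returns true when the batch was accepted by the current leader.
    public static boolean sendWithRetry(Predicate<List<String>> sendBatch,
                                        List<String> batch,
                                        int maxAttempts,
                                        long initialBackoffMs)
            throws InterruptedException {
        long backoff = initialBackoffMs;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (sendBatch.test(batch)) {
                return true;          // delivered
            }
            // Leader election may still be in progress; wait and try again.
            Thread.sleep(backoff);
            backoff *= 2;             // exponential backoff between attempts
        }
        return false;                 // give up after maxAttempts
    }
}
```

The backoff gives the cluster time to finish electing a new leader before the next attempt, which is why we stop seeing failures after a burst of retries.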

However, we have another app in the pipeline that does a fan-out from one Kafka 
topic to dozens of topics.  We still use a single, synchronous, bulk send.

My question is: what are the semantics of a bulk send like that when one 
broker dies but the topic leaders are spread across both brokers?  Do we 
get any feedback on which messages went through and which were dropped because 
the leader just died?  For our own transaction handling we can mark messages as 
'retries' if we suspect there might have been any hanky-panky, but if we can 
reliably tell which messages we know have been delivered, we can skip 
re-sending them and save that work on the client side.
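To make the question concrete, here is the kind of client-side bookkeeping we have in mind, assuming the send path can report a per-message outcome (the method names below are hypothetical, not the actual Kafka API): after a partial batch failure, only the unacknowledged subset would be re-sent, flagged as retries for deduplication downstream.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

public class FanOutSender {

    // Given a per-message outcome (true = acknowledged by its leader),
    // collect only the failed messages, tagged as retries so the
    // consumer side can deduplicate if the original actually landed.
    // sendOne is a hypothetical per-message send stand-in.
    public static List<String> failedMessages(List<String> batch,
                                              Function<String, Boolean> sendOne) {
        List<String> toRetry = new ArrayList<>();
        for (String msg : batch) {
            if (!sendOne.apply(msg)) {
                toRetry.add("RETRY:" + msg);   // mark as a retry
            }
        }
        return toRetry;
    }
}
```

If the bulk send is all-or-nothing per broker, the whole sub-batch for the dead leader would show up in the failed set; if it only reports a single error for the whole call, we'd have to re-send everything and rely on the 'retry' marking instead.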

Thanks for any insight,

Bob Jervis | Senior Architect

[cid:[email protected]]<http://www.visibletechnologies.com/>
Seattle | Boston | New York | London
Phone: 425.957.6075 | Fax: 781.404.5711

Follow Visibly Intelligent Blog<http://www.visibletechnologies.com/blog/>

[cid:[email protected]]<http://twitter.com/visible>[cid:[email protected]]<http://www.facebook.com/Visible.Technologies>
 [cid:[email protected]] 
<http://www.linkedin.com/company/visible-technologies>
