I have run into the issue documented by KAFKA-1387 and have been trying to come 
up with a solution.  A summary of this issue is:

 *   Kafka registers brokers and consumers via ephemeral ZooKeeper nodes.
 *   When a connection fails and an Expire event is received, Kafka reconnects 
and then attempts to recreate these ephemeral nodes.
 *   If the node still (or already) exists when Kafka attempts to recreate it, 
Kafka currently assumes (I am working with 0.8.1.1) that ZooKeeper is just slow 
deleting the node from the old session. It therefore goes into a delay loop, 
waiting for ZooKeeper to remove the stale node so that it can create a new 
ephemeral node associated with the new session.
 *   In my stress testing I have seen cases where the connection fails 
multiple times in a short period. If one of these failures occurs while Kafka 
is handling an Expire event, Kafka can end up with a backlog of two or more 
Expire events.  When the first of these is finally processed, it recreates the 
node against the latest session.  However, when the next one is processed, the 
Kafka broker or consumer goes into a never-ending delay loop waiting for the 
node to go away, even though that node now belongs to the latest session.  The 
node will not go away unless the connection fails again, and then the process 
just repeats itself.
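
To make the backlog case concrete, here is a minimal sketch of it. It is not 
Kafka's code: the class and method names are hypothetical, a plain in-memory 
set stands in for ZooKeeper, and the retry cap exists only so the simulation 
terminates (the loop described above has no such cap).

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical simulation of the "wait for the stale node to go away" loop.
class DelayLoopSketch {
    // Tries to create the node; returns the attempt number that succeeded,
    // or -1 if the node was still present after maxAttempts tries.
    static int registerWithRetry(Set<String> nodes, String path, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (nodes.add(path)) {
                return attempt; // node did not exist, so it was created
            }
            // The real loop would sleep here, expecting ZooKeeper to delete
            // the node. If the node belongs to the live session, that delete
            // never comes.
        }
        return -1;
    }

    public static void main(String[] args) {
        Set<String> nodes = new HashSet<>();
        String path = "/brokers/ids/1"; // example path only
        // First queued Expire: node is absent, so it is created at once.
        System.out.println(registerWithRetry(nodes, path, 5));
        // Second queued Expire: node now belongs to the live session.
        System.out.println(registerWithRetry(nodes, path, 5));
    }
}
```

The second call can never succeed on its own, which is the never-ending loop 
I am seeing.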

I proposed a fix in the KAFKA-1387 Jira issue to generate some discussion of 
potential fixes for this issue. One of the Kafka developers requested that I 
vet the basic assumption of my fix with the ZooKeeper team.  My solution is 
basically:

 *   Register (via ZkClient) for notifications of both session and node events
 *   When processing the Expire event:
    *   If the node does not exist then recreate the node (current behavior)
    *   If the node exists do nothing (no looping)
 *   When processing a delete node event:
    *   If the node does not exist then recreate the node (new behavior)
    *   If the node exists do nothing
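
The steps above can be sketched roughly as follows. This is only an 
illustration of the idea, not the patch attached to the Jira issue: 
`NodeStore`, `InMemoryNodeStore`, and `EphemeralRegistrar` are hypothetical 
names, they are not ZkClient's API, and the in-memory store stands in for 
ZooKeeper so the logic can be exercised on its own.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical minimal view of the operations the handlers need.
interface NodeStore {
    boolean exists(String path);
    void createEphemeral(String path);
}

// In-memory stand-in for ZooKeeper, used only to exercise the logic.
class InMemoryNodeStore implements NodeStore {
    private final Set<String> nodes = new HashSet<>();
    int createCount = 0; // number of successful creates, for inspection

    public boolean exists(String path) {
        return nodes.contains(path);
    }

    public void createEphemeral(String path) {
        if (!nodes.add(path)) {
            throw new IllegalStateException("node exists: " + path);
        }
        createCount++;
    }

    // Simulates ZooKeeper finally removing a node owned by an expired session.
    void delete(String path) {
        nodes.remove(path);
    }
}

class EphemeralRegistrar {
    private final NodeStore store;
    private final String path;

    EphemeralRegistrar(NodeStore store, String path) {
        this.store = store;
        this.path = path;
    }

    // Called for each Expire event once the new session is established.
    void onNewSession() {
        createIfAbsent();
    }

    // Called when a delete notification arrives for a node.
    void onNodeDeleted(String deletedPath) {
        if (path.equals(deletedPath)) {
            createIfAbsent();
        }
    }

    // Create the node only if it is missing; otherwise do nothing (no loop).
    // Assumption: a real client would also have to tolerate the node appearing
    // between the exists check and the create.
    private void createIfAbsent() {
        if (!store.exists(path)) {
            store.createEphemeral(path);
        }
    }
}
```

With this shape, a second queued Expire event finds the node already present 
and simply returns, and a later delete notification triggers the recreate.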

The basic assumption is that:

   "In the rare case where the node still exists from the previous session when the 
Expire message is processed, we can be confident that we will be notified later when 
the node is finally deleted."

In my testing I have seen:

 *   If I recreate the node while handling the Expire I do not later get a 
delete message (for the already deleted node).
 *   If I do nothing when I process the Expire (to partially simulate a slow 
ZooKeeper delete) then I do get a delete message for the old node (which was 
actually deleted before I processed the Expire message).

I would greatly appreciate your insights on this issue.  For more details you 
can see the Kafka issue.

James Lent
Senior Software Engineer

Digitalsmiths
A TiVo Company

www.digitalsmiths.com
[email protected] | office 919.460.4747

