I have run into the issue documented by KAFKA-1387 and have been trying to come
up with a solution. A summary of this issue is:
* Kafka registers brokers and consumers via ephemeral ZooKeeper nodes.
* When a connection fails and an Expire event is received, Kafka reconnects
and then attempts to recreate these ephemeral nodes.
* If the node still (or already) exists when Kafka attempts to recreate it,
Kafka currently (I am working with 0.8.1.1) assumes that ZooKeeper is just slow
to delete the node from the old session. It therefore goes into a delay loop,
waiting for ZooKeeper to remove the stale node so that it can create a new
ephemeral node associated with the new session.
* In my stress testing I have seen cases where the connection fails
multiple times in a short period. If one of these failures occurs
while the Expire event is being handled, Kafka can end up with a backlog of two
or more Expire events. When the first of these is finally processed, it
recreates the node against the latest session. However, when the next one is
processed, the Kafka broker or consumer goes into a never-ending delay loop,
waiting for the node, which it assumes is stale, to go away. The node will not
go away unless the connection fails again, but then the process just repeats
itself.
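
To make the failure mode concrete, here is a minimal in-memory sketch of the
sequence described above. It is not Kafka or ZkClient code: FakeZk,
handle_expire_with_retry and the max_retries cap are hypothetical names
invented for this illustration, and the path is only an example. The retry
loop is capped so the demonstration terminates; the behavior described above
has no such cap.

```python
class FakeZk:
    """Hypothetical in-memory stand-in for ZooKeeper: path -> owning session id."""

    def __init__(self):
        self.nodes = {}
        self.session = 1

    def expire_session(self):
        # Expiring a session removes its ephemeral nodes and starts a new session.
        old = self.session
        self.nodes = {p: s for p, s in self.nodes.items() if s != old}
        self.session += 1

    def create_ephemeral(self, path):
        # Returns False if the node already exists, True if it was created.
        if path in self.nodes:
            return False
        self.nodes[path] = self.session
        return True


def handle_expire_with_retry(zk, path, max_retries=5):
    """Retry-until-created handling, capped (an assumption) so the demo ends."""
    for attempt in range(1, max_retries + 1):
        if zk.create_ephemeral(path):
            return "created after %d attempt(s)" % attempt
        # Real code would sleep here, assuming the node is stale and about to vanish.
    return "stuck: node still exists after %d attempts" % max_retries


PATH = "/brokers/ids/0"  # illustrative path only

zk = FakeZk()
zk.create_ephemeral(PATH)   # initial registration under session 1
zk.expire_session()         # first connection failure
zk.expire_session()         # second failure before the first Expire is handled

# Two Expire events are now queued; both are handled against the latest session.
results = [handle_expire_with_retry(zk, PATH) for _ in range(2)]
for r in results:
    print(r)
```

The first handler creates the node under the latest session. The second handler
then finds a node that belongs to the live session, so waiting for it to
disappear can never succeed.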
I proposed a fix in the KAFKA-1387 Jira issue to generate some discussion of
potential fixes for this issue. One of the Kafka developers requested that I
vet the basic assumption of my fix with the ZooKeeper team. My solution is
basically:
* Register (via ZkClient) for notifications of both session and node events.
* When processing the Expire event:
  * If the node does not exist, recreate the node (current behavior).
  * If the node exists, do nothing (no looping).
* When processing a delete node event:
  * If the node does not exist, recreate the node (new behavior).
  * If the node exists, do nothing.
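
The handler logic above can be sketched as follows. This is an in-memory
illustration under stated assumptions, not the actual patch and not the
ZkClient API: FakeZk, Registration, on_new_session and on_node_deleted are
hypothetical names, and the "slow delete" is simulated by hand.

```python
class FakeZk:
    """Hypothetical in-memory stand-in for ZooKeeper: path -> owning session id."""

    def __init__(self):
        self.nodes = {}
        self.session = 1

    def exists(self, path):
        return path in self.nodes

    def create_ephemeral(self, path):
        self.nodes[path] = self.session

    def delete(self, path):
        self.nodes.pop(path, None)


class Registration:
    """Both callbacks apply the same rule: create only if the node is absent."""

    def __init__(self, zk, path):
        self.zk = zk
        self.path = path

    def _ensure_node(self):
        if self.zk.exists(self.path):
            return "left alone"          # no delay loop
        self.zk.create_ephemeral(self.path)
        return "created"

    def on_new_session(self):            # called when an Expire event is processed
        return self._ensure_node()

    def on_node_deleted(self):           # called when a delete node event arrives
        return self._ensure_node()


PATH = "/brokers/ids/0"  # illustrative path only

zk = FakeZk()
zk.create_ephemeral(PATH)   # registered under session 1
zk.session = 2              # session expired, old node not yet removed (slow delete)
reg = Registration(zk, PATH)

steps = []
steps.append(reg.on_new_session())    # old node still present: do nothing
zk.delete(PATH)                       # ZooKeeper finally removes the old node
steps.append(reg.on_node_deleted())   # node is gone: recreate under session 2
steps.append(reg.on_new_session())    # backlog Expire event: node exists, do nothing
print(steps)
```

Note that the check followed by the create is not atomic, so a real
implementation would still have to handle the create failing if the node
appears in between.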
The basic assumption is that:
"In the rare case where the node still exists from the previous session when the
Expire message is processed, we can be confident that we will be notified later,
when the node is finally deleted."
In my testing I have seen:
* If I recreate the node while handling the Expire I do not later get a
delete message (for the already deleted node).
* If I do nothing when I process the Expire (to partially simulate a slow
ZooKeeper delete) then I do get a delete message for the old node (which was
actually deleted before I processed the Expire message).
I would greatly appreciate your insights on this issue. For more details you
can see the Kafka issue.
James Lent
Senior Software Engineer
Digitalsmiths
A TiVo Company
www.digitalsmiths.com
[email protected] | office 919.460.4747