Gabriel Ibarra created KAFKA-4051:
-------------------------------------

             Summary: Strange behavior during rebalance when turning the OS 
clock back
                 Key: KAFKA-4051
                 URL: https://issues.apache.org/jira/browse/KAFKA-4051
             Project: Kafka
          Issue Type: Bug
          Components: consumer
    Affects Versions: 0.10.0.0
         Environment: OS: Ubuntu 14.04 - 64bits

            Reporter: Gabriel Ibarra


If a rebalance is performed after turning the OS clock back, the Kafka server 
enters a loop and the rebalance cannot complete until the system clock catches 
up with the previous date/time.

Steps to Reproduce:

- Start a consumer for TOPIC_NAME with group id GROUP_NAME. It becomes the 
owner of all the partitions.
- Turn the system (OS) clock back, for instance by 1 hour.
- Start a new consumer for TOPIC_NAME using the same group id. This forces a 
rebalance.

After these actions the Kafka server log constantly displays the messages 
below, and after a while both consumers stop receiving messages. This 
condition lasts at least as long as the interval by which the clock was turned 
back (1 hour in this example); after that, Kafka resumes normal operation.

[2016-08-08 11:30:23,023] INFO [GroupCoordinator 0]: Preparing to restabilize 
group GROUP_NAME with old generation 2 (kafka.coordinator.GroupCoordinator)
[2016-08-08 11:30:23,025] INFO [GroupCoordinator 0]: Stabilized group 
GROUP_NAME generation 3 (kafka.coordinator.GroupCoordinator)
[2016-08-08 11:30:23,027] INFO [GroupCoordinator 0]: Preparing to restabilize 
group GROUP_NAME with old generation 3 (kafka.coordinator.GroupCoordinator)
[2016-08-08 11:30:23,029] INFO [GroupCoordinator 0]: Group GROUP_NAME 
generation 3 is dead and removed (kafka.coordinator.GroupCoordinator)
[2016-08-08 11:30:23,032] INFO [GroupCoordinator 0]: Preparing to restabilize 
group GROUP_NAME with old generation 0 (kafka.coordinator.GroupCoordinator)
[2016-08-08 11:30:23,032] INFO [GroupCoordinator 0]: Stabilized group 
GROUP_NAME generation 1 (kafka.coordinator.GroupCoordinator)
[2016-08-08 11:30:23,033] INFO [GroupCoordinator 0]: Preparing to restabilize 
group GROUP_NAME with old generation 1 (kafka.coordinator.GroupCoordinator)
[2016-08-08 11:30:23,034] INFO [GroupCoordinator 0]: Group GROUP generation 1 
is dead and removed (kafka.coordinator.GroupCoordinator)
[2016-08-08 11:30:23,043] INFO [GroupCoordinator 0]: Preparing to restabilize 
group GROUP_NAME with old generation 0 (kafka.coordinator.GroupCoordinator)
[2016-08-08 11:30:23,044] INFO [GroupCoordinator 0]: Stabilized group 
GROUP_NAME generation 1 (kafka.coordinator.GroupCoordinator)
[2016-08-08 11:30:23,044] INFO [GroupCoordinator 0]: Preparing to restabilize 
group GROUP_NAME with old generation 1 (kafka.coordinator.GroupCoordinator)
[2016-08-08 11:30:23,045] INFO [GroupCoordinator 0]: Group GROUP_NAME 
generation 1 is dead and removed (kafka.coordinator.GroupCoordinator)

Because some systems have NTP enabled, or give an administrator the option to 
change the system clock (date/time), it is important to be able to change the 
clock safely. Currently the only safe way to do it is the following:

1-  Stop the Kafka server.
2-  Change the date/time.
3-  Start the Kafka server again.

However, this approach works only if the change is made by an administrator, 
not by NTP. Also, in many systems shutting down the Kafka server might cause 
INFORMATION TO BE LOST.
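
The observation that the condition lasts about as long as the clock was turned 
back would be consistent with a timeout that is computed from wall-clock time. 
This is an assumption, not something established in this report. The sketch 
below is not Kafka code: the Deadline class and the fake clocks are 
hypothetical, used only to show how the same 30-second timeout behaves when 
measured against a wall clock that is stepped back 1 hour versus a monotonic 
clock that is not.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

public class ClockStepSketch {

    /** Hypothetical timeout helper: the expiry is fixed at construction from the given clock. */
    static final class Deadline {
        private final LongSupplier clockMs;
        private final long expiresAtMs;

        Deadline(LongSupplier clockMs, long timeoutMs) {
            this.clockMs = clockMs;
            this.expiresAtMs = clockMs.getAsLong() + timeoutMs;
        }

        /** Milliseconds left before expiry according to the supplied clock (never negative). */
        long remainingMs() {
            return Math.max(0L, expiresAtMs - clockMs.getAsLong());
        }
    }

    /**
     * Simulates real time passing and then a backwards step of the wall clock only.
     * Returns {remaining ms on the wall clock, remaining ms on the monotonic clock}.
     */
    static long[] remainingAfterStepBack(long timeoutMs, long elapsedMs, long stepBackMs) {
        AtomicLong wall = new AtomicLong(1_470_000_000_000L); // arbitrary fake epoch millis
        AtomicLong mono = new AtomicLong(0L);                 // fake monotonic millis

        Deadline wallDeadline = new Deadline(wall::get, timeoutMs);
        Deadline monoDeadline = new Deadline(mono::get, timeoutMs);

        // Real time passes on both clocks.
        wall.addAndGet(elapsedMs);
        mono.addAndGet(elapsedMs);
        // Only the wall clock is stepped back (administrator or NTP).
        wall.addAndGet(-stepBackMs);

        return new long[] { wallDeadline.remainingMs(), monoDeadline.remainingMs() };
    }

    public static void main(String[] args) {
        // 30 s timeout, 10 s elapsed, wall clock turned back 1 hour.
        long[] r = remainingAfterStepBack(30_000L, 10_000L, 3_600_000L);
        System.out.println("wall-clock remaining ms: " + r[0]);
        System.out.println("monotonic remaining ms:  " + r[1]);
    }
}
```

In this simulation the wall-clock deadline is pushed out by the full hour, 
while the monotonic one is unaffected. In real code the two clocks would 
typically be System.currentTimeMillis() and System.nanoTime() / 1_000_000L; 
whether the broker's timers actually behave this way is not confirmed here.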




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
