I wonder if you have hit KAFKA-5600. Is it possible for you to try out 0.11.0.1?
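As a way to confirm that the active segment's time index really is empty (as the `ls` output below suggests for data1_log), the raw .timeindex bytes can be decoded directly; the bundled kafka.tools.DumpLogSegments class can dump the same files. Here is a minimal sketch, assuming the standard segment time-index layout (12-byte entries: big-endian 8-byte timestamp in ms plus a 4-byte offset relative to the segment base offset; the file is preallocated and zero-filled past the last written entry). `read_timeindex` is a hypothetical helper name, not a Kafka API:

```python
import struct

# One time-index entry: 8-byte timestamp (ms), 4-byte relative offset, big-endian.
ENTRY = struct.Struct(">qi")

def read_timeindex(data: bytes, base_offset: int):
    """Decode raw .timeindex bytes into (timestamp_ms, absolute_offset) pairs.

    Kafka preallocates the file and zero-fills the tail, so an all-zero
    entry marks the end of the written region.
    """
    entries = []
    usable = data[: len(data) - len(data) % ENTRY.size]
    for ts, rel in ENTRY.iter_unpack(usable):
        if ts == 0 and rel == 0:
            break  # reached the preallocated, unwritten tail
        entries.append((ts, base_offset + rel))
    return entries
```

A 0-byte (or all-zero) file such as `00000000000339816017.timeindex` decodes to no entries at all, meaning time-based lookups have nothing to search in that segment.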
Thanks

On Wed, Oct 25, 2017 at 1:15 PM, Dan Markhasin <minimi...@gmail.com> wrote:
> I am using 0.11.0.0.
>
> There is no difference configuration-wise - both have 10 partitions and 2
> replicas. There are no errors in the logs, but looking in the data folder
> it seems like Kafka is not updating the timeindex file for data1_log -
> notice how the timeindex file for the current log segment is not being
> updated.
>
> bash-4.2$ pwd
> /kafka/data/data1_log-1
> bash-4.2$ ls -ltr | tail
> -rw-rw-r-- 1 ibiuser it 1073731573 Oct 25 01:21 00000000000337554984.log
> -rw-rw-r-- 1 ibiuser it     943616 Oct 25 01:21 00000000000337554984.index
> -rw-rw-r-- 1 ibiuser it 1073734199 Oct 25 13:38 00000000000339816017.log
> -rw-rw-r-- 1 ibiuser it   10485756 Oct 25 13:38 00000000000341934289.timeindex
> -rw-rw-r-- 1 ibiuser it         10 Oct 25 13:38 00000000000341934289.snapshot
> -rw-rw-r-- 1 ibiuser it          0 Oct 25 13:38 00000000000339816017.timeindex
> -rw-rw-r-- 1 ibiuser it     566712 Oct 25 13:38 00000000000339816017.index
> -rw-rw-r-- 1 ibiuser it         17 Oct 25 20:23 leader-epoch-checkpoint
> -rw-rw-r-- 1 ibiuser it   10485760 Oct 25 23:03 00000000000341934289.index
> -rw-rw-r-- 1 ibiuser it  461590419 Oct 25 23:04 00000000000341934289.log
>
> For comparison, the beats topic:
>
> bash-4.2$ cd ../beats-1
> bash-4.2$ ls -ltr
> total 3212088
> -rw-rw-r-- 1 ibiuser it         17 Oct 25 00:23 leader-epoch-checkpoint
> -rw-rw-r-- 1 ibiuser it         10 Oct 25 20:04 00000000000188672034.snapshot
> -rw-rw-r-- 1 ibiuser it    2773008 Oct 25 20:04 00000000000185224087.timeindex
> -rw-rw-r-- 1 ibiuser it 1073741779 Oct 25 20:04 00000000000185224087.log
> -rw-rw-r-- 1 ibiuser it    1967440 Oct 25 20:04 00000000000185224087.index
> -rw-rw-r-- 1 ibiuser it   10485760 Oct 25 23:03 00000000000188672034.index
> -rw-rw-r-- 1 ibiuser it   10485756 Oct 25 23:04 00000000000188672034.timeindex
> -rw-rw-r-- 1 ibiuser it   50166645 Oct 25 23:04 00000000000188672034.log
>
> To give some context to why I'm even trying to reset the offsets, we had
> encountered a strange situation earlier today:
>
> 1) One of the brokers had a hardware failure, and had to be rebuilt from
> scratch (the data partition was gone)
> 2) When it went down, we noticed a spike in lag in one particular consumer
> group - it seems to have reset its offset to an earlier point in time (but
> not the earliest offset of the topic); I have read other messages on this
> mailing list from users who experienced the same behavior with 0.11.0.0
> 3) The broker was reinstalled and rejoined the cluster with the same
> broker.id (but with no data on it) - it rebalanced and eventually all
> replicas became synced and the cluster was functioning normally.
> 4) I then decided to bounce the same broker again to see if I could
> reproduce the issue I saw in #2 - and as soon as the broker was restarted,
> the exact same consumer group had its offset reset again and was lagging
> millions of records behind the current offset.
> 5) I then tried to manually reset the consumer group's offset to a few
> minutes before I restarted the broker, only to discover this strange
> behavior where no matter which datetime value I provided, it kept resetting
> to the latest offset.
>
> On 25 October 2017 at 22:48, Ted Yu <yuzhih...@gmail.com> wrote:
> > Do you mind providing a bit more information?
> >
> > Release of Kafka you use
> >
> > Any difference between data1_log and the other, normal topic?
> >
> > Probably check the broker log where data1_log is hosted - see if there
> > is some clue.
> >
> > Thanks
> >
> > On Wed, Oct 25, 2017 at 12:11 PM, Dan Markhasin <minimi...@gmail.com>
> > wrote:
> > > I'm trying to use the kafka-consumer-groups.sh tool in order to rewind
> > > a consumer group's offset, however it seems to be returning the latest
> > > offset regardless of the requested offset.
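The fallback behavior described above can be illustrated with a toy version of a time-based offset lookup. This is a deliberate simplification for illustration, not Kafka's actual broker code: return the offset of the first index entry whose timestamp is at or after the target, and fall back to the log-end offset when no entry qualifies - which is exactly what an empty time index forces, and matches the "always returns the latest offset" symptom:

```python
import bisect

def offset_for_time(index_entries, target_ts, log_end_offset):
    """Toy time->offset lookup.

    index_entries: list of (timestamp_ms, offset) pairs sorted by timestamp,
    as they would appear in a segment's .timeindex.
    Returns the offset of the first entry with timestamp >= target_ts,
    or log_end_offset when no entry qualifies (e.g. the index is empty).
    """
    timestamps = [ts for ts, _ in index_entries]
    i = bisect.bisect_left(timestamps, target_ts)
    if i == len(index_entries):
        return log_end_offset  # nothing indexed at/after target: latest offset
    return index_entries[i][1]
```

With a populated index the lookup lands on a historical offset; with an empty one it can only return the log end, so the "reset" moves forward as new messages arrive - consistent with the two increasing NEW-OFFSET listings below.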
> > > You can see in the example below that two consecutive commands to
> > > reset the offset to a specific point in time return different
> > > (increasing) offsets, which are actually the latest offsets for the
> > > topic.
> > >
> > > - The consumer group ("test_consumer") is a console consumer that was
> > >   started with --from-beginning and terminated after a few seconds,
> > >   just long enough for it to commit its offsets.
> > > - The topic data1_log is very busy, with thousands of incoming
> > >   messages per second
> > > - The datetime value provided is approx. 5 hours earlier than the
> > >   current UTC time
> > >
> > > [admin@broker01] ~> /kafka/latest/bin/kafka-consumer-groups.sh
> > > --bootstrap-server localhost:9092 --reset-offsets --group test_consumer
> > > --topic data1_log --to-datetime '2017-10-25T13:40:00.000'
> > > Note: This will only show information about consumers that use the
> > > Java consumer API (non-ZooKeeper-based consumers).
> > >
> > > TOPIC      PARTITION  NEW-OFFSET
> > > data1_log  8          301485420
> > > data1_log  1          342788637
> > > data1_log  7          287621428
> > > data1_log  3          268612266
> > > data1_log  0          201860717
> > > data1_log  9          202749553
> > > data1_log  4          188974032
> > > data1_log  6          234308481
> > > data1_log  2          263507741
> > > data1_log  5          232707238
> > >
> > > [admin@broker01] ~> /kafka/latest/bin/kafka-consumer-groups.sh
> > > --bootstrap-server localhost:9092 --reset-offsets --group test_consumer
> > > --topic data1_log --to-datetime '2017-10-25T13:40:00.000'
> > > Note: This will only show information about consumers that use the
> > > Java consumer API (non-ZooKeeper-based consumers).
> > > TOPIC      PARTITION  NEW-OFFSET
> > > data1_log  8          301485491
> > > data1_log  1          342788779
> > > data1_log  7          287621534
> > > data1_log  3          268612364
> > > data1_log  0          201860796
> > > data1_log  9          202749620
> > > data1_log  4          188974068
> > > data1_log  6          234308564
> > > data1_log  2          263507823
> > > data1_log  5          232707293
> > >
> > > This issue seems to be topic-specific - there is a different topic
> > > (also very active) where the same command consistently returns the
> > > correct offsets for the requested datetime.
> > >
> > > What could be the issue here?
> > >
> > > Thanks,
> > > Dan
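One extra sanity check when debugging --to-datetime resets is to compute which epoch-millisecond timestamp the datetime string actually maps to, since that is what gets compared against the record timestamps in the index. A hedged sketch - the UTC interpretation is an explicit assumption here; how the CLI treats a zone-less string can depend on the tool version and JVM defaults, so compare against your record timestamps before trusting it:

```python
from datetime import datetime, timezone

def to_epoch_millis(dt_string: str, tz=timezone.utc) -> int:
    """Convert a '--to-datetime'-style string (e.g. '2017-10-25T13:40:00.000')
    to epoch milliseconds, under an assumed timezone (UTC by default)."""
    dt = datetime.strptime(dt_string, "%Y-%m-%dT%H:%M:%S.%f")
    return int(dt.replace(tzinfo=tz).timestamp() * 1000)
```

If the computed millis fall after the newest timestamp actually present in a partition's time index, a latest-offset result is expected even on a healthy topic; comparing this value against the decoded .timeindex entries separates "bad timestamp" from "broken index".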