I solved the problem by deleting the ZooKeeper and Kafka data directories and setting offsets.topic.replication.factor=3 in Kafka's server.properties file.

After that, the __consumer_offsets topic is replicated and everything works as expected.
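
For anyone who cannot wipe the data directories: as far as I know, offsets.topic.replication.factor is only applied when __consumer_offsets is first created, which would explain why changing it on a running cluster had no effect. A possible alternative (a sketch only, not what was done here) is to raise the replication of the existing topic with kafka-reassign-partitions.sh. The helper below, whose name is made up for illustration, generates the reassignment JSON. It assumes the default of 50 partitions for __consumer_offsets and the broker ids 1, 2 and 3 seen in the describe output.

```shell
# Hypothetical helper: print a reassignment plan that places every partition
# of a topic on all listed brokers, rotating the broker list so that the
# preferred leader changes from partition to partition.
# Usage: make_reassignment TOPIC PARTITION_COUNT BROKER_ID...
make_reassignment() {
  topic=$1
  partitions=$2
  shift 2   # the remaining arguments are the broker ids
  printf '{"version":1,"partitions":['
  p=0
  while [ "$p" -lt "$partitions" ]; do
    replicas=$(echo "$*" | tr ' ' ',')
    [ "$p" -gt 0 ] && printf ','
    printf '{"topic":"%s","partition":%d,"replicas":[%s]}' "$topic" "$p" "$replicas"
    # Rotate: move the first broker id to the end of the list.
    first=$1
    shift
    set -- "$@" "$first"
    p=$((p + 1))
  done
  printf ']}\n'
}

# Assumed usage against the cluster from this thread (file name is arbitrary):
#   make_reassignment __consumer_offsets 50 1 2 3 > offsets-reassign.json
#   bin/kafka-reassign-partitions.sh --zookeeper 192.168.112.33:2181 \
#     --reassignment-json-file offsets-reassign.json --execute
```

The partition count should be checked against the actual topic before running anything like this on a cluster that holds data you care about.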


I hope this helps someone.


Regards.


On 01/30/2018 03:02 PM, Zoran wrote:
Sorry, I attached the wrong server.properties file. The right one is attached now.

Regards.


On 01/30/2018 02:59 PM, Zoran wrote:
Hi,

I have three servers:

blade1 (192.168.112.31),
blade2 (192.168.112.32) and
blade3 (192.168.112.33).

kafka_2.11-1.0.0 is installed on each of the servers.
ZooKeeper is also installed on blade3 (192.168.112.33:2181).

I created the topic repl3part5 with the following command:

bin/kafka-topics.sh --zookeeper 192.168.112.33:2181 --create --replication-factor 3 --partitions 5 --topic repl3part5

When I describe the topic, it looks like this:

[root@blade1 kafka]# bin/kafka-topics.sh --describe --topic repl3part5 --zookeeper 192.168.112.33:2181

Topic:repl3part5    PartitionCount:5    ReplicationFactor:3 Configs:
    Topic: repl3part5    Partition: 0    Leader: 2    Replicas: 2,3,1    Isr: 2,3,1
    Topic: repl3part5    Partition: 1    Leader: 3    Replicas: 3,1,2    Isr: 3,1,2
    Topic: repl3part5    Partition: 2    Leader: 1    Replicas: 1,2,3    Isr: 1,2,3
    Topic: repl3part5    Partition: 3    Leader: 2    Replicas: 2,1,3    Isr: 2,1,3
    Topic: repl3part5    Partition: 4    Leader: 3    Replicas: 3,2,1    Isr: 3,2,1

I have a producer for this topic:

bin/kafka-console-producer.sh --broker-list 192.168.112.31:9092,192.168.112.32:9092,192.168.112.33:9092 --topic repl3part5

and a single consumer:

bin/kafka-console-consumer.sh --bootstrap-server 192.168.112.31:9092,192.168.112.32:9092,192.168.112.33:9092 --topic repl3part5  --consumer-property group.id=zoran_1

Every message sent by the producer is received by the consumer. So far, so good.

Now I would like to test failover of the Kafka brokers. If I stop the Kafka service on blade3, I get consumer warnings, but all produced messages are still consumed.

[2018-01-30 14:30:01,203] WARN [Consumer clientId=consumer-1, groupId=zoran_1] Connection to node 3 could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
[2018-01-30 14:30:01,299] WARN [Consumer clientId=consumer-1, groupId=zoran_1] Connection to node 3 could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
[2018-01-30 14:30:01,475] WARN [Consumer clientId=consumer-1, groupId=zoran_1] Connection to node 3 could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)

Next I started the Kafka service on blade3 again and stopped the Kafka service on blade2. The consumer showed one warning, but all produced messages were still consumed.

[2018-01-30 14:31:38,164] WARN [Consumer clientId=consumer-1, groupId=zoran_1] Connection to node 2 could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)

Then I started the Kafka service on blade2 again and stopped the Kafka service on blade1.

The consumer now shows warnings about nodes 1 and 2147483646, and also: Asynchronous auto-commit of offsets ... failed: Offset commit failed with a retriable exception. You should retry committing offsets. The underlying error was: null.

[2018-01-30 14:33:16,393] WARN [Consumer clientId=consumer-1, groupId=zoran_1] Connection to node 1 could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
[2018-01-30 14:33:16,469] WARN [Consumer clientId=consumer-1, groupId=zoran_1] Connection to node 2147483646 could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
[2018-01-30 14:33:16,557] WARN [Consumer clientId=consumer-1, groupId=zoran_1] Connection to node 1 could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
[2018-01-30 14:33:16,986] WARN [Consumer clientId=consumer-1, groupId=zoran_1] Connection to node 2147483646 could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
[2018-01-30 14:33:16,991] WARN [Consumer clientId=consumer-1, groupId=zoran_1] Connection to node 1 could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
[2018-01-30 14:33:17,493] WARN [Consumer clientId=consumer-1, groupId=zoran_1] Connection to node 2147483646 could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
[2018-01-30 14:33:17,495] WARN [Consumer clientId=consumer-1, groupId=zoran_1] Connection to node 1 could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
[2018-01-30 14:33:18,002] WARN [Consumer clientId=consumer-1, groupId=zoran_1] Connection to node 2147483646 could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
[2018-01-30 14:33:18,003] WARN [Consumer clientId=consumer-1, groupId=zoran_1] Asynchronous auto-commit of offsets {repl3part5-4=OffsetAndMetadata{offset=18, metadata=''}, repl3part5-3=OffsetAndMetadata{offset=20, metadata=''}, repl3part5-2=OffsetAndMetadata{offset=19, metadata=''}, repl3part5-1=OffsetAndMetadata{offset=20, metadata=''}, repl3part5-0=OffsetAndMetadata{offset=20, metadata=''}} failed: Offset commit failed with a retriable exception. You should retry committing offsets. The underlying error was: null (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
[2018-01-30 14:33:18,611] WARN [Consumer clientId=consumer-1, groupId=zoran_1] Connection to node 1 could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
[2018-01-30 14:33:18,932] WARN [Consumer clientId=consumer-1, groupId=zoran_1] Connection to node 2147483646 could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
[2018-01-30 14:33:18,933] WARN [Consumer clientId=consumer-1, groupId=zoran_1] Asynchronous auto-commit of offsets {repl3part5-4=OffsetAndMetadata{offset=18, metadata=''}, repl3part5-3=OffsetAndMetadata{offset=20, metadata=''}, repl3part5-2=OffsetAndMetadata{offset=19, metadata=''}, repl3part5-1=OffsetAndMetadata{offset=20, metadata=''}, repl3part5-0=OffsetAndMetadata{offset=20, metadata=''}} failed: Offset commit failed with a retriable exception. You should retry committing offsets. The underlying error was: null (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
[2018-01-30 14:33:19,977] WARN [Consumer clientId=consumer-1, groupId=zoran_1] Connection to node 2147483646 could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
[2018-01-30 14:33:19,978] WARN [Consumer clientId=consumer-1, groupId=zoran_1] Asynchronous auto-commit of offsets {repl3part5-4=OffsetAndMetadata{offset=18, metadata=''}, repl3part5-3=OffsetAndMetadata{offset=20, metadata=''}, repl3part5-2=OffsetAndMetadata{offset=19, metadata=''}, repl3part5-1=OffsetAndMetadata{offset=20, metadata=''}, repl3part5-0=OffsetAndMetadata{offset=20, metadata=''}} failed: Offset commit failed with a retriable exception. You should retry committing offsets. The underlying error was: null (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
[2018-01-30 14:33:19,979] WARN [Consumer clientId=consumer-1, groupId=zoran_1] Connection to node 1 could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
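
A note on node 2147483646 in these warnings: as far as I know, the Java consumer keeps a separate connection to its group coordinator and labels it with Integer.MAX_VALUE minus the coordinator's broker id. Under that assumption the id points at broker 1 as the coordinator for group zoran_1. A small sketch of the arithmetic (the function name is made up for illustration):

```shell
# Hypothetical helper: map the synthetic coordinator node id from the
# consumer log back to a broker id, assuming the consumer derives it as
# Integer.MAX_VALUE (2147483647) minus the broker id.
coordinator_broker_id() {
  echo $(( 2147483647 - $1 ))
}

coordinator_broker_id 2147483646   # prints 1
```

That would fit the observation that offset commits fail only while the broker on blade1 is down: if the __consumer_offsets partition for this group has its only replica on broker 1, no other broker can take over as coordinator.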

I tried to solve the problem by adding offsets.topic.replication.factor=2 (or 3) to all three server.properties files (one of them is attached), but with no success. My idea was that the __consumer_offsets topic wasn't replicated throughout the cluster, but it looks like that is not the case here.

While the Kafka service on blade1 was down, describing the topic showed the following:

[root@blade1 kafka]# bin/kafka-topics.sh --describe --topic repl3part5 --zookeeper 192.168.112.33:2181

Topic:repl3part5    PartitionCount:5    ReplicationFactor:3 Configs:
    Topic: repl3part5    Partition: 0    Leader: 3    Replicas: 2,3,1    Isr: 3
    Topic: repl3part5    Partition: 1    Leader: 3    Replicas: 3,1,2    Isr: 3
    Topic: repl3part5    Partition: 2    Leader: 3    Replicas: 1,2,3    Isr: 3
    Topic: repl3part5    Partition: 3    Leader: 3    Replicas: 2,1,3    Isr: 3
    Topic: repl3part5    Partition: 4    Leader: 3    Replicas: 3,2,1    Isr: 3
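
The describe output can also be checked mechanically. The following is a minimal sketch (the function name under_replicated is made up for illustration) that reads kafka-topics.sh --describe output on stdin and prints the partitions whose Isr list is shorter than their Replicas list:

```shell
# Hypothetical helper: print the partition numbers whose in-sync replica
# list (Isr) is shorter than the assigned replica list (Replicas).
under_replicated() {
  awk '{
    p = ""; r = ""; s = ""
    for (i = 1; i < NF; i++) {
      if ($i == "Partition:") p = $(i + 1)
      if ($i == "Replicas:")  r = $(i + 1)
      if ($i == "Isr:")       s = $(i + 1)
    }
    # The summary line has no "Partition:" field, so p stays empty there.
    if (p != "" && split(s, isr, ",") < split(r, rep, ",")) print p
  }'
}

# Example with two lines taken from this thread: partition 0 while all
# brokers were up, and partition 2 while the broker on blade1 was down.
printf '%s\n' \
  'Topic: repl3part5 Partition: 0 Leader: 2 Replicas: 2,3,1 Isr: 2,3,1' \
  'Topic: repl3part5 Partition: 2 Leader: 3 Replicas: 1,2,3 Isr: 3' \
  | under_replicated   # prints 2
```

Against a live cluster the same function could be fed from bin/kafka-topics.sh --describe --topic __consumer_offsets --zookeeper 192.168.112.33:2181, which is the topic worth checking here. kafka-topics.sh also has an --under-replicated-partitions option for --describe that should give a similar view.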

The producer now shows the following warning. It still writes messages to the topic, but they are not consumed and only increase the lag on the partitions:

[2018-01-30 14:37:21,816] WARN [Producer clientId=console-producer] Connection to node 1 could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)

I noticed that while the Kafka service on blade1 is alive, I can stop and start blade2 and blade3 in any combination and the consumer is always able to consume messages. If the Kafka service on blade1 is down, then even if the Kafka services on blade2 and blade3 are up and running, the consumer cannot consume messages.

After bringing the Kafka service on blade1 back up, all messages that the producer sent while it was down are replayed, and then the following is shown in the consumer terminal:

[2018-01-30 14:44:30,817] ERROR [Consumer clientId=consumer-1, groupId=zoran_1] Offset commit failed on partition repl3part5-4 at offset 20: This is not the correct coordinator. (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
[2018-01-30 14:44:30,817] WARN [Consumer clientId=consumer-1, groupId=zoran_1] Asynchronous auto-commit of offsets {repl3part5-4=OffsetAndMetadata{offset=20, metadata=''}, repl3part5-3=OffsetAndMetadata{offset=22, metadata=''}, repl3part5-2=OffsetAndMetadata{offset=20, metadata=''}, repl3part5-1=OffsetAndMetadata{offset=22, metadata=''}, repl3part5-0=OffsetAndMetadata{offset=22, metadata=''}} failed: Offset commit failed with a retriable exception. You should retry committing offsets. The underlying error was: This is not the correct coordinator. (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
[2018-01-30 14:44:31,202] ERROR [Consumer clientId=consumer-1, groupId=zoran_1] Offset commit failed on partition repl3part5-4 at offset 22: This is not the correct coordinator. (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
[2018-01-30 14:44:31,202] WARN [Consumer clientId=consumer-1, groupId=zoran_1] Asynchronous auto-commit of offsets {repl3part5-4=OffsetAndMetadata{offset=22, metadata=''}, repl3part5-3=OffsetAndMetadata{offset=24, metadata=''}, repl3part5-2=OffsetAndMetadata{offset=22, metadata=''}, repl3part5-1=OffsetAndMetadata{offset=24, metadata=''}, repl3part5-0=OffsetAndMetadata{offset=24, metadata=''}} failed: Offset commit failed with a retriable exception. You should retry committing offsets. The underlying error was: This is not the correct coordinator. (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)

From now on everything works with no problems or warnings and the system is fully functional.

Can someone explain to me why the Kafka server on blade1 is so important, and what my options are for being able to stop any two of the servers (including the Kafka server on blade1) and still consume messages with no delay?
This thing drives me crazy. :)

Can you please help?

Regards.

