Hello,

I'm trying to run a single Kafka broker with a few topics: 1 broker,
1 partition per topic, 1 replica. I've been using the spotify/kafka
Docker Hub image, which apparently just downloads a Kafka release
(0.8.2.1 in my case) and starts it with the default config plus
advertised host settings added.

When I start Kafka like this it works fine for a number of days.
Occasionally, and seemingly at random, it enters a state where my
clients receive LeaderNotAvailable exceptions for all topics.

Once the Kafka server enters this state, I haven't found any way to get
it back to a healthy state. If I restart the server, it immediately
works fine again, for a few days. This is identical whether running on
my development laptop or on Amazon's ECS service. I have a feeling that
it happens often on my laptop when I put it to sleep (so VirtualBox and
the Docker inside it might be affected somehow), but over the past few
weeks no such failure has happened, despite daily usage and laptop
sleeping.

I googled a bit, and it seems to happen when Kafka can't reach itself
through the address specified in the advertised host setting. I've
verified that the host is available (i.e. I can connect to it using
those settings), and all DNS/networking/etc. seems to work fine. For
example, I can docker exec into the container and use telnet to reach
ZooKeeper's port 2181 or Kafka's port 9092, using the addresses from
the server.properties file.
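
For reference, the telnet check amounts to roughly the following Python
sketch. The host names and the can_connect helper are placeholders for
illustration, not my real settings; the actual addresses would come
from server.properties. Note that it only shows the ports accept TCP
connections, not that the broker behind them is healthy.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS resolution failures.
        return False


if __name__ == "__main__":
    # Placeholder endpoints (assumption): substitute the ZooKeeper and
    # advertised Kafka addresses from server.properties.
    endpoints = [("localhost", 2181), ("localhost", 9092)]
    for host, port in endpoints:
        status = "reachable" if can_connect(host, port) else "NOT reachable"
        print(f"{host}:{port} {status}")
```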

I also tried to run kafka-preferred-replica-election, which succeeds on
the first try and says that the election process has started for all
topics. But that process apparently continues indefinitely, so
subsequent executions of that command abort because an election is
already in progress.

I've checked all the logs from Kafka and Zookeeper, and there is
nothing alarming there, either.

Any idea where I could dig next? How can I troubleshoot it when it
happens again? What should I check or execute?

PS. While I consider myself relatively strong in the devops area, my
experience with Kafka is minimal, so please comment even on the most
novice details, as I'm likely to miss them.

-- 
Best regards from
Kamil Burzynski
