Hi team,

We upgraded from version 2.7 to 2.8. After monitoring for a few weeks, we
upgraded our production setup (we went ahead because we had not enabled
KRaft). A few weeks later, our clients in production started failing with
TimeoutException. We listed all active brokers using the admin client API,
and all brokers were listed properly. We then logged into the problematic
broker and tried to describe a topic with localhost as the bootstrap-server,
but that timed out as well.

When checking the logs, we noticed a shutdown message from the
kafka-shutdown-hook thread (the ZooKeeper session had timed out and there
were three retry failures). The controlled shutdown failed (the broker got
an unknown server error response from the controller), and the broker
proceeded to an unclean shutdown. However, the process did not exit, and it
did not process any other operations either.
The broker also stayed in alive status for hours (we could still see it in
the list of brokers), so our clients kept trying to contact it and failed
with timeout exceptions. We tried restarting the problematic broker, but
after the restart our clients hit unknown topic or partition errors. We
noticed that the metadata had not been loaded, so we had to restart our
controller. After restarting the controller, everything went back to
normal.

So how is metadata loading handled? Are there any alternative ways for us
to automate monitoring for metadata updates?
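
To make the second question concrete, here is a minimal sketch of the kind
of per-broker check we are thinking of automating. It repeats the manual
describe-topic test against a single broker, with a hard timeout, since
listing brokers alone did not reveal the problem. The helper names, the
15-second timeout, and the assumption that kafka-topics.sh is on the PATH
are only for illustration, and a non-zero exit code may not cover every
failure mode.

```python
import subprocess


def build_describe_command(bootstrap_server, topic, script="kafka-topics.sh"):
    """Build the describe-topic command we ran by hand on the broker.

    The function name and the default script location are assumptions;
    --bootstrap-server, --describe and --topic are kafka-topics.sh options.
    """
    return [script, "--bootstrap-server", bootstrap_server,
            "--describe", "--topic", topic]


def probe_broker(bootstrap_server, topic, timeout_s=15.0,
                 script="kafka-topics.sh", runner=subprocess.run):
    """Return 'ok', 'error', 'timeout' or 'tool-missing' for one broker.

    'timeout' means the command did not finish within timeout_s seconds,
    which is the symptom we saw on the stuck broker.
    """
    cmd = build_describe_command(bootstrap_server, topic, script)
    try:
        result = runner(cmd, capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return "timeout"
    except FileNotFoundError:
        # The Kafka CLI scripts are not installed where this check runs.
        return "tool-missing"
    return "ok" if result.returncode == 0 else "error"
```

Run from a scheduler against each broker's own address, for example
probe_broker("localhost:9092", "some-topic"), a result other than "ok"
could raise an alert even while the broker still appears in the broker
list.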


Thanks in advance,
Suriya V
