Thank you, Ewen. I am actually observing this behavior: the logs show
continuous connect failures to the dead broker.
The important thing here is that I'm starting a brand-new instance of the
Producer after a broker is down (so no prior metadata), with that down
broker also as part of the
You're just seeing that exception in the debugger, not the log, right?
ConnectException is an IOException, so it should be caught by this block
https://github.com/apache/kafka/blob/0.8.2.1/clients/src/main/java/org/apache/kafka/common/network/Selector.java#L271
, logged, and then the SelectionKey
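
The claim that a ConnectException is caught by a handler written for IOException can be checked with a small standalone sketch. This is not the Selector code linked above; the class and method names here (ConnectExceptionDemo, connectToDeadBroker, handledAsIOException) are hypothetical and only simulate a refused connection.

```java
import java.io.IOException;
import java.net.ConnectException;

public class ConnectExceptionDemo {
    // Hypothetical stand-in for a connect attempt to a broker whose host
    // rejects the connection. No real network call is made.
    static void connectToDeadBroker() throws IOException {
        throw new ConnectException("Connection refused");
    }

    // Returns true if the failure was handled by the generic IOException handler.
    static boolean handledAsIOException() {
        try {
            connectToDeadBroker();
            return false;
        } catch (IOException e) {
            // ConnectException extends SocketException, which extends
            // IOException, so it is caught here without a dedicated handler.
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("Handled as IOException: " + handledAsIOException());
    }
}
```

In a debugger configured to break on all thrown exceptions, an exception like this would still be visible even though the catch block handles it, which may be why it shows up there.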
Yes, that is exactly the issue. I did not notice that close(key) is also called
from the poll() method. I was observing this even when running my app
(not in debug). I noticed it was taking 1 sec (with a conditional debug) and,
as you mentioned, the default for reconnect.backoff.ms is 10 ms and
Are you seeing this in practice or is this just a concern about the way the
code currently works? If the broker is actually down and the host is
rejecting connections, the situation you describe shouldn't be a problem.
It's true that the NetworkClient chooses a fixed nodeIndexOffset, but the
If one of the brokers we specify in the bootstrap servers list is down,
there is a chance that the Producer (a brand new instance with no prior
metadata) will never be able to publish anything to Kafka until that broker
is up. Because the logic for getting the initial metadata is based on some