I think this is an issue caused by KAFKA-1788. I was trying to test producer resiliency to broker outage. In this experiment, I shutdown all brokers and see how producer behavior.
Here are the observations 1) kafka producer can recover from kafka outage. i.e. send resumed after brokers came back 2) producer instance saw big cpu jump during outage. 28% -> 52% in one test. Note that I didn't observe cpu issue when new producer instance started with brokers outage. In this case, there are no messages accumulated in the buffer, because KafkaProducer constructor failed with DNS lookup for route53 name. when brokers came up, my wrapper re-created KafkaProducer object and recover from outage with sending messages. Here is the cpu graph for a running producer instance where broker outage happened in the middle of test run. it shows cpu problem. https://docs.google.com/drawings/d/1FdEg9-Rf_jbDZX0cC3iZ834c4m-5rqgK-41lSS6VudQ/edit?usp=sharing Here is the cpu graph for a new producer instance where broker outage happened before instance startup. cpu is good here. https://docs.google.com/drawings/d/1NmOdwp79DKHE7kJeskBm411ln6QczAMfmcWeijvZQRQ/edit?usp=sharing Note that producer is a 4-core m1.xlarge instance. x-axis is time, y-axis is cpu util. Thanks, Steven