I think this is an issue caused by KAFKA-1788.

I was trying to test producer resiliency to a broker outage. In this
experiment, I shut down all brokers and observed how the producer behaved.

Here are the observations:
1) The Kafka producer can recover from a Kafka outage, i.e. sends resumed
after the brokers came back.
2) The producer instance saw a big CPU jump during the outage: 28% -> 52%
in one test.

Note that I didn't observe the CPU issue when a new producer instance was
started during the broker outage. In this case, no messages accumulated in
the buffer, because the KafkaProducer constructor failed on the DNS lookup
for the route53 name. When the brokers came up, my wrapper re-created the
KafkaProducer object and recovered from the outage by resuming sends.
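
For readers who want to reproduce the second scenario, here is a minimal
sketch of what a re-creating wrapper of this kind might look like. It is not
the actual wrapper used in the test. The class name RecreatingClient and its
get() method are hypothetical, and the sketch assumes the constructor failure
surfaces as an unchecked exception. In a real setup the factory would be
something like () -> new KafkaProducer<>(props).

```java
import java.util.Optional;
import java.util.function.Supplier;

/**
 * Hypothetical wrapper (illustration only): creates the client lazily and
 * retries creation on a later call if the previous attempt threw.
 */
final class RecreatingClient<T> {
    private final Supplier<T> factory; // assumed: e.g. () -> new KafkaProducer<>(props)
    private T client;                  // stays null until creation succeeds

    RecreatingClient(Supplier<T> factory) {
        this.factory = factory;
    }

    /** Returns the client if it exists or can be created now, else empty. */
    synchronized Optional<T> get() {
        if (client == null) {
            try {
                client = factory.get();
            } catch (RuntimeException e) {
                // Creation failed (for example, the bootstrap host did not
                // resolve). Nothing is buffered; the next call tries again.
                return Optional.empty();
            }
        }
        return Optional.ofNullable(client);
    }
}
```

With a wrapper like this, callers get an empty result while the brokers are
unreachable, so no producer instance exists and no messages pile up in a
producer buffer during the outage.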

Here is the CPU graph for a running producer instance where the broker
outage happened in the middle of the test run. It shows the CPU problem.
https://docs.google.com/drawings/d/1FdEg9-Rf_jbDZX0cC3iZ834c4m-5rqgK-41lSS6VudQ/edit?usp=sharing

Here is the CPU graph for a new producer instance where the broker outage
happened before instance startup. CPU usage looks normal here.
https://docs.google.com/drawings/d/1NmOdwp79DKHE7kJeskBm411ln6QczAMfmcWeijvZQRQ/edit?usp=sharing

Note that the producer runs on a 4-core m1.xlarge instance. The x-axis is
time and the y-axis is CPU utilization.

Thanks,
Steven
