After some more investigation, I've been able to get the expected behavior by removing the null check here: https://github.com/apache/kafka/blob/ae5a5d7c08bb634576a414f6f2864c5b8a7e58a3/clients/src/main/java/org/apache/kafka/clients/producer/internals/RecordAccumulator.java#L220
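If I'm reading it right, that check only allows batch expiration when the partition's leader is unknown, so batches sitting behind stale metadata (which still names a dead broker as leader) never get a chance to expire. Roughly, as a paraphrased sketch with illustrative names -- not the actual RecordAccumulator source:

    import java.util.ArrayDeque;
    import java.util.Deque;
    import java.util.HashMap;
    import java.util.Iterator;
    import java.util.Map;

    class ExpirationSketch {
        static class Batch {
            final long createdMs;
            Batch(long createdMs) { this.createdMs = createdMs; }
            boolean expired(long nowMs, long timeoutMs) {
                return nowMs - createdMs > timeoutMs;
            }
        }

        // Aborts expired batches; 'leaders' stands in for cluster metadata,
        // where a null value means "leader unknown".
        static int abortExpiredBatches(Map<String, Deque<Batch>> batches,
                                       Map<String, String> leaders,
                                       long timeoutMs, long nowMs) {
            int aborted = 0;
            for (Map.Entry<String, Deque<Batch>> entry : batches.entrySet()) {
                // The null check in question: expiration only runs when no
                // leader is known. Stale metadata that still names a dead
                // broker keeps this branch from ever executing, so batches
                // sit in the accumulator forever.
                if (leaders.get(entry.getKey()) != null)
                    continue;
                for (Iterator<Batch> it = entry.getValue().iterator(); it.hasNext(); ) {
                    if (it.next().expired(nowMs, timeoutMs)) {
                        it.remove();
                        aborted++;
                    }
                }
            }
            return aborted;
        }

        public static void main(String[] args) {
            Map<String, Deque<Batch>> batches = new HashMap<>();
            Deque<Batch> dq = new ArrayDeque<>();
            dq.add(new Batch(0L));
            batches.put("testing-0", dq);

            Map<String, String> leaders = new HashMap<>();
            leaders.put("testing-0", "broker-1"); // stale: broker-1 is dead

            // The batch is well past the 1000ms timeout, but nothing is
            // aborted because the (stale) leader is non-null.
            System.out.println(abortExpiredBatches(batches, leaders, 1000L, 5000L)); // prints 0
        }
    }

Removing the guard lets the expiration pass run unconditionally, which is the behavior I'd expect from request.timeout.ms.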
Hopefully someone more familiar with the code can comment, but that statement does appear to be preventing the correct behavior.

Thanks,
Luke

On Tue, Nov 10, 2015 at 2:15 PM, Luke Steensen <luke.steen...@braintreepayments.com> wrote:

> Hello,
>
> We've been testing recent versions of trunk and are seeing surprising
> behavior when trying to use the new request timeout functionality. For
> example, at revision ae5a5d7:
>
> # in separate terminals
> $ ./bin/zookeeper-server-start.sh config/zookeeper.properties
> $ ./bin/kafka-server-start.sh config/server.properties
>
> # set request timeout
> $ cat producer.properties
> request.timeout.ms=1000
>
> # run the verifiable producer, for example
> $ ./bin/kafka-verifiable-producer.sh --broker-list localhost:9092 \
>     --topic testing --throughput 5 --producer.config producer.properties
>
> If you then kill the kafka server process, you will see the producer hang
> indefinitely. This is a very simple case, but the behavior is surprising.
> We have also found it easy to reproduce this behavior in more realistic
> environments with multiple brokers, custom producers, etc. The end result
> is that we're not sure how to safely decommission a broker without
> potentially leaving a producer with a permanently stuck request.
>
> Thanks,
> Luke Steensen
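P.S. In case it helps anyone reproduce this outside the verifiable producer: a stripped-down custom producer along these lines shows the same hang (topic name and settings are just examples):

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class HangRepro {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("request.timeout.ms", "1000");
            props.put("key.serializer", StringSerializer.class.getName());
            props.put("value.serializer", StringSerializer.class.getName());

            KafkaProducer<String, String> producer = new KafkaProducer<>(props);
            for (int i = 0; ; i++) {
                // Kill the broker while this loop runs. With the null check in
                // place, this get() blocks indefinitely instead of failing
                // after roughly request.timeout.ms.
                producer.send(new ProducerRecord<>("testing", Integer.toString(i))).get();
                Thread.sleep(200);
            }
        }
    }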