Hey Luke, I agree the null check seems questionable. I went ahead and created https://issues.apache.org/jira/browse/KAFKA-2805. At the very least, we should add a comment clarifying why the check is correct.
-Jason

On Tue, Nov 10, 2015 at 2:15 PM, Luke Steensen <luke.steen...@braintreepayments.com> wrote:

> After some more investigation, I've been able to get the expected behavior
> by removing the null check here:
>
> https://github.com/apache/kafka/blob/ae5a5d7c08bb634576a414f6f2864c5b8a7e58a3/clients/src/main/java/org/apache/kafka/clients/producer/internals/RecordAccumulator.java#L220
>
> Hopefully someone more familiar with the code can comment, but that
> statement does appear to be preventing the correct behavior.
>
> Thanks,
> Luke
>
> On Tue, Nov 10, 2015 at 2:15 PM, Luke Steensen <luke.steen...@braintreepayments.com> wrote:
>
> > Hello,
> >
> > We've been testing recent versions of trunk and are seeing surprising
> > behavior when trying to use the new request timeout functionality. For
> > example, at revision ae5a5d7:
> >
> > # in separate terminals
> > $ ./bin/zookeeper-server-start.sh config/zookeeper.properties
> > $ ./bin/kafka-server-start.sh config/server.properties
> >
> > # set request timeout
> > $ cat producer.properties
> > request.timeout.ms=1000
> >
> > # run the verifiable producer, for example
> > $ ./bin/kafka-verifiable-producer.sh --broker-list localhost:9092 --topic
> > testing --throughput 5 --producer.config producer.properties
> >
> > If you then kill the Kafka server process, you will see the producer hang
> > indefinitely. This is a very simple case, but the behavior is surprising.
> > We have also found it easy to reproduce this behavior in more realistic
> > environments with multiple brokers, custom producers, etc. The end result
> > is that we're not sure how to safely decommission a broker without
> > potentially leaving a producer with a permanently stuck request.
> >
> > Thanks,
> > Luke Steensen
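For anyone reproducing this from a custom producer rather than the verifiable-producer script, the same timeout can be set programmatically. A minimal sketch of the equivalent `Properties` setup (the `KafkaProducer` construction itself is omitted here, since it requires the kafka-clients jar; only the standard library is used):

```java
import java.util.Properties;

public class ProducerTimeoutConfig {
    public static void main(String[] args) {
        // Mirrors the producer.properties file from the repro steps.
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("request.timeout.ms", "1000");

        // These props would then be passed to new KafkaProducer<>(props).
        System.out.println(props.getProperty("request.timeout.ms"));
    }
}
```

With a producer built from these properties, `send()` returns a `Future` whose `get()` call would be expected to fail within roughly the configured timeout once the broker is gone, rather than hanging indefinitely, which is the behavior under discussion.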