I've been running into an issue with the 0.8.2.1 new producer for a few weeks now and I haven't been able to figure it out. Hopefully someone on the list can help!
First off, my producer config looks like this:

props.put(ProducerConfig.ACKS_CONFIG, "1")
props.put(ProducerConfig.RETRIES_CONFIG, "10")
props.put(ProducerConfig.BLOCK_ON_BUFFER_FULL_CONFIG, "true")
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.ByteArraySerializer")
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringSerializer")
props.put(ProducerConfig.TIMEOUT_CONFIG, "5000")
props.put(ProducerConfig.METADATA_FETCH_TIMEOUT_CONFIG, "5000")

During network hiccups between my senders and the brokers, I start seeing these log messages, as expected:

2015-08-20 20:30:12,231 [kafka-producer-network-thread | producer-1] WARN org.apache.kafka.common.network.Selector - Error in I/O with <host>/<ip-address>
java.io.IOException: Connection timed out
    at sun.nio.ch.FileDispatcherImpl.$$YJP$$read0(Native Method)

followed by:

Got error produce response with correlation id 17717 on topic-partition event.beacon-38, retrying (8 attempts left). Error: NETWORK_EXCEPTION

The problem is that even when network connectivity is restored, the whole app hangs. Gathering a heap dump and looking through the RecordAccumulator, I can see that the buffer is full and my producers are blocked indefinitely. Any ideas?
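To make the symptom easier to picture, here is a minimal stdlib-only sketch of the behaviour described above. It is an analogy, not Kafka's code: `BlockingBufferSketch`, `sendBlocking`, `sendWithTimeout` and `drainOne` are hypothetical names, and the fixed-capacity queue only stands in for the producer's buffer. The assumption is that a send with block.on.buffer.full=true behaves like an unbounded wait for free space, so callers hang for as long as nothing drains the buffer.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

public class BlockingBufferSketch {

    // Hypothetical stand-in for the producer's bounded buffer.
    // This is NOT Kafka's RecordAccumulator, just a fixed-capacity queue.
    private final BlockingQueue<byte[]> buffer;

    public BlockingBufferSketch(int capacity) {
        this.buffer = new ArrayBlockingQueue<>(capacity);
    }

    // Analogue of a send with block.on.buffer.full=true (assumption):
    // put() waits indefinitely for free space, so the caller hangs
    // for as long as nothing drains the buffer.
    public void sendBlocking(byte[] record) throws InterruptedException {
        buffer.put(record);
    }

    // Bounded alternative: wait at most timeoutMs for free space,
    // then return false instead of hanging.
    public boolean sendWithTimeout(byte[] record, long timeoutMs)
            throws InterruptedException {
        return buffer.offer(record, timeoutMs, TimeUnit.MILLISECONDS);
    }

    // Analogue of the network thread completing one record.
    // Returns null if the buffer is empty.
    public byte[] drainOne() {
        return buffer.poll();
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingBufferSketch sketch = new BlockingBufferSketch(2);
        sketch.sendBlocking(new byte[] {1});
        sketch.sendBlocking(new byte[] {2});

        // Buffer is full and nothing is draining: the bounded send gives up.
        System.out.println(sketch.sendWithTimeout(new byte[] {3}, 100)); // prints "false"

        // Once one record is drained, the same send succeeds.
        sketch.drainOne();
        System.out.println(sketch.sendWithTimeout(new byte[] {3}, 100)); // prints "true"
    }
}
```

In this sketch a third call to `sendBlocking` in place of the first `sendWithTimeout` would never return, which matches what the heap dump suggests: the buffer stays full and every sender is parked waiting for space.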