David Hay created KAFKA-2135: -------------------------------- Summary: New Kafka Producer Client: Send requests wait indefinitely if no broker is available. Key: KAFKA-2135 URL: https://issues.apache.org/jira/browse/KAFKA-2135 Project: Kafka Issue Type: Bug Components: producer Affects Versions: 0.8.2.0 Reporter: David Hay Assignee: Jun Rao Priority: Critical
I'm seeing issues when sending a message with the new producer client API. The future returned from Producer.send() will block indefinitely if the cluster is unreachable for some reason. Here are the steps: # Start up a single node kafka cluster locally. # Start up application and create a KafkaProducer with the following config: {noformat} KafkaProducerWrapper values: compression.type = snappy metric.reporters = [] metadata.max.age.ms = 300000 metadata.fetch.timeout.ms = 60000 acks = all batch.size = 16384 reconnect.backoff.ms = 10 bootstrap.servers = [localhost:9092] receive.buffer.bytes = 32768 retry.backoff.ms = 100 buffer.memory = 33554432 timeout.ms = 30000 key.serializer = class com.mycompany.kafka.serializer.ToStringEncoder retries = 3 max.request.size = 1048576 block.on.buffer.full = true value.serializer = class com.mycompany.kafka.serializer.JsonEncoder metrics.sample.window.ms = 30000 send.buffer.bytes = 131072 max.in.flight.requests.per.connection = 5 metrics.num.samples = 2 linger.ms = 0 client.id = site-json {noformat} # Send some messages...they are successfully sent # Shut down the kafka broker # Send another message. At this point, calling {{get()}} on the returned Future will block indefinitely until the broker is restarted. It appears that there is some logic in {{org.apache.kafka.clients.producer.internal.Sender}} that is supposed to mark the Future as "done" in response to a disconnect event (towards the end of the run(long) method). However, the while loop earlier in this method seems to remove the broker from consideration entirely, so the final loop over ClientResponse objects is never executed. It seems like "timeout.ms" configuration should be honored in this case, or perhaps introduce another timeout, indicating that we should give up waiting for the cluster to return. -- This message was sent by Atlassian JIRA (v6.3.4#6332)