David Hay created KAFKA-2135:
--------------------------------

             Summary: New Kafka Producer Client: Send requests wait indefinitely if no broker is available.
                 Key: KAFKA-2135
                 URL: https://issues.apache.org/jira/browse/KAFKA-2135
             Project: Kafka
          Issue Type: Bug
          Components: producer 
    Affects Versions: 0.8.2.0
            Reporter: David Hay
            Assignee: Jun Rao
            Priority: Critical


I'm seeing issues when sending a message with the new producer client API.  
Calling {{get()}} on the Future returned from {{Producer.send()}} blocks 
indefinitely if the cluster is unreachable.  Steps to reproduce:

# Start up a single-node Kafka cluster locally.
# Start up the application and create a KafkaProducer with the following config:
{noformat}
KafkaProducerWrapper values: 
        compression.type = snappy
        metric.reporters = []
        metadata.max.age.ms = 300000
        metadata.fetch.timeout.ms = 60000
        acks = all
        batch.size = 16384
        reconnect.backoff.ms = 10
        bootstrap.servers = [localhost:9092]
        receive.buffer.bytes = 32768
        retry.backoff.ms = 100
        buffer.memory = 33554432
        timeout.ms = 30000
        key.serializer = class com.mycompany.kafka.serializer.ToStringEncoder
        retries = 3
        max.request.size = 1048576
        block.on.buffer.full = true
        value.serializer = class com.mycompany.kafka.serializer.JsonEncoder
        metrics.sample.window.ms = 30000
        send.buffer.bytes = 131072
        max.in.flight.requests.per.connection = 5
        metrics.num.samples = 2
        linger.ms = 0
        client.id = site-json
{noformat}
# Send some messages; they are sent successfully.
# Shut down the Kafka broker.
# Send another message.

At this point, calling {{get()}} on the returned Future will block indefinitely 
until the broker is restarted.

It appears that there is some logic in 
{{org.apache.kafka.clients.producer.internals.Sender}} that is supposed to mark 
the Future as "done" in response to a disconnect event (towards the end of the 
{{run(long)}} method).  However, the while loop earlier in this method seems to 
remove the broker from consideration entirely, so the final loop over 
ClientResponse objects is never executed.

It seems like the "timeout.ms" configuration should be honored in this case, or 
perhaps another timeout should be introduced to indicate when we should give up 
waiting for the cluster to return.
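
In the meantime, a caller can bound the wait itself with the standard 
{{Future.get(long, TimeUnit)}} overload. The following is only a minimal 
sketch of that idea, not a fix for the producer: the class name 
{{BoundedSendWait}} and the helper {{awaitWithTimeout}} are hypothetical, and a 
never-completed {{CompletableFuture}} stands in for the Future returned by 
{{Producer.send()}} while the broker is down, so the example runs without a 
Kafka dependency.

```java
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BoundedSendWait {

    /**
     * Waits at most timeoutMs for the future to complete.
     * Returns the result, or an empty Optional if the wait timed out
     * (a null result is also reported as empty).
     * Hypothetical helper; not part of the Kafka client API.
     */
    public static <T> Optional<T> awaitWithTimeout(Future<T> future, long timeoutMs)
            throws InterruptedException, ExecutionException {
        try {
            return Optional.ofNullable(future.get(timeoutMs, TimeUnit.MILLISECONDS));
        } catch (TimeoutException e) {
            // The future did not complete in time; give up waiting.
            return Optional.empty();
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for a send() future that is never completed.
        Future<String> stuck = new CompletableFuture<>();
        System.out.println(awaitWithTimeout(stuck, 200).isPresent()); // prints "false"

        // Stand-in for a send() future that completed normally.
        Future<String> done = CompletableFuture.completedFuture("ack");
        System.out.println(awaitWithTimeout(done, 200).get()); // prints "ack"
    }
}
```

This only stops the calling thread from hanging. The record presumably 
remains queued in the producer, so it may still be sent once the broker 
returns.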



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
