Aggie,

I'm not able to reproduce the behavior you describe on 0.10.0.1.

> I did more testing and found the rule (the topic is created with
> "--replication-factor 2 --partitions 1" in the following case):
> node 1               node 2
> down(lead)           down (replica)
> down(replica)         up   (lead)              producer send fail !!!

Once node 2 is up, the producer picks up the new leader on its next metadata
update and is then able to connect and send messages to it.

Logs:

[2016-09-27T15:18:17,907] NetworkClient: handleDisconnections(): Node 1
disconnected.
[2016-09-27T15:18:18,007] NetworkClient: initiateConnect(): Initiating
connection to node 1 at localhost:9093.
[2016-09-27T15:18:18,008] Selector: pollSelectionKeys(): Connection with
localhost/127.0.0.1 disconnected
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
~[?:1.8.0_45]
    at
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
~[?:1.8.0_45]
    at
org.apache.kafka.common.network.PlaintextTransportLayer.finishConnect(PlaintextTransportLayer.java:51)
~[kafka-clients-0.10.0.1.jar:?]
    at
org.apache.kafka.common.network.KafkaChannel.finishConnect(KafkaChannel.java:73)
~[kafka-clients-0.10.0.1.jar:?]
    at
org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:309)
[kafka-clients-0.10.0.1.jar:?]
    at org.apache.kafka.common.network.Selector.poll(Selector.java:283)
[kafka-clients-0.10.0.1.jar:?]
    at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:260)
[kafka-clients-0.10.0.1.jar:?]
    at
org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:229)
[kafka-clients-0.10.0.1.jar:?]
    at
org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:134)
[kafka-clients-0.10.0.1.jar:?]
    at java.lang.Thread.run(Thread.java:745) [?:1.8.0_45]
[2016-09-27T15:18:18,008] NetworkClient: handleDisconnections(): Node 1
disconnected.
[2016-09-27T15:18:18,043] NetworkClient: maybeUpdate(): Sending metadata
request {topics=[hello]} to node 0
[2016-09-27T15:18:18,052] Metadata: update(): Updated cluster metadata
version 4 to Cluster(nodes = [tcltest1.nmsworks.co.in:9092 (id: 0 rack:
null)], partitions = [Partition(topic = hello, partition = 0, leader =
none, replicas = [0,1,], isr = []])
[2016-09-27T15:18:19,053] NetworkClient: maybeUpdate(): Sending metadata
request {topics=[hello]} to node 0
[2016-09-27T15:18:19,056] Metadata: update(): Updated cluster metadata
version 5 to Cluster(nodes = [tcltest1.nmsworks.co.in:9092 (id: 0 rack:
null)], partitions = [Partition(topic = hello, partition = 0, leader = 0,
replicas = [0,1,], isr = [0,]])
[2016-09-27T15:18:19,081] KafkaProducer: main(): Batch : 4 sent
[2016-09-27T15:18:19,182] KafkaProducer: main(): Batch : 5, Sending the
record with key : 0
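
For reference, here is a minimal sketch of producer settings that bear on the
sequence in the logs above (metadata refresh, then the send resuming). The
class name, broker addresses, and the retry values are assumptions for
illustration only, not the configuration I actually ran with. The property
keys themselves are standard producer configs.

```java
import java.util.Properties;

public class ProducerConfigSketch {

    // Builds producer settings. The broker list is passed in by the caller;
    // the numeric values below are illustrative assumptions, not recommendations.
    static Properties build(String bootstrapServers) {
        Properties props = new Properties();
        // List every broker so the client can bootstrap from whichever one is alive.
        props.setProperty("bootstrap.servers", bootstrapServers);
        props.setProperty("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.setProperty("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        // Force a metadata refresh every 10 s instead of the 5 min default.
        props.setProperty("metadata.max.age.ms", "10000");
        // Retry failed sends so a batch can survive a leader change (assumed values).
        props.setProperty("retries", "5");
        props.setProperty("retry.backoff.ms", "1000");
        return props;
    }

    public static void main(String[] args) {
        // Hypothetical two-broker setup on one host.
        Properties props = build("localhost:9092,localhost:9093");
        for (String name : props.stringPropertyNames()) {
            System.out.println(name + " = " + props.getProperty(name));
        }
    }
}
```

With kafka-clients on the classpath, the resulting Properties object can be
passed to the KafkaProducer constructor. The sketch itself only uses
java.util, so it runs without a broker.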

- Kamal

On Mon, Sep 26, 2016 at 8:53 AM, FEI Aggie <aggie....@alcatel-lucent.com>
wrote:

> Kamal,
> Thanks for your response. I tried testing with metadata.max.age.ms
> reduced to 10s, but the behavior did not change, and the producer still
> can't find the live broker.
>
> I did more testing and found the rule (the topic is created with
> "--replication-factor 2 --partitions 1" in the following case):
> node 1               node 2
> down(lead)           down (replica)
> down(replica)         up   (lead)              producer send fail !!!
>
> down(lead)           down (replica)
> up  (lead)           down (replica)             producer send ok !!!
>
> If the only node that comes back up is the original leader, everything
> is fine. If the only node that comes back up is the original replica, the
> producer can't connect to the live broker (it always tries to connect to
> the original leader, node 1 in my case).
>
> Can't Kafka recover from this situation? Does anyone have a clue about this?
>
> Thanks!
> Aggie
> -----Original Message-----
> From: Kamal C [mailto:kamaltar...@gmail.com]
> Sent: Saturday, September 24, 2016 1:37 PM
> To: users@kafka.apache.org
> Subject: Re: producer can't push msg sometimes with 1 broker recoved
>
> Reduce the metadata refresh interval 'metadata.max.age.ms' from its
> 5 min default to your desired time interval.
> This may shorten the window during which the producer sees the broker
> as unavailable.
>
> -- Kamal
>
