On Thursday 08 January 2015 01:51 AM, Sa Li wrote:
I see this type of error again; it goes back to normal in a few seconds.

[2015-01-07 20:19:49,744] WARN Error in I/O with harmful-jar.master/10.100.98.102

That's a really weird hostname, "harmful-jar.master". Is that really your hostname? You mention that this happens during performance testing. Have you noted how many connections are open to 10.100.98.102 when this "Connection refused" exception happens?

-Jaikiran
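
One way to answer the connection-count question is to take a snapshot of the socket table on the producer host while the test runs. What follows is only a sketch in Python, assuming a Linux host where `ss -tan` is available and IPv4 addresses; `count_by_state` and `snapshot` are made-up helper names, not part of Kafka.

```python
import subprocess
from collections import Counter

def count_by_state(ss_output, peer_ip, peer_port=None):
    """Count TCP sockets to peer_ip (optionally one port), grouped by state.

    Expects the text printed by `ss -tan`, whose columns are:
    state, recv-q, send-q, local address:port, peer address:port.
    Assumes IPv4 peers written as ip:port.
    """
    counts = Counter()
    for line in ss_output.splitlines():
        fields = line.split()
        if len(fields) < 5 or fields[0] == "State":
            continue  # header or malformed line
        host, _, port = fields[4].rpartition(":")
        if host != peer_ip:
            continue
        if peer_port is not None and port != str(peer_port):
            continue
        counts[fields[0]] += 1
    return counts

def snapshot(peer_ip, peer_port=None):
    """Run `ss -tan` once and count sockets to the given peer."""
    out = subprocess.run(
        ["ss", "-tan"], capture_output=True, text=True, check=True
    ).stdout
    return count_by_state(out, peer_ip, peer_port)
```

Calling `snapshot("10.100.98.102", 9092)` every second or so during the run and logging the result would show whether the number of sockets to that broker changes around the time of the warnings.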


  (org.apache.kafka.common.network.Selector)
java.net.ConnectException: Connection refused
         at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
         at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
         at org.apache.kafka.common.network.Selector.poll(Selector.java:232)
         at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:191)
         at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:184)
         at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:115)
         at java.lang.Thread.run(Thread.java:745)
[2015-01-07 20:19:49,754] WARN Error in I/O with harmful-jar.master/10.100.98.102 (org.apache.kafka.common.network.Selector)
java.net.ConnectException: Connection refused
         at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
         at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
         at org.apache.kafka.common.network.Selector.poll(Selector.java:232)
         at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:191)
         at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:184)
         at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:115)
         at java.lang.Thread.run(Thread.java:745)
[2015-01-07 20:19:49,764] WARN Error in I/O with harmful-jar.master/10.100.98.102 (org.apache.kafka.common.network.Selector)
java.net.ConnectException: Connection refused
         at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
         at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
         at org.apache.kafka.common.network.Selector.poll(Selector.java:232)
         at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:191)
         at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:184)
         at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:115)
         at java.lang.Thread.run(Thread.java:745)
160403 records sent, 32080.6 records/sec (91.78 MB/sec), 507.0 ms avg latency, 2418.0 max latency.
109882 records sent, 21976.4 records/sec (62.87 MB/sec), 672.7 ms avg latency, 3529.0 max latency.
100315 records sent, 19995.0 records/sec (57.21 MB/sec), 774.8 ms avg latency, 3858.0 max latency.
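
To put a number on what a burst of these warnings costs, the ProducerPerformance summary lines can be parsed and averaged per window. This is a minimal sketch in Python, assuming the lines are pasted as they appear in the mail (wrapped or not); `parse_perf` and `mean_rate` are names made up for illustration and are not part of Kafka.

```python
import re

# One summary line in the format printed by ProducerPerformance in this thread.
PERF_RE = re.compile(
    r"(\d+) records sent, ([\d.]+) records/sec \(([\d.]+) MB/sec\), "
    r"([\d.]+) ms avg latency, ([\d.]+) max latency"
)

def parse_perf(text):
    """Parse summary lines into dicts; mail line-wrapping is undone first."""
    joined = " ".join(text.split())
    return [
        {
            "records": int(m.group(1)),
            "records_per_sec": float(m.group(2)),
            "mb_per_sec": float(m.group(3)),
            "avg_latency_ms": float(m.group(4)),
            "max_latency_ms": float(m.group(5)),
        }
        for m in PERF_RE.finditer(joined)
    ]

def mean_rate(rows):
    """Average records/sec over the parsed windows."""
    return sum(r["records_per_sec"] for r in rows) / len(rows)

# The three windows reported right after the "Connection refused" warnings.
AFTER_ERRORS = """
160403 records sent, 32080.6 records/sec (91.78 MB/sec), 507.0 ms avg
latency, 2418.0 max latency.
109882 records sent, 21976.4 records/sec (62.87 MB/sec), 672.7 ms avg
latency, 3529.0 max latency.
100315 records sent, 19995.0 records/sec (57.21 MB/sec), 774.8 ms avg
latency, 3858.0 max latency.
"""

rows = parse_perf(AFTER_ERRORS)
print(len(rows), "windows, mean", round(mean_rate(rows), 1), "records/sec")
```

Running the same function over the steady-state windows from the earlier run gives a baseline to compare against.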

On Wed, Jan 7, 2015 at 12:07 PM, Sa Li <sal...@gmail.com> wrote:

Hi, All

I am running a performance test with:

bin/kafka-run-class.sh org.apache.kafka.clients.tools.ProducerPerformance \
    test-rep-three 500000000 100 -1 acks=1 \
    bootstrap.servers=10.100.98.100:9092,10.100.98.101:9092,10.100.98.102:9092 \
    buffer.memory=67108864 batch.size=8196

where the topic test-rep-three is described as follows:

bin/kafka-topics.sh --describe --zookeeper 10.100.98.101:2181 --topic test-rep-three
Topic:test-rep-three    PartitionCount:8        ReplicationFactor:3     Configs:
         Topic: test-rep-three   Partition: 0    Leader: 100     Replicas: 100,102,101   Isr: 102,101,100
         Topic: test-rep-three   Partition: 1    Leader: 101     Replicas: 101,100,102   Isr: 102,101,100
         Topic: test-rep-three   Partition: 2    Leader: 102     Replicas: 102,101,100   Isr: 101,102,100
         Topic: test-rep-three   Partition: 3    Leader: 100     Replicas: 100,101,102   Isr: 101,100,102
         Topic: test-rep-three   Partition: 4    Leader: 101     Replicas: 101,102,100   Isr: 102,100,101
         Topic: test-rep-three   Partition: 5    Leader: 102     Replicas: 102,100,101   Isr: 100,102,101
         Topic: test-rep-three   Partition: 6    Leader: 102     Replicas: 100,102,101   Isr: 102,101,100
         Topic: test-rep-three   Partition: 7    Leader: 101     Replicas: 101,100,102   Isr: 101,100,102
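
A quick check that can be run on this describe output: in Kafka the first broker in a partition's Replicas list is the preferred leader, so a partition whose Leader differs from it has had its leadership moved at some point. The sketch below is in Python and assumes the wrapped lines have been rejoined; `non_preferred_leaders` and `leader_counts` are hypothetical helper names, not Kafka tooling.

```python
import re
from collections import Counter

# Partition lines from the describe output above, one partition per line.
DESCRIBE_OUTPUT = """\
Topic: test-rep-three Partition: 0 Leader: 100 Replicas: 100,102,101 Isr: 102,101,100
Topic: test-rep-three Partition: 1 Leader: 101 Replicas: 101,100,102 Isr: 102,101,100
Topic: test-rep-three Partition: 2 Leader: 102 Replicas: 102,101,100 Isr: 101,102,100
Topic: test-rep-three Partition: 3 Leader: 100 Replicas: 100,101,102 Isr: 101,100,102
Topic: test-rep-three Partition: 4 Leader: 101 Replicas: 101,102,100 Isr: 102,100,101
Topic: test-rep-three Partition: 5 Leader: 102 Replicas: 102,100,101 Isr: 100,102,101
Topic: test-rep-three Partition: 6 Leader: 102 Replicas: 100,102,101 Isr: 102,101,100
Topic: test-rep-three Partition: 7 Leader: 101 Replicas: 101,100,102 Isr: 101,100,102
"""

LINE_RE = re.compile(
    r"Partition:\s*(\d+)\s+Leader:\s*(-?\d+)\s+Replicas:\s*([\d,]+)"
)

def parse_partitions(text):
    """Yield (partition, leader, replicas) for each partition line."""
    for m in LINE_RE.finditer(text):
        replicas = [int(r) for r in m.group(3).split(",")]
        yield int(m.group(1)), int(m.group(2)), replicas

def non_preferred_leaders(text):
    """Partitions whose leader is not the first (preferred) replica."""
    return [
        (partition, leader, replicas[0])
        for partition, leader, replicas in parse_partitions(text)
        if leader != replicas[0]
    ]

def leader_counts(text):
    """Number of partitions led by each broker."""
    return Counter(leader for _, leader, _ in parse_partitions(text))

for partition, leader, preferred in non_preferred_leaders(DESCRIBE_OUTPUT):
    print(f"partition {partition}: leader {leader}, preferred {preferred}")
```

For the output above this reports only partition 6 (leader 102, preferred 100). That is consistent with a leadership change having happened on that partition, but on its own it does not show when or why it moved.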

It produces messages and runs for a while, but it periodically throws exceptions like these:

org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.
org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.
org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.
org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.
org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.
org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.
org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.
org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.
org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.
org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.
org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.
141292 records sent, 28258.4 records/sec (80.85 MB/sec), 551.2 ms avg latency, 1494.0 max latency.
142526 records sent, 28505.2 records/sec (81.55 MB/sec), 580.8 ms avg latency, 1513.0 max latency.
146564 records sent, 29312.8 records/sec (83.86 MB/sec), 557.9 ms avg latency, 1431.0 max latency.
146755 records sent, 29351.0 records/sec (83.97 MB/sec), 556.7 ms avg latency, 1480.0 max latency.
147963 records sent, 29592.6 records/sec (84.67 MB/sec), 556.7 ms avg latency, 1546.0 max latency.
146931 records sent, 29386.2 records/sec (84.07 MB/sec), 550.9 ms avg latency, 1715.0 max latency.
146947 records sent, 29389.4 records/sec (84.08 MB/sec), 555.1 ms avg latency, 1750.0 max latency.
146422 records sent, 29284.4 records/sec (83.78 MB/sec), 557.9 ms avg latency, 1818.0 max latency.
147516 records sent, 29503.2 records/sec (84.41 MB/sec), 555.6 ms avg latency, 1806.0 max latency.
147877 records sent, 29575.4 records/sec (84.62 MB/sec), 552.1 ms avg latency, 1821.0 max latency.
147201 records sent, 29440.2 records/sec (84.23 MB/sec), 554.5 ms avg latency, 1826.0 max latency.
148317 records sent, 29663.4 records/sec (84.87 MB/sec), 558.1 ms avg latency, 1792.0 max latency.
147756 records sent, 29551.2 records/sec (84.55 MB/sec), 550.9 ms avg latency, 1806.0 max latency

Then it returns to a normal state. Is that because of a rebalance?

Thanks



--

Alec Li



