Hi all, I'm running into a weird slowness when using acks=all on Kafka 1.0.1. I reproduced it on a 3-node cluster (each 4 cores/14GB RAM), using a topic with replication factor 2. I used the built-in kafka-producer-perf-test.sh tool with 1KB messages. With all defaults, it can send 100K-200K messages / sec. With acks=all, it timesout quickly even when I only ask 1K-2K messages/sec.
Is that expected? Any idea how to investigate / tune / workaround? Here is a test case: *1.* Create a new topic $ *kafka-topics.sh --create *--zookeeper $ZK_CLUSTER_HOSTS --partitions 6 --replication-factor 2 --topic my-topic Created topic "my-topic" $ *kafka-topics.sh --describe *--zookeeper $ZK_CLUSTER_HOSTS --topic my-topic Topic:my-topic PartitionCount:6 ReplicationFactor:2 Configs: Topic: my-topic Partition: 0 Leader: 2 Replicas: 2,1 Isr: 2,1 Topic: my-topic Partition: 1 Leader: 3 Replicas: 3,2 Isr: 3,2 Topic: my-topic Partition: 2 Leader: 1 Replicas: 1,3 Isr: 1,3 Topic: my-topic Partition: 3 Leader: 2 Replicas: 2,3 Isr: 2,3 Topic: my-topic Partition: 4 Leader: 3 Replicas: 3,1 Isr: 3,1 Topic: my-topic Partition: 5 Leader: 1 Replicas: 1,2 Isr: 1,2 *2.* Producing events in default config supports 100K events per sec (1KB each) without issue: $ *kafka-producer-perf-test.sh *--topic my_topic --num-records 10000000 --record-size 1024 --throughput 100000 --producer-props bootstrap.servers=$KAFKA_CLUSTER_HOSTS 492612 records sent, 98522.4 records/sec (96.21 MB/sec), 120.9 ms avg latency, 735.0 max latency. 493623 records sent, 98724.6 records/sec (96.41 MB/sec), 9.0 ms avg latency, 161.0 max latency. 513640 records sent, 102728.0 records/sec (100.32 MB/sec), 34.4 ms avg latency, 710.0 max latency. 485664 records sent, 97132.8 records/sec (94.86 MB/sec), 132.6 ms avg latency, 1015.0 max latency. 506546 records sent, 101248.5 records/sec (98.88 MB/sec), 69.2 ms avg latency, 786.0 max latency. 508195 records sent, 101639.0 records/sec (99.26 MB/sec), 15.3 ms avg latency, 374.0 max latency. 387037 records sent, 74688.7 records/sec (72.94 MB/sec), 177.6 ms avg latency, 1926.0 max latency. 375598 records sent, 69349.7 records/sec (67.72 MB/sec), 330.9 ms avg latency, 3152.0 max latency. 648168 records sent, 129633.6 records/sec (126.60 MB/sec), 356.7 ms avg latency, 3177.0 max latency. 339564 records sent, 67912.8 records/sec (66.32 MB/sec), 457.5 ms avg latency, 2193.0 max latency. 498226 records sent, 99645.2 records/sec (97.31 MB/sec), 305.6 ms avg latency, 2256.0 max latency. 811259 records sent, 162251.8 records/sec (158.45 MB/sec), 143.7 ms avg latency, 636.0 max latency. 448713 records sent, 69428.0 records/sec (67.80 MB/sec), 139.5 ms avg latency, 2320.0 max latency. 697652 records sent, 139530.4 records/sec (136.26 MB/sec), 203.4 ms avg latency, 2333.0 max latency. 484168 records sent, 96833.6 records/sec (94.56 MB/sec), 57.6 ms avg latency, 601.0 max latency. 427440 records sent, 85488.0 records/sec (83.48 MB/sec), 225.9 ms avg latency, 1337.0 max latency. 513414 records sent, 90565.2 records/sec (88.44 MB/sec), 113.0 ms avg latency, 1574.0 max latency. 641844 records sent, 128368.8 records/sec (125.36 MB/sec), 235.1 ms avg latency, 2930.0 max latency. 479190 records sent, 95818.8 records/sec (93.57 MB/sec), 69.9 ms avg latency, 864.0 max latency. 10000000 records sent, 99712.827058 records/sec (97.38 MB/sec), 165.35 ms avg latency, 3177.00 ms max latency, 4 ms 50th, 782 ms 95th, 2102 ms 99th, 3148 ms 99.9th. *3.* Producing events in default config supports only 2K events per sec with acks="all" quickly breaks apart - it doesn't keep up the pace and fails on timeouts. Sometimes it recovers from it, sometimes not. Here is an example where it recovers a bit: $* kafka-producer-perf-test.sh *--topic my_topic --num-records 10000000 --record-size 1024 --throughput 2000 --producer-props bootstrap.servers=$KAFKA_CLUSTER_HOSTS acks="all" 9997 records sent, 1999.4 records/sec (1.95 MB/sec), 4.2 ms avg latency, 162.0 max latency. 9799 records sent, 1959.4 records/sec (1.91 MB/sec), 5.3 ms avg latency, 79.0 max latency. 6670 records sent, 1334.0 records/sec (1.30 MB/sec), 2.3 ms avg latency, 21.0 max latency. 3624 records sent, 724.7 records/sec (0.71 MB/sec), 2.1 ms avg latency, 25.0 max latency. 3334 records sent, 666.7 records/sec (0.65 MB/sec), 7.4 ms avg latency, 72.0 max latency. 3334 records sent, 666.7 records/sec (0.65 MB/sec), 2.0 ms avg latency, 19.0 max latency. 3334 records sent, 666.8 records/sec (0.65 MB/sec), 2.4 ms avg latency, 43.0 max latency. org.apache.kafka.common.errors.TimeoutException: The request timed out. 7606 records sent, 1521.2 records/sec (1.49 MB/sec), 17253.4 ms avg latency, 30002.0 max latency. 22602 records sent, 4520.4 records/sec (4.41 MB/sec), 7719.0 ms avg latency, 22874.0 max latency. org.apache.kafka.common.errors.NetworkException: The server disconnected before a response was received. org.apache.kafka.common.errors.NetworkException: The server disconnected before a response was received. org.apache.kafka.common.errors.NetworkException: The server disconnected before a response was received. org.apache.kafka.common.errors.NetworkException: The server disconnected before a response was received. org.apache.kafka.common.errors.NetworkException: The server disconnected before a response was received. org.apache.kafka.common.errors.NetworkException: The server disconnected before a response was received. org.apache.kafka.common.errors.NetworkException: The server disconnected before a response was received. org.apache.kafka.common.errors.NetworkException: The server disconnected before a response was received. org.apache.kafka.common.errors.NetworkException: The server disconnected before a response was received. org.apache.kafka.common.errors.NetworkException: The server disconnected before a response was received. org.apache.kafka.common.errors.TimeoutException: Expiring 15 record(s) for my_topic-0: 30001 ms has passed since batch creation plus linger time org.apache.kafka.common.errors.TimeoutException: Expiring 15 record(s) for my_topic-0: 30001 ms has passed since batch creation plus linger time org.apache.kafka.common.errors.TimeoutException: Expiring 15 record(s) for my_topic-0: 30001 ms has passed since batch creation plus linger time org.apache.kafka.common.errors.TimeoutException: Expiring 15 record(s) for my_topic-0: 30001 ms has passed since batch creation plus linger time org.apache.kafka.common.errors.TimeoutException: Expiring 15 record(s) for my_topic-0: 30001 ms has passed since batch creation plus linger time org.apache.kafka.common.errors.TimeoutException: Expiring 15 record(s) for my_topic-0: 30001 ms has passed since batch creation plus linger time org.apache.kafka.common.errors.TimeoutException: Expiring 15 record(s) for my_topic-0: 30001 ms has passed since batch creation plus linger time org.apache.kafka.common.errors.TimeoutException: Expiring 15 record(s) for my_topic-0: 30001 ms has passed since batch creation plus linger time org.apache.kafka.common.errors.TimeoutException: Expiring 15 record(s) for my_topic-0: 30001 ms has passed since batch creation plus linger time org.apache.kafka.common.errors.TimeoutException: Expiring 15 record(s) for my_topic-0: 30001 ms has passed since batch creation plus linger time org.apache.kafka.common.errors.TimeoutException: Expiring 15 record(s) for my_topic-0: 30001 ms has passed since batch creation plus linger time org.apache.kafka.common.errors.TimeoutException: Expiring 15 record(s) for my_topic-0: 30001 ms has passed since batch creation plus linger time org.apache.kafka.common.errors.TimeoutException: Expiring 15 record(s) for my_topic-0: 30001 ms has passed since batch creation plus linger time org.apache.kafka.common.errors.TimeoutException: Expiring 15 record(s) for my_topic-0: 30001 ms has passed since batch creation plus linger time org.apache.kafka.common.errors.TimeoutException: Expiring 15 record(s) for my_topic-0: 30001 ms has passed since batch creation plus linger time org.apache.kafka.common.errors.TimeoutException: Expiring 15 record(s) for my_topic-3: 30001 ms has passed since batch creation plus linger time org.apache.kafka.common.errors.TimeoutException: Expiring 15 record(s) for my_topic-3: 30001 ms has passed since batch creation plus linger time org.apache.kafka.common.errors.TimeoutException: Expiring 15 record(s) for my_topic-3: 30001 ms has passed since batch creation plus linger time org.apache.kafka.common.errors.TimeoutException: Expiring 15 record(s) for my_topic-3: 30001 ms has passed since batch creation plus linger time org.apache.kafka.common.errors.TimeoutException: Expiring 15 record(s) for my_topic-3: 30001 ms has passed since batch creation plus linger time org.apache.kafka.common.errors.TimeoutException: Expiring 15 record(s) for my_topic-3: 30001 ms has passed since batch creation plus linger time org.apache.kafka.common.errors.TimeoutException: Expiring 15 record(s) for my_topic-3: 30001 ms has passed since batch creation plus linger time org.apache.kafka.common.errors.TimeoutException: Expiring 15 record(s) for my_topic-3: 30001 ms has passed since batch creation plus linger time org.apache.kafka.common.errors.TimeoutException: Expiring 15 record(s) for my_topic-3: 30001 ms has passed since batch creation plus linger time org.apache.kafka.common.errors.TimeoutException: Expiring 15 record(s) for my_topic-3: 30001 ms has passed since batch creation plus linger time org.apache.kafka.common.errors.TimeoutException: Expiring 15 record(s) for my_topic-3: 30001 ms has passed since batch creation plus linger time org.apache.kafka.common.errors.TimeoutException: Expiring 15 record(s) for my_topic-3: 30001 ms has passed since batch creation plus linger time org.apache.kafka.common.errors.TimeoutException: Expiring 15 record(s) for my_topic-3: 30001 ms has passed since batch creation plus linger time org.apache.kafka.common.errors.TimeoutException: Expiring 15 record(s) for my_topic-3: 30001 ms has passed since batch creation plus linger time org.apache.kafka.common.errors.TimeoutException: Expiring 15 record(s) for my_topic-3: 30001 ms has passed since batch creation plus linger time 29714 records sent, 5942.8 records/sec (5.80 MB/sec), 10338.3 ms avg latency, 30006.0 max latency. 10002 records sent, 2000.0 records/sec (1.95 MB/sec), 3.8 ms avg latency, 74.0 max latency. 7848 records sent, 1569.3 records/sec (1.53 MB/sec), 5.1 ms avg latency, 90.0 max latency. 6666 records sent, 1333.2 records/sec (1.30 MB/sec), 2.3 ms avg latency, 21.0 max latency. 6666 records sent, 1333.2 records/sec (1.30 MB/sec), 6.3 ms avg latency, 100.0 max latency. 6670 records sent, 1334.0 records/sec (1.30 MB/sec), 2.1 ms avg latency, 15.0 max latency. 6666 records sent, 1333.2 records/sec (1.30 MB/sec), 2.6 ms avg latency, 67.0 max latency. ^C Ofir Manor Co-Founder & CTO | Equalum Mobile: +972-54-7801286 | Email: ofir.ma...@equalum.io