Hi Arti,

From your description, it sounds like your per-partition throughput is extremely high (380 MB/s across three partitions is ~125 MB/s per partition); a more typical high-end range would be 5-10 MB/s per partition. Each follower is served by a single replica fetcher thread on the follower side, which maintains a single TCP connection to the leader, and that connection is in turn mapped to a single network processor thread on the leader side. With the setup you described, you have 3 [partitions] * (2 - 1) [replicas per partition, minus the leader] = 3 followers across the whole cluster, but 10 network threads per broker. In this scenario, you're likely not spreading the replication traffic across those network processor threads at all, which means you're probably pinning a single core per broker doing TLS for replication while the others sit mostly idle (or service client requests).
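If you do decide to bump the partition count, you can do it in place through the admin API. Here's a rough sketch using the confluent-kafka Python AdminClient -- the topic name, the target count of 12, and the bootstrap address are just placeholders, and you may also need whatever TLS/auth settings your MSK cluster requires:

    import sys
    from confluent_kafka.admin import AdminClient, NewPartitions

    # Placeholder bootstrap address -- substitute your MSK broker endpoint
    # (plus any TLS settings your cluster needs).
    admin = AdminClient({"bootstrap.servers": "b-1.your-msk-cluster:9092"})

    # Grow the topic to 12 partitions total (partition counts can only increase).
    futures = admin.create_partitions([NewPartitions("your-topic", 12)])
    for topic, future in futures.items():
        try:
            future.result()  # block until the controller applies the change
            print(f"{topic}: partition count increased")
        except Exception as exc:
            print(f"{topic}: failed to add partitions: {exc}", file=sys.stderr)

One caveat: adding partitions changes the key-to-partition mapping for keyed messages, so it's best done before you start relying on per-key ordering.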
Bumping your partition count will likely allow you to spread replication load across more of your network processor threads and achieve higher CPU utilization.

In addition, it's important to consider not only disk usage but disk write throughput. With 380 MB/s of ingress traffic and an RF of 2, you'll need a total of 380 * 2 = 760 MB/s of disk write throughput, or roughly 250 MB/s per broker. Coincidentally (or not), MSK brokers use gp2 EBS volumes <https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-volume-types.html> for storing the Kafka log directory, which means you're limited to an absolute max of 250 MiB/s of write throughput per broker. You probably want to allow yourself some headroom here too, so I'd suggest starting from your desired ingress throughput and replication factor, plus your desired safety margin, and working backwards to figure out how many gp2 volumes you'll need to achieve that level of write throughput safely (there's a quick sketch of that arithmetic at the end of this mail). Also note that while CPU and RAM scale with the larger MSK instance types, disk throughput does not.

Cheers,
Ben

On Wed, Sep 9, 2020 at 7:43 AM Arti Pande <pande.a...@gmail.com> wrote:

> Hi,
>
> We have been evaluating Managed Streaming for Kafka (MSK) on AWS for a
> use-case that requires high-speed data ingestion of the order of millions
> of messages (each ~1 KB in size) per second. We ran into some issues while
> testing this case.
>
> Context:
> To start with, we have set up a single topic with 3 partitions on a 3-node
> MSK cluster of m5.large (2 cores, 8 GB RAM, 500 GB EBS) with encryption
> enabled for inter-broker (intra-MSK) communication. Each broker is in a
> separate AZ (total 3 AZs and 3 brokers) and has 10 network threads and
> 16 IO threads.
>
> When the topic has replication-factor = 2 and min.insync.replicas = 2 and
> the publisher uses acks = all, sending 100+ million messages using 3
> parallel publishers intermittently results in the following error:
> `Delivery failed: Broker: Not enough in-sync replicas`
> As per the documentation this error is thrown when in-sync replicas are
> lagging behind for more than a configured duration
> (replica.lag.time.max.ms = 30 seconds by default).
>
> However, when we don't see this error, the throughput is around 90 K
> msgs/sec i.e. 90 MB/sec. CPU usage is below 50% and disk usage is also
> < 20%. So apparently CPU/memory/disk are not an issue??
>
> If we change replication-factor = 1 and min.insync.replicas = 1 and/or
> acks = 1 and keep all other things the same, then there are no errors and
> throughput is ~380 K msgs/sec i.e. 380 MB/sec. CPU usage was below 30%.
>
> Question:
> Without replication we were able to get 380 MB/sec written, so assuming
> disk or CPU or memory are not an issue, what could be the reason for
> replicas to lag behind at 90 MB/sec throughput? Is it the number of total
> threads (10 n/w + 16 IO) being too high for a 2-core machine? But then the
> same thread setting works well without replication. What could be the
> reason for (1) lower throughput when turning replication on and (2)
> replicas lagging behind when replication is turned on?
>
> Thanks
> Arti
>
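P.S. Here's the disk sizing arithmetic I mentioned above, as a quick Python sketch. The 1.5x safety margin is just an example value -- plug in your own ingress target and headroom:

    import math

    ingress_mb_s = 380       # producer ingress across the cluster (from your test)
    replication_factor = 2
    brokers = 3
    safety_margin = 1.5      # assumed headroom factor -- pick your own
    gp2_max_mb_s = 250       # per-volume throughput ceiling for gp2

    total_write_mb_s = ingress_mb_s * replication_factor             # 760 MB/s cluster-wide
    per_broker_mb_s = total_write_mb_s / brokers * safety_margin     # ~380 MB/s with margin
    volumes_per_broker = math.ceil(per_broker_mb_s / gp2_max_mb_s)   # 2 gp2 volumes

    print(f"per-broker write target: {per_broker_mb_s:.0f} MB/s, "
          f"gp2 volumes needed per broker: {volumes_per_broker}")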