Hi Ben,

Thanks for the quick reply. As you mentioned, we are seeing the write throughput per broker being capped at 250 MB/s.
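To make sure I am reading the math right, here is the back-of-envelope calculation I am working from (a rough Python sketch using the scenario 1) numbers below and the 250 MiB/s gp2 write cap you mentioned; the 20% headroom figure is just an assumption of mine):

import math

GP2_MAX_WRITE_MIBPS = 250  # per-broker gp2 write ceiling from your reply

def per_broker_disk_write(ingress_mbps, replication_factor, brokers):
    # Every replica (leader and follower) writes the data to its own log,
    # so total disk writes = ingress * RF, spread across the brokers.
    return ingress_mbps * replication_factor / brokers

def gp2_volumes_needed(ingress_mbps, replication_factor, brokers, headroom=0.2):
    per_broker = per_broker_disk_write(ingress_mbps, replication_factor, brokers)
    with_headroom = per_broker * (1 + headroom)  # assumed 20% safety margin
    return math.ceil(with_headroom / GP2_MAX_WRITE_MIBPS)

# Scenario 1) below: 380 MB/s ingress (as measured without replication), RF = 2, 3 brokers
print(per_broker_disk_write(380, 2, 3))  # ~253 MB/s -> right at the 250 MB/s cap
print(gp2_volumes_needed(380, 2, 3))     # 2 gp2 volumes per broker once headroom is added

If that is roughly the right way to reason about it, then even without the replication slowdown we asked about, 380 MB/s of ingress with RF = 2 would already be pressing against the per-broker disk limit.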
We tried multiple scenarios, as below:

1) Msg size 1 KB, 3 publishers, 3 brokers, with 3 partitions and RF = 2
As mentioned in the earlier email, with replication on the total throughput was never more than 90 MB/s per cluster. So each broker was writing 60 MB/s (30 MB/s as leader + 30 MB/s as follower replica). That is just a quarter (25%) of the EBS throughput limit of 250 MB/s. However, without replication we are able to get 126 MB/s per broker and a total throughput of 380 MB/s. The question is why the total throughput drops so much (from 380 to 90 MB/s) when replication is on. Why does replication slow the broker down to 25% of its speed?

2) Msg size 1 KB, 6 publishers, 6 brokers, with 6 partitions and RF = 1
In this case the throughput without replication (RF = 1) was 1487 MB/s total and 250 MB/s per broker. At this point EBS must be reaching its upper limit. Is there a way for us to choose "Provisioned IOPS" volumes? I suspect the disk is the bottleneck in this case, but how do we know that the network is not a bottleneck? The EC2 instance types used for the publishers (m5.2xlarge, 8 cores) and brokers (m5.large, 2 cores) list the network bandwidth as 10 Gbps = 1.25 GB/s. Is it possible to get EBS volumes with higher I/O throughput?

Thanks,
Arti Pande

On 2020/09/09 15:26:22, Ben Weintraub <benwe...@gmail.com> wrote:
> Hi Arti,
>
> From your description, it sounds like your per-partition throughput is
> extremely high (380 MB/s across three partitions is ~125 MB/s per
> partition). A more typical high-end range would be 5-10 MB/s per partition.
> Each follower gets mapped to a single replica fetcher thread on the
> follower side, which maintains a single TCP connection to the leader, and
> that connection is in turn mapped to a single network processor thread on
> the leader side. With the setup you described, you have 3 [partitions] *
> (2 - 1) [replicas per partition, minus one for the leader] = 3 followers
> across the whole cluster, but 10 network threads per broker. In this
> scenario, you're likely not actually spreading the replication traffic
> across those multiple network processor threads. This means that you're
> probably pinning a single core per broker doing TLS for replication, while
> the others are mostly idle (or servicing client requests).
>
> Bumping your partition count will likely allow you to spread replication
> load across more of your network processor threads, and achieve higher CPU
> utilization.
>
> In addition, it's important to consider not only disk usage but disk write
> throughput. With 380 MB/s of ingress traffic and an RF of 2, you'll need a
> total of 380 * 2 = 760 MB/s of disk write throughput, or about 250 MB/s per
> broker. Coincidentally (or not), MSK brokers use gp2 EBS volumes
> <https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-volume-types.html>
> for storing the Kafka log directory, which means you're limited to an
> absolute max of 250 MiB/s of write throughput per broker. You probably want
> to allow yourself some headroom here too, so I'd suggest that you start
> from your desired ingress throughput and replication factor, plus your
> desired safety margin, and then work backwards to figure out how many gp2
> volumes you'll need to achieve that level of write throughput safely. Also
> note that while CPU and RAM scale with the larger MSK instance types, disk
> throughput does not.
>
> Cheers,
> Ben
>
> On Wed, Sep 9, 2020 at 7:43 AM Arti Pande <pande.a...@gmail.com> wrote:
>
> > Hi,
> >
> > We have been evaluating Managed Streaming for Kafka (MSK) on AWS for a
> > use-case that requires high-speed data ingestion of the order of millions
> > of messages (each ~1 KB in size) per second. We ran into some issues when
> > testing this case.
> >
> > Context:
> > To start with, we have set up a single topic with 3 partitions on a 3-node
> > MSK cluster of m5.large brokers (2 cores, 8 GB RAM, 500 GB EBS) with
> > encryption enabled for inter-broker (intra-MSK) communication. Each broker
> > is in a separate AZ (3 AZs and 3 brokers in total) and has 10 network
> > threads and 16 IO threads.
> >
> > When the topic has replication-factor = 2 and min.insync.replicas = 2 and
> > the publisher uses acks = all, sending 100+ million messages using 3
> > parallel publishers intermittently results in the following error:
> > `Delivery failed: Broker: Not enough in-sync replicas`
> > As per the documentation, this error is thrown when in-sync replicas are
> > lagging behind for more than a configured duration
> > (replica.lag.time.max.ms = 30 seconds by default).
> >
> > However, when we don't see this error, the throughput is around 90 K
> > msgs/sec, i.e. 90 MB/sec. CPU usage is below 50% and disk usage is also
> > < 20%. So apparently CPU/memory/disk are not the issue?
> >
> > If we change replication-factor = 1 and min.insync.replicas = 1 and/or
> > acks = 1 and keep all other things the same, then there are no errors and
> > the throughput is ~380 K msgs/sec, i.e. 380 MB/sec. CPU usage was below
> > 30%.
> >
> > Question:
> > Without replication we were able to get 380 MB/sec written, so disk, CPU
> > and memory do not appear to be the issue. What could be the reason for
> > replicas to lag behind at 90 MB/sec throughput? Is the total number of
> > threads (10 network + 16 IO) too high for a 2-core machine? But then the
> > same thread settings work fine without replication. What could be the
> > reason for (1) lower throughput when replication is turned on and (2)
> > replicas lagging behind when replication is turned on?
> >
> > Thanks
> > Arti
> >
>
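For reference, the publisher side in these tests is configured roughly like the following (an illustrative confluent-kafka-python sketch; the broker endpoints, topic name, and batching settings are placeholders rather than the exact values of our test client):

from confluent_kafka import Producer

# Illustrative settings matching the scenario above: acks=all against a topic
# with replication-factor=2 and min.insync.replicas=2, ~1 KB messages.
conf = {
    "bootstrap.servers": "b-1.example:9092,b-2.example:9092,b-3.example:9092",  # placeholders
    "acks": "all",      # wait for the in-sync replicas to acknowledge each write
    "linger.ms": 5,     # small batching window; would need tuning for throughput
}

def on_delivery(err, msg):
    if err is not None:
        # Broker-side "Not enough in-sync replicas" errors surface here
        print(f"Delivery failed: {err}")

producer = Producer(conf)
payload = b"x" * 1024   # ~1 KB message, as in the tests
for _ in range(1_000_000):
    producer.produce("test-topic", payload, on_delivery=on_delivery)
    producer.poll(0)    # serve delivery callbacks without blocking
producer.flush()

A production-grade client would also handle queue-full back-pressure (BufferError) from produce(), which this sketch omits for brevity.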