Hi Arti,

From your description, it sounds like your per-partition throughput is
extremely high (380 MB/s across three partitions is ~125 MB/s per
partition). A more typical high-end range would be 5-10 MB/s per partition.
Each follower is mapped to a single replica fetcher thread on the
follower side, which maintains a single TCP connection to the leader, and
that connection is in turn handled by a single network processor thread on
the leader side. With the setup you described, you have 3 [partitions] *
(2 - 1) [replicas per partition, minus one for the leader] = 3 followers
across the whole cluster, but 10 network threads per broker. In this
scenario, you're likely not actually spreading the replication traffic
across those network processor threads, which means you're probably
pinning a single core per broker doing TLS for replication while the
others sit mostly idle (or service client requests).
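
To put rough numbers on that (an illustrative Python sketch, nothing
more; it assumes partition leadership is spread evenly across your three
brokers, and the variable names are mine):

    # Rough count of follower fetch connections per leader broker versus
    # the network processor threads available to serve them.
    partitions = 3
    replication_factor = 2
    brokers = 3
    network_threads_per_broker = 10

    followers_total = partitions * (replication_factor - 1)  # 3 * (2-1) = 3
    followers_per_leader = followers_total / brokers          # ~1 per broker

    print(f"replication conns per leader broker: {followers_per_leader:.0f}")
    print(f"network threads per broker:          {network_threads_per_broker}")
    # With ~1 replication connection per leader, the TLS work for
    # replication lands on a single network thread (and effectively one
    # core) per broker.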

Bumping your partition count will likely allow you to spread replication
load across more of your network processor threads, and achieve higher CPU
utilization.
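
As a rough sizing sketch (again just illustrative Python; the 10 MB/s
per-partition budget is the high end of the range above, so adjust it to
whatever you actually measure):

    # Back-of-the-envelope partition count for a target ingress rate,
    # assuming a per-partition throughput budget.
    import math

    target_ingress_mb_s = 380   # desired/observed ingress for the topic
    per_partition_mb_s = 10     # assumed per-partition budget

    partitions_needed = math.ceil(target_ingress_mb_s / per_partition_mb_s)
    print(f"suggested partition count: >= {partitions_needed}")   # 38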

In addition, it's important to consider not only disk usage but also disk write
throughput. With 380 MB/s of ingress traffic and an RF of 2, you'll need a
total of 380 * 2 = 760 MB/s of disk write throughput, or about 250 MB/s per
broker. Coincidentally (or not), MSK brokers use gp2 EBS volumes
<https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-volume-types.html>
for storing the Kafka log directory, which means you're limited to an
absolute max of 250 MiB/s of write throughput per broker. You probably want
to allow yourself some headroom here too, so I'd suggest that you start
from your desired ingress throughput and replication factor, plus your
desired safety margin, and then work backwards to figure out how many gp2
volumes you'll need to achieve that level of write throughput safely. Also
note that while CPU and RAM scale with the larger MSK instance types, disk
throughput does not.
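
Written out as arithmetic (illustrative Python; the 25% safety margin is
an arbitrary placeholder, and 250 MiB/s is the per-volume gp2 throughput
cap):

    # Required disk write throughput per broker vs. the gp2 volume ceiling.
    ingress_mb_s = 380
    replication_factor = 2
    brokers = 3
    safety_margin = 1.25      # placeholder headroom factor, pick your own

    required_total = ingress_mb_s * replication_factor           # 760 MB/s
    required_per_broker = required_total / brokers                # ~253 MB/s
    required_with_margin = required_per_broker * safety_margin    # ~317 MB/s

    gp2_ceiling_mib_s = 250   # max throughput of a single gp2 volume
    print(f"needed per broker (no margin):   {required_per_broker:.0f} MB/s")
    print(f"needed per broker (with margin): {required_with_margin:.0f} MB/s")
    print(f"gp2 per-volume ceiling:          {gp2_ceiling_mib_s} MiB/s")
    # MB vs MiB differ by ~5%, but the picture is the same: ~253 MB/s per
    # broker is already right at the gp2 ceiling, and any safety margin
    # puts you over it, hence working backwards to more volumes/brokers.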

Cheers,
Ben

On Wed, Sep 9, 2020 at 7:43 AM Arti Pande <pande.a...@gmail.com> wrote:

> Hi,
>
> We have been evaluating Managed Streaming for Kafka (MSK) on AWS for a
> use-case that requires high-speed data ingestion of the order of millions
> of messages (each ~1 KB size) per second. We ran into some issues when
> testing this case.
>
> Context:
> To start with, we have set up a single topic with 3 partitions on a
> 3-node MSK cluster of m5.large instances (2 cores, 8 GB RAM, 500 GB EBS)
> with encryption enabled for inter-broker (intra-MSK) communication. Each
> broker is in a separate AZ (total 3 AZs and 3 brokers) and has 10 network
> threads and 16 IO threads.
>
> When the topic has replication-factor = 2 and min.insync.replicas = 2 and
> the publisher uses acks = all, sending 100+ million messages using 3
> parallel publishers intermittently results in the following error:
>           `Delivery failed: Broker: Not enough in-sync replicas`
> As per the documentation, this error is thrown when in-sync replicas are
> lagging behind for more than a configured duration
> (replica.lag.time.max.ms = 30 seconds by default).
>
> However, when we don't see this error, the throughput is around 90 K
> msgs/sec, i.e. 90 MB/sec. CPU usage is below 50% and disk usage is also
> < 20%. So apparently CPU/memory/disk are not the issue?
>
> If we change replication-factor = 1 and min.insync.replicas = 1 and/or
> acks = 1 and keep all other things the same, then there are no errors and
> throughput is ~380 K msgs/sec, i.e. 380 MB/sec. CPU usage was below 30%.
>
> Question:
> Without replication we were able to get 380 MB/sec written, so assuming
> disk, CPU, or memory are not the issue, what could be the reason for
> replicas to lag behind at 90 MB/sec throughput? Is the total number of
> threads (10 n/w + 16 IO) too high for a 2-core machine? But then the same
> thread settings work fine without replication. What could be the reason
> for (1) lower throughput when turning replication on and (2) replicas
> lagging behind when replication is turned on?
>
> Thanks
> Arti
>