Hi,
We have an Apex application with a DAG structure like this:
KafkaConsumer -> ParquetWriter
The KafkaConsumer is running at a scale where we have 100 consumer
containers consuming from a Kafka cluster with an incoming rate of 300K
msg/sec, each message being about 1KB (a highly nested Avro message).
We arrived at the 100-container number for the consumer in order to keep up
with the incoming traffic. The ParquetWriter is a bit CPU-intensive in our
case, and we thought we might require about 120-130 containers.
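For context, the raw per-container load implied by these numbers (a quick
derived-average sketch; per-partition skew is ignored):

```python
# Back-of-envelope per-container load, using the figures above
# (derived averages only; actual load varies per partition).
incoming_rate = 300_000      # msg/sec into the topic
msg_size_kb = 1              # ~1 KB per message
consumers = 100              # consumer containers

msgs_per_consumer = incoming_rate / consumers      # 3000 msg/sec each
kb_per_consumer = msgs_per_consumer * msg_size_kb  # ~3 MB/sec each

print(msgs_per_consumer, kb_per_consumer)  # 3000.0 3000.0
```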
We have 3 different observations here:
1. Using 100 KafkaConsumer and 120 ParquetWriter partitions, without partition parallelism:
In this case, Apex automatically introduces a pass-through
unifier. In this scenario, we have seen that the ParquetWriter invariably
processes tuples at a lower rate than the KafkaConsumer's emit rate. That is,
if the consumer emits tuples at the rate of 20 million/sec, the ParquetWriter
will only write at the rate of 17 million/sec. It also invariably leads to
backpressure and makes the consumer consume at a lower rate. I have tried
going beyond 120 containers as well, but the gap remains.
I believe a possible reason could be that the unifier and the writer are in
the same container and presumably share the same core, and hence they are
slower. Is this observation correct? I tried tuning by increasing the
Writer.Inputport.QUEUE_SIZE to 10K. The queue is not even getting half full,
but back-pressure is still created for the consumer. Is there any
additional tuning I can do, to:
A. Make the writer process tuples at almost the same pace as the consumer,
without backpressure on the consumer?
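One note on the queue tuning above: if the writer is steadily slower than the
consumer, no finite queue size can prevent backpressure; a bigger buffer only
delays it. A toy calculation with the rates from this observation (illustrative
only, not a model of Apex's buffer server; flow control engaging upstream may
also be why backpressure appears even while this particular queue stays
half-empty):

```python
# With a sustained rate gap, a finite buffer fills in
# queue_size / (emit_rate - drain_rate) seconds, after which
# backpressure is unavoidable regardless of the queue size.
emit_rate  = 20_000_000   # tuples/sec from the consumer
drain_rate = 17_000_000   # tuples/sec the writer sustains
queue_size = 10_000       # tuples (the QUEUE_SIZE tried above)

seconds_to_fill = queue_size / (emit_rate - drain_rate)
print(seconds_to_fill)  # ~0.0033 s: the buffer fills almost instantly
```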
2. Using partition parallelism with 120 KafkaConsumer and 120 ParquetWriter partitions:
In this scenario as well, we have seen that the ParquetWriter processes
tuples at a lower rate than the KafkaConsumer's emit rate. That is, if the
consumer emits tuples at the rate of 20 million/sec, the ParquetWriter will
only write at the rate of 19 million/sec. This behavior persists even if we
keep increasing the consumers and writers to 130 or 140 containers. We believe
this is because the writer is a bit CPU-intensive.
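For reference, we enable partition parallelism and the larger port queue with
properties along these lines (the operator and port names here are
illustrative stand-ins for the real ones in our application, and the attribute
names should be checked against your Apex version):

```xml
<!-- Illustrative names: "ParquetWriter" / "input" stand in for the
     actual operator and port names in our application. -->
<property>
  <!-- Run writer partitions parallel to the upstream consumer partitions. -->
  <name>dt.operator.ParquetWriter.port.input.attr.PARTITION_PARALLEL</name>
  <value>true</value>
</property>
<property>
  <!-- Enlarge the writer input port buffer (the 10K tried above). -->
  <name>dt.operator.ParquetWriter.port.input.attr.QUEUE_CAPACITY</name>
  <value>10000</value>
</property>
```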
3. Using a different DAG structure like this:
KafkaConsumer -> ParquetWriter1
      |
      +-------> ParquetWriter2
In this case, ParquetWriter1 and ParquetWriter2 are the same writer, but
each has a PARTITION_PARALLEL relationship with the KafkaConsumer. Each has
100 containers, so 300 containers in total. The KafkaConsumer sends one half
of the messages to ParquetWriter1 and the other half to ParquetWriter2. That
is, out of the 20 million/sec emitted by the consumer, 10 million/sec is
handled by each writer group. This setup works fine for our traffic. But
where we would have required only 20-30 additional writer containers, we are
using 100, because PARTITION_PARALLEL seems to avoid all the slowness that we
faced in approach [1].
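The 50/50 split in [3] can be sketched as simple alternation between the two
writer groups (illustration only; in Apex the routing comes from the two
parallel-partitioned streams, not from application code like this):

```python
# Alternate tuples between two writer groups so each sees half
# the consumer's emit rate (20M/sec total -> 10M/sec per group).
def split_half(tuples):
    group1, group2 = [], []
    for i, t in enumerate(tuples):
        (group1 if i % 2 == 0 else group2).append(t)
    return group1, group2

g1, g2 = split_half(list(range(20)))
print(len(g1), len(g2))  # 10 10
```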
We wanted to know if there is a way to optimize approach [1] or [2], and
hence use fewer containers than [3] requires.
Thanks,
Aravindan