Hi,
  We have an Apex Application which has a DAG structure like this:

KafkaConsumer --> ParquetWriter

The KafkaConsumer runs at a scale of 100 containers, consuming from a Kafka 
cluster with an incoming rate of 300K msg/sec, where each message is about 1KB 
(each message is a highly nested Avro message). We arrived at 100 containers 
for the consumer in order to keep up with the incoming traffic. The 
ParquetWriter is somewhat CPU intensive in our case, and we estimated we may 
require about 120-130 containers for it.


We have 3 different observations here:
  1. Using 100 KafkaConsumer and 120 ParquetWriter partitions, without partition parallel:
                 In this case, Apex automatically introduces a pass-through 
unifier. In this scenario, we have seen that the ParquetWriter invariably 
processes tuples at a lower rate than the KafkaConsumer's emit rate. That is, 
if the consumer emits tuples at the rate of 20 million/sec, the ParquetWriter 
will only write at the rate of 17 million/sec. This invariably leads to 
backpressure and makes the consumer consume at a lower rate. I have tried 
going beyond 120 containers as well, but the behavior persists.
I believe a possible reason could be that the unifier and the writer are in 
the same container and presumably share the same core, and hence they are 
slower. Is this observation correct? I tried tuning by increasing the 
Writer.Inputport.QUEUE_SIZE to 10K. The queue does not even get half full, 
but backpressure is still created for the consumer. Is there any additional 
tuning I can do, to:
   A. Make the writer process tuples at almost the same pace as the consumer, 
without backpressure on the consumer?

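For reference, the kind of tuning we tried looks roughly like this in the application configuration (the operator, port, and stream names here are illustrative placeholders, not our exact names; QUEUE_CAPACITY is the Apex port attribute behind the queue size mentioned above):

```xml
<!-- Illustrative sketch only: operator/port/stream names are placeholders. -->
<!-- Enlarge the writer's input buffer (the setting referred to above as
     Writer.Inputport.QUEUE_SIZE; the Apex port attribute is QUEUE_CAPACITY). -->
<property>
  <name>dt.operator.ParquetWriter.port.input.attr.QUEUE_CAPACITY</name>
  <value>10000</value>
</property>
<!-- Pin the stream locality explicitly, e.g. to test whether co-locating
     (or separating) the unifier and the writer changes throughput. -->
<property>
  <name>dt.stream.writerStream.locality</name>
  <value>CONTAINER_LOCAL</value>
</property>
```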
  2. Using partition parallel with 120 KafkaConsumer and 120 ParquetWriter partitions:
        In this scenario as well, we have seen that the ParquetWriter 
processes tuples at a lower rate than the KafkaConsumer's emit rate. That is, 
if the consumer emits tuples at the rate of 20 million/sec, the ParquetWriter 
will only write at the rate of 19 million/sec. This behavior holds even if we 
keep increasing the consumers and writers to 130 or 140 containers. We believe 
this is because the writer is somewhat CPU intensive.
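For concreteness, the partition-parallel setup in this approach corresponds to configuration roughly like the following (operator and port names are illustrative placeholders; depending on the Kafka input operator used, the consumer's partition count may instead be driven by its own Kafka-partition strategy rather than a StatelessPartitioner):

```xml
<!-- Illustrative sketch only: operator and port names are placeholders. -->
<!-- 120 consumer partitions... -->
<property>
  <name>dt.operator.KafkaConsumer.attr.PARTITIONER</name>
  <value>com.datatorrent.common.partitioner.StatelessPartitioner:120</value>
</property>
<!-- ...and a writer partition paired 1:1 with each consumer partition,
     which removes the pass-through unifier from approach [1]. -->
<property>
  <name>dt.operator.ParquetWriter.port.input.attr.PARTITION_PARALLEL</name>
  <value>true</value>
</property>
```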

3. Using a different DAG structure like this:

     KafkaConsumer --> ParquetWriter1
              |
              +------> ParquetWriter2
 
  In this case, ParquetWriter1 and ParquetWriter2 are the same writer 
operator, but each has a partition-parallel relationship with the 
KafkaConsumer. Each group is 100 containers, so 300 containers in total. The 
KafkaConsumer sends one half of the messages to ParquetWriter1 and the other 
half to ParquetWriter2. That is, out of the 20 million/sec emitted by the 
consumer, 10 million/sec is handled by each writer group. This setup works 
fine for our traffic. But where we would have required only 20-30 additional 
writer containers, we are using 100, because partition parallel seems to avoid 
all the slowness that we faced in approach [1].
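Schematically, the DAG wiring for this approach looks like the following Java sketch (the operator classes, port names, and stream names are illustrative placeholders, not our exact code):

```java
// Illustrative sketch only: operator classes and port names are placeholders.
KafkaConsumerOperator consumer =
    dag.addOperator("KafkaConsumer", new KafkaConsumerOperator());
ParquetWriterOperator writer1 =
    dag.addOperator("ParquetWriter1", new ParquetWriterOperator());
ParquetWriterOperator writer2 =
    dag.addOperator("ParquetWriter2", new ParquetWriterOperator());

// The consumer emits one half of the messages on each output port.
dag.addStream("half1", consumer.output1, writer1.input);
dag.addStream("half2", consumer.output2, writer2.input);

// Each writer branch is partition-parallel with the consumer
// (100 consumer partitions, so 100 partitions per writer group).
dag.setInputPortAttribute(writer1.input,
    Context.PortContext.PARTITION_PARALLEL, true);
dag.setInputPortAttribute(writer2.input,
    Context.PortContext.PARTITION_PARALLEL, true);
```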

We wanted to know if there is a way to optimize approach [1] or [2], and hence 
use fewer containers than we do with [3].


Thanks,
Aravindan





