Arvindan,

When you had the MxN case with 100 Kafka consumers sending to 120 Parquet writers, what was the CPU utilization of the Parquet containers? Was it close to 100%, or did you have spare cycles? I am trying to determine whether it is an I/O bottleneck or a processing bottleneck.
Thanks

On Thu, Dec 22, 2016 at 10:50 AM, Arvindan Thulasinathan <[email protected]> wrote:

> Hi,
> We have an Apex application with a DAG structure like this:
>
> KafkaConsumer —> ParquetWriter
>
> The KafkaConsumer is running at a scale where we have 100 containers for
> the consumer, consuming from a Kafka cluster with an incoming rate of 300K
> msg/sec, where each message is about 1 KB (each message is a highly nested
> Avro message). We arrived at the 100-container number for the consumer in
> order to keep up with the incoming traffic. The ParquetWriter is a bit CPU
> intensive in our case, and we thought we might require about 120-130
> containers.
>
> We have 3 different observations here:
>
> 1. Using 100 KafkaConsumers and 120 ParquetWriters without partition
> parallelism:
> In this case, Apex automatically introduces a pass-through unifier. In
> this scenario, we have seen that the ParquetWriter invariably processes
> tuples at a lower rate than the KafkaConsumer's emit rate. That is, if the
> consumer emits tuples at a rate of 20 million/sec, the ParquetWriter will
> only write at a rate of 17 million/sec. It also invariably leads to
> backpressure and makes the consumer consume at a lower rate. I have tried
> going beyond 120 containers as well. I believe a possible reason could be
> that the unifier and the writer are in the same container and presumably
> share the same core, and hence are slower. Is this observation correct? I
> tried tuning by increasing the writer input port's QUEUE_SIZE to 10K. The
> queue is not even getting half full, but backpressure is still created on
> the consumer. Is there any additional tuning that I can do to:
> A. Make the writer process tuples at almost the same pace as the consumer,
> without backpressure on the consumer?
>
> 2. Using partition parallelism with 120 KafkaConsumers and 120
> ParquetWriters:
> In this scenario as well, we have seen that the ParquetWriter processes
> tuples at a lower rate than the KafkaConsumer's emit rate.
> That is, if the consumer emits tuples at a rate of 20 million/sec, the
> ParquetWriter will only write at a rate of 19 million/sec. This behavior
> holds even if we keep increasing the consumers and writers to 130 or 140
> containers. We believe this is because the writer is a bit CPU intensive.
>
> 3. Using a different DAG structure like this:
>
> KafkaConsumer —> ParquetWriter1
>       |
>       |——————> ParquetWriter2
>
> In this case, ParquetWriter1 and ParquetWriter2 are the same writer, but
> each has a partition-parallel relationship with the KafkaConsumer. Each
> group has 100 containers, so 300 containers in total. The KafkaConsumer
> sends one half of the messages to ParquetWriter1 and the other half to
> ParquetWriter2. That is, of the 20 million/sec emitted by the consumer, 10
> million/sec is handled by each writer group. This setup works fine for our
> traffic. But where we would otherwise have required only 20-30 additional
> writer containers, we are using 100, because partition parallelism seems
> to avoid all the slowness that we faced in approach [1].
>
> We wanted to know if there is a way to optimize approach [1] or [2] and
> hence use fewer containers than approach [3] requires.
>
> Thanks,
> Aravindan
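[Editor's note] For readers following along: the two knobs discussed above (the input port queue size and the partition-parallel relationship) are typically set as Apex port attributes in the application's properties.xml. A minimal sketch, assuming the operator is named ParquetWriter and its input port is named input (both names are assumptions here; in Apex's PortContext the buffer-size attribute is QUEUE_CAPACITY, which appears to be what "QUEUE_SIZE" refers to in the thread):

```xml
<!-- Sketch only: "ParquetWriter" and "input" are assumed names; substitute
     the operator and port names from your own DAG. -->

<!-- Raise the writer's input port buffer to 10K tuples -->
<property>
  <name>dt.operator.ParquetWriter.port.input.attr.QUEUE_CAPACITY</name>
  <value>10000</value>
</property>

<!-- Make the writer partition-parallel with its upstream operator, so each
     consumer partition feeds its own writer partition and no unifier is
     inserted (this is what approaches [2] and [3] rely on) -->
<property>
  <name>dt.operator.ParquetWriter.port.input.attr.PARTITION_PARALLEL</name>
  <value>true</value>
</property>
```

The same attributes can also be set programmatically when building the DAG; the properties.xml form just avoids a recompile while experimenting with sizes.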
