Hi Ashwin,

Thanks for reaching out to the Flink community. Since you have verified that
kafka_source -> discarding_sink can process 10 million records/s, you
might also want to test the write throughput to data_sink and dlq_sink.
These sinks may be limiting your overall throughput by backpressuring the
data flow. If that is not the problem, then I believe some profiling could
help pinpoint the bottleneck.
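
To put the numbers from your mail into perspective, here is a rough
back-of-the-envelope sketch. The inflow, the observed throughput and the
parallelism come from your description; the 10 minute outage and the helper
names are purely illustrative assumptions, not anything Flink provides.

```python
def drain_seconds(backlog_records, capacity_per_s, inflow_per_s):
    """Seconds needed to clear a backlog while new records keep arriving."""
    headroom = capacity_per_s - inflow_per_s
    if headroom <= 0:
        raise ValueError("capacity must exceed inflow, or the backlog never shrinks")
    return backlog_records / headroom


def per_subtask_rate(total_per_s, parallelism):
    """Average records/s each parallel subtask has to handle."""
    return total_per_s / parallelism


INFLOW = 500_000        # msg/s arriving in Kafka (from the original mail)
CAPACITY = 1_000_000    # records/s the full job currently reaches (from the mail)
PARALLELISM = 80        # parallelism used in Try0/Try1 (from the mail)
OUTAGE_S = 600          # hypothetical 10 minute outage, an assumption

backlog = INFLOW * OUTAGE_S
print(drain_seconds(backlog, CAPACITY, INFLOW))   # 600.0 seconds to catch up
print(per_subtask_rate(CAPACITY, PARALLELISM))    # 12500.0 records/s per subtask
```

With these assumptions, catching up takes as long as the outage itself, and
each subtask only handles about 12,500 records/s, which hints that something
downstream of the source is holding the pipeline back.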

Cheers,
Till

On Sun, Nov 8, 2020 at 10:26 PM ashwin konale <ashwin.kon...@gmail.com>
wrote:

> Hey guys,
> I am struggling to improve the throughput of my simple flink application.
> The target topology is this.
>
> read_from_kafka(byte array deserializer) --rescale-->
> processFunction(confluent avro deserialization) -> split ->
> 1. data_sink, 2. dlq_sink
>
> Kafka traffic is pretty high
> Partitions: 128
> Traffic:  ~500k msg/s, 50Mbps.
>
> Flink is running on k8s, writing to hdfs3. I have ~200 CPUs and 400G of
> memory at hand. I have tried a few configurations, but I am not able to get
> the throughput above 1 million records per second (which I need for
> recovering from failures). I have tried increasing the parallelism a lot
> (up to 512), but it has very little impact on the throughput. The primary
> metrics I am considering for throughput are the kafka-source numRecordsOut
> and the message backlog. I have already increased Kafka consumer defaults
> such as max.poll.records. Here are the few things I have tried already.
> Try0: Check raw kafka consumer throughput (kafka_source -> discarding_sink)
> tm: 20, slots:4, parallelism 80
> throughput: 10Mil/s
>
> Try1: Disable chaining to introduce network related lag.
> tm: 20, slots:4, parallelism 80
> throughput: 1Mil/s
> Also tried with increasing floating-buffers to 100, and
> buffers-per-channel to 64. Increasing parallelism seems to have no effect.
> Observation: out/in buffers are always at 100% utilization.
>
> After this I tried various other things with different network
> configs, parallelism, JVM sizes, etc., but the throughput seems to be stuck
> at 1Mil/s. Can someone please help me figure out which key metrics to look
> at and how I can improve the situation? Happy to provide any details needed.
>
> Flink version: 1.11.2
>
