Flink CEP Resource Utilisation Optimisation

Abhishek Singla Sun, 26 Mar 2023 08:58:33 -0700

Hi Team,

*Flink Version:* 1.15.0
*Java Version:* 1.8
*Standalone Cluster*
*Task Manager:* AWS EC2 instance type c5n.4xlarge (16 vCPU, 42 GB memory,
8 slots per TM)
*CEP Scenario:* Kafka Event A followed by Kafka Event B within 10 mins
*Throughput:* 20k events per second for Event A, 0 for Kafka Event B
*State Backend:* FsStateBackend
*Unaligned Checkpoints:* Enabled
*asynchronousSnapshots:* true


While testing this scenario (Kafka Event A followed by Kafka Event B within
10 mins) in our load environment, it took 20 TM nodes to sustain this
throughput. With fewer nodes, either CPU utilization peaked or backpressure
appeared because the output buffers were full. The checkpoint size is only
6.75 GB, and the state held by the CEP operator itself should be much smaller,
since unaligned checkpoints also include in-flight buffers.
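
To put the figures above in perspective, here is a rough back-of-envelope sketch. It assumes that, with no Event B arriving, every Event A stays pending for the full 10-minute window, that load is spread evenly over all 160 slots, and that 1 GB is 10^9 bytes. The class and method names are hypothetical and only illustrate the arithmetic; they are not part of the job.

```java
public class CepSizing {

    // Events that can be waiting for a match at the same time:
    // arrival rate multiplied by the length of the "within" window.
    public static long openPartialMatches(long eventsPerSecond, long windowSeconds) {
        return eventsPerSecond * windowSeconds;
    }

    // Average events per second per slot, assuming an even spread (an assumption).
    public static double eventsPerSlot(long eventsPerSecond, int taskManagers, int slotsPerTaskManager) {
        return (double) eventsPerSecond / (taskManagers * slotsPerTaskManager);
    }

    // Checkpoint bytes divided by pending events. This is an upper bound on
    // per-event state, because the checkpoint also contains in-flight buffers.
    public static double bytesPerPendingEvent(double checkpointBytes, long pendingEvents) {
        return checkpointBytes / pendingEvents;
    }

    public static void main(String[] args) {
        long pending = openPartialMatches(20_000, 10 * 60);   // 20k events/s, 10 min window
        double perSlot = eventsPerSlot(20_000, 20, 8);        // 20 TMs, 8 slots each
        double bytes = bytesPerPendingEvent(6.75e9, pending); // 6.75 GB checkpoint

        System.out.println("Pending Event A instances: " + pending);
        System.out.println("Events per second per slot: " + perSlot);
        System.out.println("Checkpoint bytes per pending event: " + bytes);
    }
}
```

Under these assumptions that is about 12 million pending events, roughly 125 events per second per slot, and at most about 560 bytes of checkpoint data per pending event.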

I am looking for input on whether this throughput really requires this many
resources and, if not, what the likely issue could be.

I found one more issue: if the throughput of Event A drops to zero, the
checkpoint size still stays around 2 GB even after hours. Is this expected?

Regards,
Abhishek Singla
