Hi all,

Lately I’ve been investigating the performance characteristics of Flink as 
part of our internal benchmarking. For this, we’ve developed and deployed an 
application that polls data from Kafka and groups it by key within a fixed 
one-minute time window. 

In total, the topic that the KafkaConsumer polled from consists of 100 million 
messages, each 100 bytes in size. Our expectation was that no records would be 
read from, or produced back to, Kafka during the first minute of the window 
operation - however, this is unfortunately not the case. Attached you will find 
a plot showing the number of records produced per second. 
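To make the expectation concrete, here is a minimal sketch (plain Python, not actual Flink code) of the tumbling-window semantics we assumed: records are buffered per key and an aggregate is emitted only once the 60-second window has closed, so no output should appear during the first minute. All names here are illustrative, not Flink API.

```python
# Illustrative sketch of one-minute tumbling-window aggregation semantics.
# Not Flink code; function and variable names are hypothetical.
from collections import defaultdict

WINDOW_MS = 60_000  # one-minute tumbling window, as in our job


def tumbling_window_counts(events):
    """events: iterable of (timestamp_ms, key), assumed in timestamp order.
    Yields (window_start, key, count) only after a window has fully elapsed,
    i.e. there is no early output while a window is still open."""
    buffers = defaultdict(lambda: defaultdict(int))  # window_start -> key -> count
    current_window = None
    for ts, key in events:
        window_start = ts - (ts % WINDOW_MS)
        if current_window is not None and window_start > current_window:
            # The previous window expired: emit its aggregates now, not before.
            for k, c in sorted(buffers.pop(current_window).items()):
                yield (current_window, k, c)
        current_window = window_start
        buffers[window_start][key] += 1
    # Flush the last window at end of input.
    if current_window is not None and current_window in buffers:
        for k, c in sorted(buffers[current_window].items()):
            yield (current_window, k, c)


# Example: two records for key "a" in the first minute, one for "b" in the
# second; the first window's count appears only once the window has passed.
results = list(tumbling_window_counts([(1_000, "a"), (30_000, "a"), (61_000, "b")]))
# results == [(0, "a", 2), (60_000, "b", 1)]
```

Under these semantics we would expect zero produced records per second until the first window boundary, which is what makes the attached plot surprising to us.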

Could anyone explain the behaviour shown in the attached graph? Why are 
messages consumed from, and produced back to, Kafka before the window has 
expired? 

 

Attachment: Flink_6000ms_Window_Throughput (1).pdf
Description: Adobe PDF document
