Hello,

I am using Beam Java SDK 2.19 on Dataflow. We have a system where a log
shipper continuously emits logs to Kafka, and Beam reads the logs using KafkaIO.
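
For reference, the read side of the pipeline is set up roughly like the sketch
below (the broker address, topic name, and String deserializers here are
placeholders, not our exact configuration):

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.kafka.KafkaIO;
import org.apache.beam.sdk.io.kafka.KafkaRecord;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.values.TypeDescriptors;
import org.apache.kafka.common.serialization.StringDeserializer;

public class LogReaderPipeline {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    p.apply("ReadLogs", KafkaIO.<String, String>read()
            .withBootstrapServers("kafka-broker:9092")   // placeholder broker address
            .withTopic("logs")                           // placeholder topic name
            .withKeyDeserializer(StringDeserializer.class)
            .withValueDeserializer(StringDeserializer.class))
        // Keep only the log payload (the JSON value) for downstream processing.
        .apply("ExtractValue", MapElements
            .into(TypeDescriptors.strings())
            .via((KafkaRecord<String, String> rec) -> rec.getKV().getValue()));

    p.run();
  }
}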

Sometimes I see slowness in the KafkaIO read for one of the topics
(probably during peak traffic periods), where there is a 2-3 minute gap between
the record timestamp and the time when Beam reads the log. For instance:

2020-08-05 12:46:23.826 PDT

 {"@timestamp":1596656684.274594,"time":"2020-08-05T19:44:44.274594282Z”,
“data” : data}, offset: 2148857. timestamp: 1596656685005



If you convert the record timestamp (1596656685005), which is in epoch
milliseconds, to PDT, you will see an approximately 2 minute difference between
it and 2020-08-05 12:46:23.826 PDT (the time when Beam actually reads the data).
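
As a quick sanity check, here is a small standalone java.time snippet (not part
of the pipeline) that converts the record timestamp and computes the gap
against the read time above:

import java.time.Duration;
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZonedDateTime;

public class LagCheck {
  public static void main(String[] args) {
    // Kafka record timestamp from the log line above, in epoch milliseconds.
    Instant recordTime = Instant.ofEpochMilli(1596656685005L);

    // Time at which Beam actually read the record (12:46:23.826 PDT, i.e. UTC-7).
    ZonedDateTime readTime = ZonedDateTime.parse("2020-08-05T12:46:23.826-07:00");

    System.out.println(recordTime.atZone(ZoneId.of("America/Los_Angeles")));
    // prints 2020-08-05T12:44:45.005-07:00[America/Los_Angeles]

    System.out.println(Duration.between(recordTime, readTime.toInstant()));
    // prints PT1M38.821S, i.e. close to 2 minutes of lag
  }
}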

One way of achieving horizontal scaling here is to increase the number of
partitions on the Kafka broker. What can be done on the Beam side, i.e. the
KafkaIO side, to tackle this slowness? Any suggestions?

Thanks and regards
Mohil
