I am working with Spark Structured Streaming (2.2.1), reading data from Kafka. I need to aggregate the ingested data every minute, and I am using spark-shell at the moment. The message ingestion rate is approximately 500k/second. During some trigger intervals (1 minute), especially when the streaming process is first started, all tasks finish in 20 seconds, but during other triggers it takes 90 seconds.
I have tried reducing the number of partitions (from roughly 300 to 100) to reduce the number of Kafka consumers, but that has not helped. I also tried setting kafkaConsumer.pollTimeoutMs to 30 seconds, but then I see a lot of
java.util.concurrent.TimeoutException: Cannot fetch record for offset.
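For reference, this is roughly what my setup looks like in spark-shell (broker address, topic name, and the exact aggregation are simplified placeholders, not my real config):

```scala
import org.apache.spark.sql.functions._
import org.apache.spark.sql.streaming.Trigger

// Read from Kafka; pollTimeoutMs is the 30s value mentioned above
val stream = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker1:9092") // placeholder
  .option("subscribe", "events")                     // placeholder topic
  .option("kafkaConsumer.pollTimeoutMs", "30000")
  .load()

// Per-minute windowed aggregation on the Kafka record timestamp
val counts = stream
  .withWatermark("timestamp", "1 minute")
  .groupBy(window(col("timestamp"), "1 minute"))
  .count()

// 1-minute trigger interval, matching the behavior described above
val query = counts.writeStream
  .outputMode("update")
  .format("console")
  .trigger(Trigger.ProcessingTime("1 minute"))
  .start()
```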
So I wanted to see if anyone has any thoughts or recommendations.