Hello,

We have a Spark Streaming application, and the problem we are
encountering is that the batch processing time keeps increasing and
eventually causes the application to start lagging. I am hoping that
someone here can point me to possible underlying causes.

The batch interval is 1 minute as of now, and the app does some maps,
filters, joins and reduceByKeyAndWindow operations. All the reduces are
invertible functions, so we provide the inverse-reduce functions for all
of them. The largest window size we have right now is 1 hour. When the
app is started, we see that the batch processing time is between 20 and
30 seconds. It keeps creeping up slowly, and by the time it hits the
1 hour mark, it is somewhere around 35-40 seconds. Somewhat expected,
and still not bad!
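For reference, here is the incremental update we understand
reduceByKeyAndWindow to perform when an inverse function is supplied,
sketched as a simplified plain-Python model (not Spark code; the
`windowed_counts` function and dict-based batches are illustrative only):

```python
from collections import deque

def windowed_counts(batches, window_len):
    """Simplified model of reduceByKeyAndWindow with an inverse function:
    instead of re-reducing the whole window on every batch, the running
    total is updated by reducing in the newest batch and inverse-reducing
    (subtracting) the batch that slides out of the window."""
    window = deque()   # batches currently inside the window
    totals = {}        # running per-key sums over the window
    results = []
    for batch in batches:                     # batch: dict of key -> count
        window.append(batch)
        for k, v in batch.items():            # reduce: add new data
            totals[k] = totals.get(k, 0) + v
        if len(window) > window_len:
            expired = window.popleft()
            for k, v in expired.items():      # inverse reduce: remove old data
                totals[k] -= v
        results.append(dict(totals))
    return results
```

With this scheme the per-batch work should be proportional to the batch
size, not the window size, which is why we expected processing time to
plateau once the first full window has passed.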

Since the largest window we have is 1 hour long, I would expect the
application to stabilize around the 1 hour mark and process subsequent
batches within that 35-40 second zone. However, that is not what is
happening. The processing time keeps increasing, and within a few hours
it exceeds the 1 minute mark, at which point the app starts lagging.
Eventually the lag builds up to several minutes and we have to restart
the system.

Any pointers on why this could be happening and what we can do to
troubleshoot further?

Thanks
Nikunj
