Thanks for your reply, and sorry for the late response; I wanted to run some
tests before writing back.
The counting part works much as you advised. I specify a minimum interval,
such as 1 minute, and each larger interval (hour, day, etc.) sums the counters
of its current child intervals.
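
As an illustration only, the roll-up described above might look roughly like
the following in plain Python. This is a sketch, not Spark code: the helper
names truncate and roll_up, the (interval start, key) layout, and the sample
data are all assumptions made for the example.

```python
from collections import Counter
from datetime import datetime


def truncate(ts, unit):
    """Return the start of the minute, hour, or day interval containing ts."""
    if unit == "minute":
        return ts.replace(second=0, microsecond=0)
    if unit == "hour":
        return ts.replace(minute=0, second=0, microsecond=0)
    if unit == "day":
        return ts.replace(hour=0, minute=0, second=0, microsecond=0)
    raise ValueError("unsupported unit: %s" % unit)


def roll_up(child_counts, parent_unit):
    """Sum child-interval counters into their enclosing parent interval.

    child_counts maps (interval_start, key) -> count.
    """
    parent = Counter()
    for (start, key), n in child_counts.items():
        parent[(truncate(start, parent_unit), key)] += n
    return parent


# Hypothetical per-minute counters (the minimum interval).
minute_counts = {
    (datetime(2013, 1, 1, 10, 0), "page_a"): 3,
    (datetime(2013, 1, 1, 10, 1), "page_a"): 2,
    (datetime(2013, 1, 1, 11, 30), "page_a"): 4,
    (datetime(2013, 1, 1, 10, 0), "page_b"): 1,
}

# Each level is built from the level directly below it.
hour_counts = roll_up(minute_counts, "hour")
day_counts = roll_up(hour_counts, "day")
```

Summing children this way is only valid for plain additive counters; counts
of distinct items cannot be added across intervals like this.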
However when I want to
It seems the default value of spark.cleaner.delay is 3600 seconds, but I need
to be able to count things on a daily, weekly or even monthly basis.
I suppose the aim of DStream batches and spark.cleaner.delay is to avoid
space issues (running out of memory etc.). I usually use HyperLogLog for
counting