Re: Any advice for using big spark.cleaner.delay value in Spark Streaming?

2014-04-30 Thread buremba
Thanks for your reply. Sorry for the late response; I wanted to run some tests before writing back. The counting part works much as you advised: I specify a minimum interval, such as 1 minute, and each hour, day, etc. sums the counters of its current child intervals. However when I want to
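
The roll-up described above (a minimum interval whose counters are summed into each enclosing hour, day, and so on) could be sketched roughly as follows. This is plain Python rather than Spark code, and it is only an illustration: the names `bucket_start` and `roll_up`, the use of epoch seconds as bucket keys, and the sample numbers are all assumptions, not taken from the original message. It also assumes the counters are additive, so a plain sum is valid.

```python
from collections import defaultdict

# Interval widths in seconds (illustrative choices).
MINUTE = 60
HOUR = 60 * MINUTE
DAY = 24 * HOUR


def bucket_start(timestamp, width):
    """Return the start of the interval of the given width that contains timestamp."""
    return timestamp - (timestamp % width)


def roll_up(child_counts, parent_width):
    """Sum the counters of child intervals into their enclosing parent intervals.

    child_counts maps each child interval's start time (epoch seconds) to its count.
    """
    parent_counts = defaultdict(int)
    for child_start, count in child_counts.items():
        parent_counts[bucket_start(child_start, parent_width)] += count
    return dict(parent_counts)


# Hypothetical per-minute counters keyed by interval start time.
minute_counts = {0: 3, 60: 5, 3600: 2, 86400: 7}

hour_counts = roll_up(minute_counts, HOUR)  # minutes -> hours
day_counts = roll_up(hour_counts, DAY)      # hours -> days
```

Because each level is derived only from the level below it, the smallest-interval counters can be discarded once their parent has been computed, provided nothing else needs them.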

Any advice for using big spark.cleaner.delay value in Spark Streaming?

2014-04-27 Thread buremba
It seems the default value for spark.cleaner.delay is 3600 seconds, but I need to be able to count things on a daily, weekly, or even monthly basis. I suppose the aim of DStream batches and spark.cleaner.delay is to avoid space issues (running out of memory, etc.). I usually use HyperLogLog for counting