If I understand you correctly, you need a window duration of 1 hour and a sliding interval of 5 seconds.
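That combination can be sketched with Spark Streaming's `reduceByKeyAndWindow`. The outline below is a minimal, untested sketch, not code from this thread: the socket source, field layout, and checkpoint path are all assumptions — any DStream of (timestamp, value) pairs would work the same way.

```python
# Hypothetical sketch: hourly totals recomputed every 5 seconds.
from pyspark import SparkContext
from pyspark.streaming import StreamingContext

sc = SparkContext(appName="HourlyBuckets")
ssc = StreamingContext(sc, batchDuration=5)        # 5-second batches
ssc.checkpoint("/tmp/hourly-ckpt")                 # inverse-reduce windows require checkpointing

# Assumed input: lines of "epoch_seconds,value" on a socket.
events = ssc.socketTextStream("localhost", 9999) \
            .map(lambda line: line.split(",")) \
            .map(lambda f: (int(f[0]), float(f[1])))

# Key each event by the hour boundary it falls into, so buckets are
# aligned to wall-clock hours regardless of when the job started.
keyed = events.map(lambda tv: (tv[0] - tv[0] % 3600, tv[1]))

# Window of 1 hour, sliding every 5 s: each batch re-emits the running
# total for every hour bucket still inside the window.
hourly = keyed.reduceByKeyAndWindow(
    lambda a, b: a + b,    # add values entering the window
    lambda a, b: a - b,    # subtract values leaving it (inverse function)
    windowDuration=3600,
    slideDuration=5)

hourly.pprint()
ssc.start()
ssc.awaitTermination()
```

Passing the inverse function makes each slide incremental (subtract what left, add what arrived) instead of re-reducing the whole hour every 5 seconds.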
Mohammed

-----Original Message-----
From: Ankur Chauhan [mailto:achau...@brightcove.com]
Sent: Friday, May 8, 2015 2:27 PM
To: u...@spark.incubator.apache.org
Subject: Spark streaming updating a large window more frequently

Hi,

I am pretty new to spark/spark_streaming, so please excuse my naivety. I have a streaming event stream that is timestamped, and I would like to aggregate it into, let's say, hourly buckets. The simple answer is to use a window operation with a window length of 1 hour and a sliding interval of 1 hour. But this doesn't exactly work:

1. The time boundaries aren't perfect, i.e. the process/stream aggregation may start in the middle of an hour, so the first "hour" may actually be shorter than an hour, and subsequent windows should be aligned to the next hour boundary.
2. If I understand this correctly, the above method means all my data is "collected" for 1 hour and only then summarised. Though correct, how do I get the aggregations to occur more frequently than that? Something like "aggregate these events into hourly buckets, updating them every 5 seconds".

I would really appreciate pointers to code samples or some blogs that could help me identify best practices.

-- Ankur Chauhan
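For point 1 above (buckets aligned to wall-clock hours rather than to job start time), the usual trick is to derive the bucket from each event's own timestamp by flooring it to the hour. A small standalone illustration — the function name is mine, not part of any Spark API:

```python
from datetime import datetime, timezone

def hour_bucket(epoch_seconds):
    """Floor an epoch timestamp to the start of its wall-clock hour."""
    return epoch_seconds - epoch_seconds % 3600

# 2015-05-08 14:27:05 UTC falls in the 14:00 bucket.
ts = int(datetime(2015, 5, 8, 14, 27, 5, tzinfo=timezone.utc).timestamp())
bucket = hour_bucket(ts)
print(datetime.fromtimestamp(bucket, tz=timezone.utc))  # 2015-05-08 14:00:00+00:00
```

Keying events by this bucket value (instead of relying on window boundaries) sidesteps the partial-first-hour problem entirely: an event's bucket depends only on its timestamp, not on when the stream was started.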