If I understand you correctly, you need a window duration of 1 hour and a
sliding interval of 5 seconds.
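A minimal sketch of that suggestion, assuming a hypothetical socket source and
local-mode setup (the source, port, and checkpoint path are placeholders, not
from the original thread):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Minutes, Seconds, StreamingContext}

// Sketch: count events per key over a 1-hour window, re-emitted every
// 5 seconds. The stream source and keying scheme are assumptions.
object HourlyWindowSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("hourly-window").setMaster("local[2]")
    // Batch interval of 5 seconds; the slide interval must be a multiple of it.
    val ssc = new StreamingContext(conf, Seconds(5))
    ssc.checkpoint("/tmp/ckpt") // required by the inverse-function form below

    val events = ssc.socketTextStream("localhost", 9999) // assumed source
    val counts = events
      .map(line => (line, 1L))
      // 1-hour window recomputed every 5 seconds; the inverse function lets
      // Spark subtract batches sliding out instead of rescanning the window.
      .reduceByKeyAndWindow(_ + _, _ - _, Minutes(60), Seconds(5))

    counts.print()
    ssc.start()
    ssc.awaitTermination()
  }
}
```

The inverse-reduce variant keeps each 5-second update cheap, since only the
batches entering and leaving the window are touched.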
Mohammed
-Original Message-
From: Ankur Chauhan [mailto:achau...@brightcove.com]
Sent: Friday, May 8, 2015 2:27 PM
To: u...@spark.incubator.apache.org
Subject: Spark streaming updating a large window more frequently
Hi,
I am pretty new to spark/spark_streaming so please excuse my naivety. I have a
streaming event stream that is timestamped, and I would like to aggregate it
into, say, hourly buckets. The simple answer is to use a window operation with
a window length of 1 hr and a sliding interval of 1 hr, but that doesn't
exactly work:
1. The time boundaries aren't aligned, i.e. the stream aggregation may start in
the middle of an hour, so the first bucket may cover less than a full hour, and
subsequent buckets should align to hour boundaries.
2. If I understand correctly, the above method collects data for a full hour
and only then summarizes it. Though correct, how do I get the aggregations to
occur more frequently than that? Something like: aggregate these events into
hourly buckets, updating them every 5 seconds.
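For the alignment concern in point 1, one common approach is to ignore the
window boundaries for bucketing and instead key each event by the start of its
wall-clock hour, derived from the event's own timestamp. A minimal sketch of
that helper (the name and field are hypothetical, not from this thread):

```scala
// Hypothetical helper: align an event timestamp (epoch millis) to the
// start of its wall-clock hour, so buckets line up with hour boundaries
// regardless of when the streaming job started.
object HourBucket {
  val HourMillis: Long = 60L * 60L * 1000L

  def bucketStart(tsMillis: Long): Long = tsMillis - (tsMillis % HourMillis)
}
```

Events can then be mapped to `(HourBucket.bucketStart(ts), value)` pairs and
aggregated with `reduceByKey` inside the window, so each emitted result is
keyed by a clean hour boundary even when the job started mid-hour.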
I would really appreciate pointers to code samples or some blogs that could
help me identify best practices.
-- Ankur Chauhan
-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org