Hi,

Have you thought of other alternatives, like collecting the data in a database over a 24-hour period? I mean, do you require the 5-minute-interval reports *after 24 hours of data collection* from t0 (t0, t0+5m, t0+10m, ...)? If so, you could only produce them after collecting the data, and then partition your table into 5-minute timeslots; a rough sketch of the idea follows.
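For instance, a minimal sketch of that batch alternative in Spark SQL. This is only an illustration, not a worked-out design: the epoch-second timestamp column ts, the JSON landing area, and both HDFS paths are hypothetical placeholders.

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    val sc = new SparkContext(new SparkConf().setAppName("SlotPartitioner"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    // read the raw events collected over the day (path and format are placeholders)
    val events = sqlContext.read.json("hdfs:///landing/events")

    // floor each epoch-second timestamp "ts" to its 5-minute (300 s) slot
    val slotted = events.withColumn("slot", ($"ts" / 300).cast("long") * 300)

    // write the data partitioned by timeslot; each report then reads only
    // the partitions it needs instead of rescanning the whole day
    slotted.write.partitionBy("slot").parquet("hdfs:///warehouse/events_by_slot")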
Dr Mich Talebzadeh

LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
http://talebzadehmich.wordpress.com

On 9 May 2016 at 08:15, 李明伟 <kramer2...@126.com> wrote:

Thanks Mich.

I guess I did not make my question clear enough. I know terms like interval and window, and I also know how to use them. The problem is that in my case I need to set the window to cover 24 hours or 1 hour of data. I am not sure that is a good approach, because the window is just too big. I expect my program to be a long-running service, so I am worried about its stability.

At 2016-05-09 15:01:57, "Mich Talebzadeh" <mich.talebza...@gmail.com> wrote:

OK, some terms for Spark Streaming.

"Batch interval" is the basic interval at which the system will receive data in batches. It is set when creating a StreamingContext. For example, if you set the batch interval to 300 seconds, then any input DStream will generate RDDs of received data at 300-second intervals.

A window operator is defined by two parameters:

- WindowDuration / WindowLength - the length of the window
- SlideDuration / SlidingInterval - the interval at which the window slides or moves forward

OK, so your batch interval is 5 minutes. That is the rate at which messages come in from the source.

Then you have these two parameters:

    // the batch interval n is fixed when creating the StreamingContext
    val ssc = new StreamingContext(sparkConf, Seconds(n))
    // window length - the duration of the window; must be a multiple of the
    // batch interval: x = m * n
    val windowLength = Seconds(m * n)
    // sliding interval - the interval at which the window operation is
    // performed, in other words how often data is collected within this
    // "previous interval"; choose y so that x / y is a whole number
    val slidingInterval = Seconds(y)

Both the window length and the sliding interval must be multiples of the batch interval, as received data is divided into batches of duration "batch interval".

If you want to collect 1 hour of data, then windowLength = 12 * 5 * 60 seconds.
If you want to collect 24 hours of data, then windowLength = 24 * 12 * 5 * 60 seconds.

Your sliding interval should be set to the batch interval = 5 * 60 seconds. In other words, that is where the aggregates and summaries for your reports come from. A worked sketch follows.
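For instance, a minimal runnable sketch of the three reports described above. The socket source, host and port, and the plain count are placeholder assumptions, not part of the original recipe; substitute your real source and aggregation.

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val sparkConf = new SparkConf().setAppName("FiveMinuteReports")
    // batch interval: 5 minutes = 300 seconds
    val ssc = new StreamingContext(sparkConf, Seconds(300))
    // windowed operations with an inverse reduce require checkpointing
    // (the directory is a placeholder)
    ssc.checkpoint("hdfs:///checkpoints/five-minute-reports")

    // placeholder source; substitute Kafka, Flume, etc.
    val lines = ssc.socketTextStream("localhost", 9999)

    // report 1: the last 5 minutes is exactly one batch, so no window is needed
    lines.count().print()

    // report 2: the last hour, recomputed every 5 minutes
    lines.countByWindow(Seconds(12 * 300), Seconds(300)).print()

    // report 3: the last 24 hours, recomputed every 5 minutes
    lines.countByWindow(Seconds(24 * 12 * 300), Seconds(300)).print()

    ssc.start()
    ssc.awaitTermination()

Note that the 24-hour window obliges Spark to remember a full day of batches, which is exactly the stability concern raised earlier in this thread.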
What is your data source here?

HTH

Dr Mich Talebzadeh

LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
http://talebzadehmich.wordpress.com

On 9 May 2016 at 04:19, kramer2...@126.com <kramer2...@126.com> wrote:

We have some stream data that needs to be processed, and we are considering using Spark Streaming to do it.

We need to generate three kinds of reports, based on:

1. The last 5 minutes of data
2. The last 1 hour of data
3. The last 24 hours of data

The reports are produced every 5 minutes.

After reading the docs, the most obvious way to solve this seems to be to set up a Spark stream with a 5-minute batch interval and two windows of 1 hour and 1 day.

But I am worried that windows of one hour and one day are too big. I do not have much experience with Spark Streaming, so what window lengths do you use in your environment?

Are there any official docs talking about this?

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/How-big-the-spark-stream-window-could-be-tp26899.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.