bq. and check if 5 minutes have passed What if the duration for the window is longer than 5 minutes ?
Cheers On Wed, Sep 16, 2015 at 1:25 PM, Adrian Tanase <atan...@adobe.com> wrote: > If you don't need the counts in betweem the DB writes, you could simply > use a 5 min window for the updateStateByKey and use foreachRdd on the > resulting DStream. > > Even simpler, you could use reduceByKeyAndWindow directly. > > Lastly, you could keep a variable on the driver and check if 5 minutes > have passed > in foreachRdd on the original DStream, even if the batch duration is > shorter. > > Also, remember to cleanup the state in your updateStateByKey function or > it will grow unbounded. I still believe one of the builtin ByKey functions > are a simpler strategy. > > hope this helps. > > -adrian > > Sent from my iPhone > > > On 16 Sep 2015, at 22:33, Bryan Jeffrey <bryan.jeff...@gmail.com> wrote: > > > > Hello. > > > > I have a streaming job that is processing data. I process a stream of > events, taking actions when I see anomalous events. I also keep a count > events observed using updateStateByKey to maintain a map of type to count. > I would like to periodically (every 5 minutes) write the results of my > counts to a database. Is there a built in mechanism or established pattern > to execute periodic jobs in spark streaming? > > > > Regards, > > > > Bryan Jeffrey > > --------------------------------------------------------------------- > To unsubscribe, e-mail: user-unsubscr...@spark.apache.org > For additional commands, e-mail: user-h...@spark.apache.org > >