I don't know if this is the most efficient way to do that, but you can use a sliding window that is bigger than your aggregation period and filter only for the messages inside the period.
Remember that to work with the reduceByKeyAndWindow you need to associate each row with the time key, in your case "yyyyMMddhhmm". http://spark.apache.org/docs/latest/streaming-programming-guide.html#window-operations Hope it helps, Regards, Ricardo On Sat, Jan 16, 2016 at 1:13 AM, ffarozan [via Apache Spark User List] < ml-node+s1001560n25982...@n3.nabble.com> wrote: > I am implementing aggregation using spark streaming and kafka. My batch > and window size are same. And the aggregated data is persisted in > Cassandra. > > I want to aggregate for fixed time windows - 5:00, 5:05, 5:10, ... > > But we cannot control when to run streaming job, we only get to specify > the batch interval. > > So the problem is - lets say if streaming job starts at 5:02, then I will > get results at 5:07, 5:12, etc. and not what I want. > > Any suggestions? > > thanks, > Firdousi > > ------------------------------ > If you reply to this email, your message will be added to the discussion > below: > > http://apache-spark-user-list.1001560.n3.nabble.com/Spark-streaming-Fixed-time-aggregation-handling-driver-failures-tp25982.html > To start a new topic under Apache Spark User List, email > ml-node+s1001560n1...@n3.nabble.com > To unsubscribe from Apache Spark User List, click here > <http://apache-spark-user-list.1001560.n3.nabble.com/template/NamlServlet.jtp?macro=unsubscribe_by_code&node=1&code=cmljYXJkby5wYWl2YUBjb3JwLmdsb2JvLmNvbXwxfDQ1MDcxMTc2Mw==> > . > NAML > <http://apache-spark-user-list.1001560.n3.nabble.com/template/NamlServlet.jtp?macro=macro_viewer&id=instant_html%21nabble%3Aemail.naml&base=nabble.naml.namespaces.BasicNamespace-nabble.view.web.template.NabbleNamespace-nabble.view.web.template.NodeNamespace&breadcrumbs=notify_subscribers%21nabble%3Aemail.naml-instant_emails%21nabble%3Aemail.naml-send_instant_email%21nabble%3Aemail.naml> > -- Ricardo Paiva Big Data / Semântica 2483-6432 *globo.com* <http://www.globo.com> -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-streaming-Fixed-time-aggregation-handling-driver-failures-tp25982p25990.html Sent from the Apache Spark User List mailing list archive at Nabble.com.