I am implementing an aggregation using Spark Streaming and Kafka. My batch interval and window size are the same, and the aggregated data is persisted in Cassandra.
I want to aggregate over fixed time windows: 5:00, 5:05, 5:10, ... But we cannot control when the streaming job starts; we only get to specify the batch interval. So the problem is: if the streaming job starts at 5:02, I will get results at 5:07, 5:12, etc., which is not what I want. Any suggestions?

thanks,
Firdousi
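
To make the misalignment concrete, here is a minimal sketch in plain Python (not Spark code) of one possible workaround: assign each record to a clock-aligned window by flooring its timestamp, so the bucket boundaries no longer depend on when the job started. The names `window_start` and `aggregate_by_window` are hypothetical, the 5-minute window size is taken from the example above, and the sketch assumes each record carries its own timezone-aware timestamp, which the post does not confirm.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

# Assumed window size, matching the 5:00, 5:05, 5:10 example in the post.
WINDOW = timedelta(minutes=5)
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def window_start(ts, window=WINDOW):
    """Floor a timezone-aware timestamp to the start of its clock-aligned window."""
    return ts - ((ts - EPOCH) % window)


def aggregate_by_window(records):
    """Sum values per clock-aligned window for one batch of (timestamp, value) pairs."""
    totals = defaultdict(int)
    for ts, value in records:
        totals[window_start(ts)] += value
    return dict(totals)


# A batch covering 5:02 to 5:07 contains records from two fixed windows.
batch = [
    (datetime(2016, 1, 15, 5, 2, 30, tzinfo=timezone.utc), 1),
    (datetime(2016, 1, 15, 5, 4, 0, tzinfo=timezone.utc), 2),
    (datetime(2016, 1, 15, 5, 6, 10, tzinfo=timezone.utc), 4),
]
partials = aggregate_by_window(batch)
```

With this keying, a batch that spans 5:02 to 5:07 yields one partial sum for the 5:00 window and one for the 5:05 window. The partial sums for the same window from consecutive batches would still have to be merged when written to Cassandra; how to do that is left open here.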
