1. updateStateByKey should be called on all keys, even if there is no data corresponding to that key. There is a unit test for that: https://github.com/apache/spark/blob/master/streaming/src/test/scala/org/apache/spark/streaming/BasicOperationsSuite.scala#L337
2. I am increasing the priority for this. Off the top of my head, this is easy to fix but hard to test reliably in a unit test. I will fix it soon after the Spark 1.1 release.

TD

On Fri, Aug 1, 2014 at 7:37 AM, RodrigoB <rodrigo.boav...@aspect.com> wrote:

> Hi TD,
>
> I've also been fighting this issue, only to find the exact same solution
> you are suggesting.
> Too bad I didn't find either the post or the issue sooner.
>
> I'm using a 1 second batch with N Kafka events (1 to 1 with the state
> objects) per batch and only calling the updateStateByKey function.
>
> This is my interpretation, please correct me if needed:
> Because of Spark's lazy computation, the RDDs weren't being updated as
> expected on the batch interval execution. The assumption was that as long
> as I have a streaming batch run (with or without new messages), I should
> get updated RDDs, which was not happening. We only get updateStateByKey
> calls for objects which got events or that are forced through an output
> function to compute. I did not run further tests to confirm this, but
> that's the impression I got.
>
> This doesn't fit our requirements, as we want to do duration updates based
> on the batch interval execution, so I had to force the computation of all
> the objects through the foreachRDD function.
>
> I would also appreciate it if the priority of the issue could be increased.
> I assume the foreachRDD is additional, unnecessary resource allocation
> (although I'm not sure how much) as opposed to doing it somehow by default
> on batch interval execution.
>
> Thanks,
> Rod
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/streaming-window-not-behaving-as-advertised-v1-0-1-tp10453p11168.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
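
To make the expectation in point 1 concrete, here is a minimal sketch in plain Python. It is a model of the intended semantics only, not Spark code: `update_state_by_key` and `tick_duration` are hypothetical helpers written for this illustration, and the state layout `(event_count, batches_alive)` is an assumption chosen to mirror the "duration update per batch interval" use case from the quoted message. The point it shows is that the update function runs for every key already in the state, even when the batch carries no new values for it.

```python
from typing import Callable, Dict, List, Optional, Tuple

State = Tuple[int, int]  # (event_count, batches_alive): an assumed layout


def update_state_by_key(
    state: Dict[str, State],
    batch: List[Tuple[str, int]],
    update_func: Callable[[List[int], Optional[State]], Optional[State]],
) -> Dict[str, State]:
    """Model one batch step: call update_func for every key that is in the
    previous state or in the new batch."""
    # Group the new values of this batch by key.
    grouped: Dict[str, List[int]] = {}
    for key, value in batch:
        grouped.setdefault(key, []).append(value)

    new_state: Dict[str, State] = {}
    for key in state.keys() | grouped.keys():
        # Keys without new data still get a call, with an empty list.
        result = update_func(grouped.get(key, []), state.get(key))
        if result is not None:  # returning None drops the key
            new_state[key] = result
    return new_state


def tick_duration(new_events: List[int], previous: Optional[State]) -> State:
    """Hypothetical update function: count events and age the key by one batch."""
    count, age = previous if previous is not None else (0, 0)
    return (count + len(new_events), age + 1)


state: Dict[str, State] = {}
state = update_state_by_key(state, [("a", 1), ("b", 1)], tick_duration)
state = update_state_by_key(state, [("a", 1)], tick_duration)  # no data for "b"
state = update_state_by_key(state, [], tick_duration)          # empty batch
print(state)
```

In this model, key "b" keeps ageing on batches where it receives no events, which is the behaviour the unit test in point 1 is meant to guarantee. The problem reported in the quoted message is that, without an output operation forcing the computation, that per-batch update was not observed.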