With Spark Streaming, I am maintaining a state (updateStateByKey every 30s) and emitting to file parts of that state that have been closed every 5 minutes, but only care about the last state collected. In 5m, there will be 10 updateStateByKey iterations called.
For example: … val ssc = new StreamingContext(sc, Seconds(30)) val expiredState = state .filter(_._2.expired == true) .window(windowDuration = Seconds(30), slideDuration) … When I go to emit, I want to update a Boolean flag in my collection of state that says it has been collected, so that the next time state is updated I can remove what has been emitted. Is there a way to do this or maybe a better pattern or approach to solve this problem? Hopefully I have given enough information to explain the use case. Thanks, Robert