With Spark Streaming, I am maintaining state with updateStateByKey on a 
30-second batch interval. Every 5 minutes I write to file the parts of that 
state that have been closed, and I only care about the last state collected.
Within each 5-minute period, updateStateByKey will have run 10 times.

For example:
…
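  // 30-second batch interval, so the state is updated every 30s
  val ssc = new StreamingContext(sc, Seconds(30))

  // keep only the entries whose state has been marked as expired
  val expiredState = state
    .filter(_._2.expired)
    .window(windowDuration = Seconds(30), slideDuration)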
…

When I go to emit, I want to update a Boolean flag in my collection of state 
that says it has been collected, so that the next time state is updated I can 
remove what has been emitted.
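
To make the idea concrete, here is a rough sketch of the update side only. It 
is not my real code: KeyState, its fields, the Long values and the "no new 
values means closed" rule are all placeholders I made up for illustration. The 
function has the (Seq[V], Option[S]) => Option[S] shape that updateStateByKey 
takes, and it has no Spark dependency so it can be tried on its own.

```scala
object StateSketch {
  // Hypothetical state record; all field names are placeholders.
  final case class KeyState(total: Long, expired: Boolean, emitted: Boolean)

  // Same shape as the function passed to updateStateByKey.
  // Returning None drops the key from the state.
  def updateState(newValues: Seq[Long], previous: Option[KeyState]): Option[KeyState] =
    previous match {
      // Already collected by an earlier emit: remove it from the state.
      case Some(s) if s.emitted => None
      // Closed but not yet collected: keep it unchanged until it is emitted.
      case Some(s) if s.expired => Some(s)
      // Still open: fold in the new values. Treating a batch with no new
      // values as "closed" is only an assumed expiry rule.
      case Some(s) =>
        Some(s.copy(total = s.total + newValues.sum, expired = newValues.isEmpty))
      // First time this key is seen.
      case None =>
        Some(KeyState(newValues.sum, expired = false, emitted = false))
    }
}
```

This would be wired in with something like 
stream.updateStateByKey(StateSketch.updateState _). The part I am missing is 
how the emit step could set the emitted flag to true.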

Is there a way to do this, or is there a better pattern or approach for this 
problem?

Hopefully I have given enough information to explain the use case.

Thanks,
Robert

