What is the recommended way to carry state from one RDD to the next as you 
traverse a DStream?

Consider a trivial example: a moving average. Between RDDs, should the average 
be saved in an external cache (e.g. Redis), or is there another kind of global 
variable available in Spark? Accumulators don't seem to fit, since their values 
can only be read in the driver.

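     // Pseudocode: "globalVar" stands for whatever mechanism keeps state across RDDs
     globalVar savedAverage = 0
     stream1.transform(rdd => {
       // MovingAverage is seeded with the average left over from the previous RDD
       val movingAverage = new MovingAverage(savedAverage)
       val result = rdd.map(x => (x, movingAverage.add(x)))
       savedAverage = movingAverage.getCurrentAverage
       result  // transform has to return an RDD
     })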
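
To make the kind of state I mean concrete, here is a minimal sketch in plain Scala with no Spark dependency. It keeps a running (cumulative) average rather than a true windowed moving average, and the names AvgState and RunningAverage are made up for illustration. The update function has the (new values, previous state) => new state shape that updateStateByKey accepts, so this may be one way to do it, but I don't know whether it is the recommended one.

```scala
// State carried from one batch to the next: total so far and number of samples.
// AvgState and RunningAverage are illustrative names, not part of Spark.
final case class AvgState(sum: Double, count: Long) {
  // Average of everything seen so far; 0.0 before any sample arrives.
  def average: Double = if (count == 0) 0.0 else sum / count
}

object RunningAverage {
  // Fold the values of the current batch into the previous state.
  // Shape: (values in this batch, previous state) => new state
  def update(batch: Seq[Double], previous: Option[AvgState]): Option[AvgState] = {
    val prev = previous.getOrElse(AvgState(0.0, 0L))
    Some(AvgState(prev.sum + batch.sum, prev.count + batch.size))
  }
}
```

Assuming stream1 is a DStream[Double], it would presumably be wired in with something like stream1.map(x => ("all", x)).updateStateByKey(RunningAverage.update _), with a checkpoint directory set on the StreamingContext, since updateStateByKey requires one.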

-A
