What is the recommended way to store state across RDDs?

Adrian Mocanu Mon, 28 Apr 2014 09:45:07 -0700

What is the recommended way to store state across RDDs as you traverse a 
DStream and go from 1 RDD to another?


Consider a trivial example of moving average. Between RDDs should the average 
be saved in a cache (ie redis) or is there another globar var type available in 
Spark? Accumulators are only available in the driver so they're out of the 
question.

     globalVar savedAverage=0
     stream1.transform(rdd=>{
        val movingAverage= new MovingAverage(savedAverage)
        rdd.map(x=>(x, movingAverage.add(x) ))
        savedAverage= movingAverage.getCurrentAverage
      })

-A

What is the recommended way to store state across RDDs?

Reply via email to