I am currently using updateStateByKey (which, as you pointed out, accepts an initial RDD) to feed an initial RDD into my window counting function. I was hoping to seed the window state at startup without using updateStateByKey, to avoid its associated cost.
Is there an alternative method to initialize state? InputQueueStream joined to window would seem to work, but InputQueueStream does not allow checkpointing.

Sent from Outlook Mail

From: Tathagata Das
Sent: Sunday, November 22, 2015 8:01 PM
To: Bryan
Cc: user
Subject: Re: Initial State

There is a way. Please see the Scala docs.
http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.streaming.dstream.PairDStreamFunctions
The first version of updateStateByKey has the parameter "initialRDD".

On Fri, Nov 20, 2015 at 6:52 PM, Bryan <bryan.jeff...@gmail.com> wrote:

All,

Is there a way to introduce an initial RDD without doing updateStateByKey? I have an initial set of counts, and the algorithm I am using requires that I accumulate additional counts from streaming data, age off older counts, and make some calculations on them. The accumulation of counts uses reduceByKeyAndWindow. Is there another method to seed in the initial counts beyond updateStateByKey?

Regards,

Bryan Jeffrey
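
For readers following the thread, here is a minimal plain-Python sketch of the behaviour being asked for: a windowed count that starts from a set of initial counts, accumulates new batches, and ages off old ones. It does not use Spark at all. The class name SeededWindowCounter and its parameters are hypothetical, and treating the seed as one ordinary batch that eventually ages out of the window is an assumption, not something the thread settles. The add/subtract steps only loosely mirror the reduce and inverse-reduce idea behind reduceByKeyAndWindow.

```python
from collections import Counter, deque


class SeededWindowCounter:
    """Plain-Python model of a sliding-window count seeded with initial counts.

    Hypothetical illustration only; this is not a Spark API.
    """

    def __init__(self, window_batches, initial_counts=None):
        self.window_batches = window_batches  # window length, in batches
        self.batches = deque()                # per-batch counts still inside the window
        self.totals = Counter()               # running per-key sum over the window
        if initial_counts:
            # Assumption: the seed behaves like the first batch, so it ages off too.
            self.add_batch(initial_counts)

    def add_batch(self, counts):
        """Add one batch of per-key counts and return the current window totals."""
        batch = Counter(counts)
        self.batches.append(batch)
        self.totals.update(batch)             # accumulate the new counts
        if len(self.batches) > self.window_batches:
            expired = self.batches.popleft()
            self.totals.subtract(expired)     # age off the oldest batch
            # Drop keys whose count has fallen to zero.
            for key in [k for k, v in self.totals.items() if v <= 0]:
                del self.totals[key]
        return dict(self.totals)


if __name__ == "__main__":
    counter = SeededWindowCounter(window_batches=3, initial_counts={"a": 5, "b": 2})
    print(counter.add_batch({"a": 1}))
    print(counter.add_batch({"b": 1, "c": 4}))
    print(counter.add_batch({"a": 2}))        # the seed batch leaves the window here
```

Whether the seed should age off with the window or persist indefinitely depends on the algorithm; if it must persist, a stateful operation such as updateStateByKey with initialRDD, as suggested above, is the closer fit.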