I am currently using updateStateByKey (which, as you pointed out, allows the
introduction of an initial RDD) to introduce an initial RDD to my window
counting function. I was hoping to seed the window state at startup without
using updateStateByKey, to avoid its associated cost.

Is there an alternative method to initialize state?

Joining an InputQueueStream to the window would seem to work, but
InputQueueStream does not allow checkpointing.
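For illustration, a minimal sketch of the queue-stream idea, assuming (String, Long) count pairs. The object name QueueSeededWindow, the helper seededWindowCounts and the window/slide durations are hypothetical. StreamingContext.queueStream emits one queued RDD per batch when oneAtATime is true, so the seed counts enter the window in the first batch and age off like any other batch. Because a queue stream cannot be checkpointed, this uses the plain reduceByKeyAndWindow form; the inverse-function form requires checkpointing.

```scala
import scala.collection.mutable

import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Minutes, Seconds, StreamingContext}
import org.apache.spark.streaming.dstream.DStream

object QueueSeededWindow {
  // Combines counts for the same key.
  def addCounts(a: Long, b: Long): Long = a + b

  // Hypothetical helper: merges a one-shot seed batch into the live stream
  // before windowed counting.
  def seededWindowCounts(
      ssc: StreamingContext,
      live: DStream[(String, Long)],
      initialCounts: RDD[(String, Long)]): DStream[(String, Long)] = {
    val queue = mutable.Queue[RDD[(String, Long)]](initialCounts)

    // oneAtATime = true: the seed RDD is dequeued in the first batch only.
    // Once the queue is empty, the explicit empty default RDD is emitted
    // so the union below always has an RDD from this parent.
    val seed = ssc.queueStream(
      queue, true, ssc.sparkContext.emptyRDD[(String, Long)])

    // Seed counts are windowed exactly like live counts and age off
    // once the window slides past the first batch.
    live.union(seed).reduceByKeyAndWindow(
      addCounts _,
      Minutes(10), // window length (illustrative)
      Seconds(30)  // slide interval (illustrative)
    )
  }
}
```

Both durations must be multiples of the batch interval of the StreamingContext.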




From: Tathagata Das
Sent: Sunday, November 22, 2015 8:01 PM
To: Bryan
Cc: user
Subject: Re: Initial State


There is a way. Please see the Scala docs:
http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.streaming.dstream.PairDStreamFunctions

The first version of updateStateByKey listed there takes the parameter
"initialRDD".
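As an illustration, a minimal sketch of seeding state through the initialRDD parameter, assuming (String, Long) count pairs. The object name SeededState, the helper names and numPartitions are hypothetical. It uses the updateStateByKey overload that takes an update function, a Partitioner and an initial RDD of (key, state) pairs. updateStateByKey also requires a checkpoint directory to be set on the StreamingContext.

```scala
import org.apache.spark.HashPartitioner
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.dstream.DStream

object SeededState {
  // Adds this batch's counts for a key to its running total. On the first
  // batch, the existing state for a seeded key comes from initialCounts.
  def updateCount(newCounts: Seq[Long], state: Option[Long]): Option[Long] =
    Some(newCounts.sum + state.getOrElse(0L))

  // Hypothetical helper wiring the update function to the seed RDD.
  def seededCounts(
      pairs: DStream[(String, Long)],
      initialCounts: RDD[(String, Long)],
      numPartitions: Int): DStream[(String, Long)] =
    pairs.updateStateByKey[Long](
      updateCount _,
      new HashPartitioner(numPartitions),
      initialCounts)
}
```

Returning None from the update function removes a key from the state, which is one way to age off entries; this sketch never does so.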

On Fri, Nov 20, 2015 at 6:52 PM, Bryan <bryan.jeff...@gmail.com> wrote:
All,

Is there a way to introduce an initial RDD without using updateStateByKey? I
have an initial set of counts, and the algorithm I am using requires that I
accumulate additional counts from streaming data, age off older counts, and
make some calculations on them. The accumulation of counts uses
reduceByKeyAndWindow. Is there a method other than updateStateByKey to seed
the initial counts?
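For context, a minimal sketch of windowed counting with age-off, assuming (String, Long) count pairs; the object name WindowedCounts, its helpers and all durations and partition counts are hypothetical. It uses the reduceByKeyAndWindow overload that takes an inverse reduce function (to subtract counts from batches leaving the window) and a filter function (to drop keys whose windowed count has fallen to zero). This inverse-function form requires checkpointing to be enabled on the StreamingContext.

```scala
import org.apache.spark.streaming.{Minutes, Seconds}
import org.apache.spark.streaming.dstream.DStream

object WindowedCounts {
  // Adds counts from batches entering the window.
  def addCounts(a: Long, b: Long): Long = a + b

  // Inverse of addCounts: removes counts from batches leaving the window.
  def subtractCounts(a: Long, b: Long): Long = a - b

  // Only keys with a positive windowed count are retained.
  def keepPositive(kv: (String, Long)): Boolean = kv._2 > 0

  def windowedCounts(pairs: DStream[(String, Long)]): DStream[(String, Long)] =
    pairs.reduceByKeyAndWindow(
      addCounts _,
      subtractCounts _,
      Minutes(10),    // window length (illustrative)
      Seconds(30),    // slide interval (illustrative)
      4,              // number of partitions (illustrative)
      keepPositive _)
}
```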

Regards,

Bryan Jeffrey


