Thanks for the reply. Are there plans to allow this runtime interactions with a dstream context? From the surface they seem doable. What is preventing this to work?
Also... I implemented the modifiable windowdstream and it seemed to work good. Thanks for the pointer. Gino B. > On Jun 2, 2014, at 7:14 PM, Tathagata Das <[email protected]> wrote: > > Currently Spark Streaming does not support addition/deletion/modification of > DStream after the streaming context has been started. > Nor can you restart a stopped streaming context. > Also, multiple spark contexts (and therefore multiple streaming contexts) > cannot be run concurrently in the same JVM. > > To change the window duration, I would one of the following. > > 1. Stop the previous streaming context, create a new streaming context, and > setup the dstreams once again with the new window duration > 2. Create a custom DStream, say DynamicWindowDStream. Take a look at how > WindowedDStream is implemented (pretty simple, just a union over RDDs across > time). That should allow you to modify the window duration. However, do make > sure you have a maximum window duration that you will ever reach, and make > sure you define parentRememberDuration as a "rememberDuration + > maxWindowDuration". That fields defines which RDDs can be forgotten, so is > sensitive to the window duration. Then you have to take care of correctly > (atomically, etc.) modifying the window duration as per your requirements. > > Happy streaming! > > TD > > > > >> On Mon, Jun 2, 2014 at 2:46 PM, lbustelo <[email protected]> wrote: >> This is a general question about whether Spark Streaming can be interactive >> like batch Spark jobs. I've read plenty of threads and done my fair bit of >> experimentation and I'm thinking the answer is NO, but it does not hurt to >> ask. >> >> More specifically, I would like to be able to do: >> 1. Add/Remove steps to the Streaming Job >> 2. Modify Window durations >> 3. Stop and Restart context. >> >> I've tried the following: >> >> 1. Modify the DStream after it has been started… BOOM! Exceptions >> everywhere. >> >> 2. Stop the DStream, Make modification, Start… NOT GOOD :( In 0.9.0 I was >> getting deadlocks. I also tried 1.0.0 and it did not work. >> >> 3. Based on information provided here >> <http://apache-spark-user-list.1001560.n3.nabble.com/spark-streaming-and-the-spark-shell-tp3347p3371.html> >> , I was been able to prototype modifying the RDD computation within a >> forEachRDD. That is nice, but you are then bounded to the specified batch >> size. That got me to wanting to modify Window durations. Is changing the >> Window duration possible? >> >> 4. Tried running multiple streaming context from within a single Driver >> application and got several exceptions. The first one was bind exception on >> the web port. Then once the app started getting run (cores were taken but >> 1st job) it did not run correctly. A lot of >> "akka.pattern.AskTimeoutException: Timed out" >> . >> >> I've tried my experiments in 0.9.0, 0.9.1 and 1.0.0 running on Standalone >> Cluster setup. >> Thanks in advanced >> >> >> >> -- >> View this message in context: >> http://apache-spark-user-list.1001560.n3.nabble.com/Interactive-modification-of-DStreams-tp6740.html >> Sent from the Apache Spark User List mailing list archive at Nabble.com. >
