One, I think, you should take this to the spark developer list. Two, I suspect broadcast variables aren't the best solution for the use case, you describe. Maybe an in-memory data/object/file store like tachyon is a better fit.
Thanks, Tim On Tue, May 2, 2017 at 11:56 AM, Nipun Arora <nipunarora2...@gmail.com> wrote: > Hi All, > > To support our Spark Streaming based anomaly detection tool, we have made > a patch in Spark 1.6.2 to dynamically update broadcast variables. > > I'll first explain our use-case, which I believe should be common to > several people using Spark Streaming applications. Broadcast variables are > often used to store values "machine learning models", which can then be > used on streaming data to "test" and get the desired results (for our case > anomalies). Unfortunately, in the current spark, broadcast variables are > final and can only be initialized once before the initialization of the > streaming context. Hence, if a new model is learned the streaming system > cannot be updated without shutting down the application, broadcasting > again, and restarting the application. Our goal was to re-broadcast > variables without requiring a downtime of the streaming service. > > The key to this implementation is a live re-broadcastVariable() interface, > which can be triggered in between micro-batch executions, without any > re-boot required for the streaming application. At a high level the task is > done by re-fetching broadcast variable information from the spark driver, > and then re-distribute it to the workers. The micro-batch execution is > blocked while the update is made, by taking a lock on the execution. We > have already tested this in our prototype deployment of our anomaly > detection service and can successfully re-broadcast the broadcast variables > with no downtime. > > We would like to integrate these changes in spark, can anyone please let > me know the process of submitting patches/ new features to spark. Also. I > understand that the current version of Spark is 2.1. However, our changes > have been done and tested on Spark 1.6.2, will this be a problem? > > Thanks > Nipun > -- -- Thanks, Tim