
I was wondering how foreachRDD would run.
Specifically, let's say I do something like this (nothing real, just for illustration):

var df = ???
var counter = 0

dstream.foreachRDD {
    rdd: RDD[Long] => {
      val df2 = rdd.toDF(...)
      df = df.union(df2)
      counter += 1
      if (counter > 100) {
        ...
      }
    }
}

Would this guarantee that df would be a union of the first 100 micro-batches?
I.e., is foreachRDD guaranteed to run on the driver, updating everything locally
(as opposed to updating things lazily or running on a worker)?
In simple tests this appears to work correctly, but I am planning to do some
more complex things (including using multiple dstreams).
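
To make the question concrete, here is a minimal, self-contained sketch of the
same pattern that can be run in local mode. It is only an illustration under
stated assumptions: the object name ForeachRddSketch, the run() helper, the
in-memory queue used as input, the "value" column name and the 10 second
timeout are all hypothetical and not part of my real job. It builds the union
inside foreachRDD and reads df and counter back on the driver once the
streaming context has stopped.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.streaming.{Seconds, StreamingContext}

import scala.collection.mutable

// Hypothetical sketch, not the real job: names and inputs are assumptions.
object ForeachRddSketch {

  // Returns (number of micro-batches seen, number of rows in the unioned df).
  def run(): (Int, Long) = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("foreachRDD-sketch")
      .getOrCreate()
    import spark.implicits._

    val sc = spark.sparkContext
    val ssc = new StreamingContext(sc, Seconds(1))

    // Assumed test input: three micro-batches served from an in-memory queue.
    val queue = mutable.Queue[RDD[Long]](
      sc.parallelize(Seq(1L, 2L)),
      sc.parallelize(Seq(3L, 4L)),
      sc.parallelize(Seq(5L, 6L))
    )
    val dstream = ssc.queueStream(queue, oneAtATime = true)

    // Start from an empty DataFrame instead of ???.
    var df: DataFrame = Seq.empty[Long].toDF("value")
    var counter = 0

    dstream.foreachRDD { rdd: RDD[Long] =>
      val df2 = rdd.toDF("value")
      df = df.union(df2) // only extends the query plan; nothing is computed yet
      counter += 1       // counts every batch, including empty ones
    }

    ssc.start()
    ssc.awaitTerminationOrTimeout(10000L)
    ssc.stop(stopSparkContext = false, stopGracefully = false)

    // Read the captured variables back on the driver after streaming stopped.
    val result = (counter, df.count())
    spark.stop()
    result
  }

  def main(args: Array[String]): Unit = {
    val (batches, rows) = run()
    println(s"batches seen: $batches, rows in union: $rows")
  }
}
```

Note that df.union itself is lazy, so the sketch only triggers a computation at
df.count(). Once the queue is drained the stream keeps producing empty batches,
so counter can end up larger than the number of queued RDDs.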

