Hi,

I was wondering how foreachRDD runs. Specifically, suppose I do something like the following (nothing real, just for understanding):
    var df = ???      // some initial DataFrame
    var counter = 0
    dstream.foreachRDD { rdd: RDD[Long] =>
      val df2 = rdd.toDF(...)
      df = df.union(df2)
      counter += 1
      if (counter > 100) {
        ssc.stop()
      }
    }

Would this guarantee that df ends up as a union of the first 100 micro-batches? In other words, is foreachRDD guaranteed to run on the driver, updating everything locally (as opposed to lazily updating state or running on a worker)? In simple tests this appears to work correctly, but I am planning to do some more complex things (including using multiple dstreams).

Thanks,
Assaf.