The number of jobs per batch depends on the number of output operations (print, foreachRDD,
saveAs*Files) and on the number of RDD actions inside those output operations.
For example:
dstream1.foreachRDD { rdd => rdd.count }                 // ONE Spark job per batch
dstream1.foreachRDD { rdd => { rdd.count; rdd.count } }  // TWO Spark jobs per batch
Hi
I am running a streaming word count with Kafka,
using one test topic with 2 partitions.
My cluster has three Spark executors.
Each batch interval is 10 seconds.
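
For reference, here is a minimal sketch of this kind of setup (not my exact code), assuming the direct Kafka stream API from spark-streaming-kafka; the app name, broker address, and topic name are placeholders:

import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

val conf = new SparkConf().setAppName("KafkaStreamingWordCount")
val ssc = new StreamingContext(conf, Seconds(10))                  // 10-second batches

val kafkaParams = Map("metadata.broker.list" -> "broker1:9092")    // placeholder broker
val topics = Set("test")                                           // placeholder topic (2 partitions)

// Direct stream: one RDD partition per Kafka partition
val lines = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
  ssc, kafkaParams, topics).map(_._2)

val wordCounts = lines.flatMap(_.split(" "))
  .map(word => (word, 1L))
  .reduceByKey(_ + _)

wordCounts.print()                                                 // output operation, runs each batch

ssc.start()
ssc.awaitTermination()
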
For every batch (e.g. batch time 02:51:00 below) I see 3 entries in the Spark UI,
as shown below.
My questions:
1) As label says jobId for first