Hello all,

Consider the following two code blocks:
    val ssc = new StreamingContext(sparkConfig, Seconds(2))
    val stream = KafkaUtils.createDirectStream(...)

    a) stream.filter(filterFunc).count().foreachRDD(rdd => println(rdd.collect()))
    b) stream.filter(filterFunc).window(Seconds(60), Seconds(60)).count().foreachRDD(rdd => println(rdd.collect()))

I have observed that in case (a) the UI behaves correctly and the numbers reported for each batch are correct.

In case (b), the UI reports numbers every 60 seconds, but the batch ID, input size, etc. are those of the single 2-second batch at the end of each window, i.e. the 30th batch, 60th batch, and so on. These numbers are not just useless but actually confusing in this case, though the delay/processing-time numbers are still helpful.

Is someone working on a fix to show aggregated numbers, which would be more useful?

--
Thanks
Jatin
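To make the mismatch concrete, here is a minimal simulation (plain Python, not Spark; the per-batch record counts are made up) of what (b) computes: a 60-second window with a 60-second slide over 2-second batches aggregates 30 micro-batches per window, so the single batch's input size shown in the UI covers only a fraction of the data the windowed count() actually processed.

```python
# Simulate 2-second micro-batches feeding a 60s/60s tumbling window.
# Hypothetical data: batch i carries i records surviving the filter.
BATCH_INTERVAL = 2   # seconds, as in StreamingContext(sparkConfig, Seconds(2))
WINDOW = SLIDE = 60  # seconds, as in window(Seconds(60), Seconds(60))
BATCHES_PER_WINDOW = WINDOW // BATCH_INTERVAL  # 30 micro-batches per window

batch_counts = list(range(1, 61))  # 120 seconds worth of batches = two windows

windows = []
for w in range(len(batch_counts) // BATCHES_PER_WINDOW):
    chunk = batch_counts[w * BATCHES_PER_WINDOW:(w + 1) * BATCHES_PER_WINDOW]
    windowed_count = sum(chunk)   # what (b)'s count() reports every 60 s
    last_batch_count = chunk[-1]  # what the UI shows as that batch's input size
    windows.append((windowed_count, last_batch_count))

for windowed, last in windows:
    print(f"window total = {windowed}, UI-reported batch input = {last}")
# → window total = 465, UI-reported batch input = 30
# → window total = 1365, UI-reported batch input = 60
```

The gap between the window total and the reported per-batch input is exactly the confusion described above: the UI shows statistics for one 2-second batch while the job processed 30 of them.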