Hello all,

Consider the following two code blocks:

import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

val ssc = new StreamingContext(sparkConfig, Seconds(2))
val stream = KafkaUtils.createDirectStream(...)

a) stream.filter(filterFunc).count().foreachRDD(rdd => println(rdd.collect().mkString(", ")))
b) stream.filter(filterFunc).window(Seconds(60), Seconds(60)).count().foreachRDD(rdd => println(rdd.collect().mkString(", ")))

I have observed the following:

a) The UI behaves correctly and the numbers reported for each batch are correct.
b) The UI reports numbers every 60 seconds, but the batch ID, input size, etc. correspond only to the single 2-second batch at the end of each window, i.e. the 30th batch, the 60th batch, and so on. These numbers are useless, in fact confusing, in this case, though the delay and processing-time numbers are still helpful.
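
In the meantime, a workaround that at least surfaces the aggregated number is to log each window's count together with its batch time from the driver. A minimal sketch, using the two-argument foreachRDD overload that passes the batch Time (the log format here is just my own choice):

// Log each window's total count with the time of the batch that closed it.
stream.filter(filterFunc)
  .window(Seconds(60), Seconds(60))
  .count()
  .foreachRDD { (rdd, time) =>
    // count() yields a single-element RDD[Long] per window
    val total = rdd.collect().sum
    println(s"window ending at $time: $total records")
  }

This makes the per-window totals visible in the driver logs, but it doesn't fix what the UI itself displays.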

Is anyone working on a fix to show aggregated numbers in the UI, which would be more useful?

--
Thanks
Jatin
