Do you mean that you want a continuously updated count as more
events/records are received in the DStream (remember, a DStream is a
continuous stream of data)? Assuming that is what you want, you can keep a
global counter in the driver program:

var globalCount = 0L

dstream.count().foreachRDD(rdd => { globalCount += rdd.first() } )

The globalCount variable lives in the driver, which is also where the
function passed to foreachRDD runs, so it is updated once after every batch.
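
For a fuller picture, here is a minimal sketch of the same idea as a standalone
program. It assumes a local master, a 1-second batch interval, and a
reasonably recent Spark release (awaitTerminationOrTimeout is not available in
the oldest versions). The object name RunningCount and the queue-backed input
are hypothetical, used only to have something to count. The counter is marked
@volatile as a precaution, since it is written from the streaming scheduler
thread.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Seconds, StreamingContext}
import scala.collection.mutable

object RunningCount {
  // Lives on the driver; only the foreachRDD body below writes to it.
  @volatile private var globalCount = 0L

  def main(args: Array[String]): Unit = {
    // Assumption: local mode with two threads and 1-second batches.
    val conf = new SparkConf().setMaster("local[2]").setAppName("RunningCount")
    val ssc = new StreamingContext(conf, Seconds(1))

    // Hypothetical input: a few small RDDs fed through a queue,
    // one RDD per batch interval.
    val queue = mutable.Queue[RDD[Int]](
      ssc.sparkContext.parallelize(1 to 3),
      ssc.sparkContext.parallelize(4 to 5),
      ssc.sparkContext.parallelize(6 to 9)
    )
    val dstream = ssc.queueStream(queue, oneAtATime = true)

    // count() yields a DStream whose RDDs each hold a single Long:
    // the number of elements in that batch.
    dstream.count().foreachRDD { rdd =>
      globalCount += rdd.first()
      println(s"elements seen so far: $globalCount")
    }

    ssc.start()
    // Let a few batches run, then shut down.
    ssc.awaitTerminationOrTimeout(5000)
    ssc.stop(stopSparkContext = true, stopGracefully = false)
  }
}
```

Because the foreachRDD function runs in the driver, a plain variable is enough
here. An accumulator would only be needed if the count had to be updated from
code running on the executors.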

TD


On Thu, Aug 7, 2014 at 10:16 PM, Soumitra Kumar <kumar.soumi...@gmail.com>
wrote:

> Hello,
>
> I want to count the number of elements in the DStream, like RDD.count() .
> Since there is no such method in DStream, I thought of using DStream.count
> and use the accumulator.
>
> How do I do DStream.count() to count the number of elements in a DStream?
>
> How do I create a shared variable in Spark Streaming?
>
> -Soumitra.
>
