You can use "reduceByKeyAndWindow", e.g.,

    val lines = ssc.socketTextStream("localhost", 9999)
    val words = lines.flatMap(_.split(" "))
    val wordCounts = words.map(x => (x, 1)).reduceByKeyAndWindow((x: Int,
y: Int) => x + y, Seconds(60), Seconds(60))
    wordCounts.print()


On Wed, Dec 30, 2015 at 12:00 PM, Soumitra Johri <
soumitra.siddha...@gmail.com> wrote:

> Hi, in the example :
> https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/streaming/StatefulNetworkWordCount.scala
>
> the streaming frequency is 1 seconds however I do not want to print the
> contents of the word-counts every minute and resent the word counts again
> back to 0 every minute. How can I do that ?
>
> I have to print per minute work counts with streaming frequency of 1
> second. I though of using scala schedulers but then there can be
> concurrency issues.
>
> My algorithm is as follows :
>
>    1. Read the words every 1 second
>    2. Do cumulative work count for 60 seconds
>    3. After the end of every 60 second (1 minute ) print the workcounts
>    and resent the counters to zero.
>
> Any help would be appreciated!
>
> Thanks
>
> Warm Regards
>

Reply via email to