You can use "reduceByKeyAndWindow", e.g., val lines = ssc.socketTextStream("localhost", 9999) val words = lines.flatMap(_.split(" ")) val wordCounts = words.map(x => (x, 1)).reduceByKeyAndWindow((x: Int, y: Int) => x + y, Seconds(60), Seconds(60)) wordCounts.print()
On Wed, Dec 30, 2015 at 12:00 PM, Soumitra Johri < soumitra.siddha...@gmail.com> wrote: > Hi, in the example : > https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/streaming/StatefulNetworkWordCount.scala > > the streaming frequency is 1 seconds however I do not want to print the > contents of the word-counts every minute and resent the word counts again > back to 0 every minute. How can I do that ? > > I have to print per minute work counts with streaming frequency of 1 > second. I though of using scala schedulers but then there can be > concurrency issues. > > My algorithm is as follows : > > 1. Read the words every 1 second > 2. Do cumulative work count for 60 seconds > 3. After the end of every 60 second (1 minute ) print the workcounts > and resent the counters to zero. > > Any help would be appreciated! > > Thanks > > Warm Regards >