You have to import org.apache.spark.streaming.StreamingContext._ to enable groupByKey operations on DStreams. After that import, you can apply groupByKey on any DStream of key-value pairs (e.g. DStream[(String, Int)]). Within each batch's RDD, the data will be grouped using the first element of the tuple as the key.
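For example, a minimal sketch (the socket source and the "word count" line format are illustrative, not from the original question):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.StreamingContext._  // brings pair operations like groupByKey into scope

val conf = new SparkConf().setAppName("GroupByKeyExample")
val ssc = new StreamingContext(conf, Seconds(10))

// Illustrative source: each line is "word count"
val pairs = ssc.socketTextStream("localhost", 9999)
  .map { line =>
    val Array(w, n) = line.split(" ")
    (w, n.toInt)
  }                                    // DStream[(String, Int)]

// Group all values in each batch by key (the first tuple element)
val grouped = pairs.groupByKey()       // DStream[(String, Iterable[Int])]
grouped.print()

ssc.start()
ssc.awaitTermination()
```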
TD

On Mon, Jul 14, 2014 at 10:59 AM, srinivas <kusamsrini...@gmail.com> wrote:
> hi
> I am new to spark and scala and I am trying to do some aggregations on a
> json file stream using Spark Streaming. I am able to parse the json string
> and it is converted to map(id -> 123, name -> srini, mobile -> 12324214,
> score -> 123, test_type -> math). Now I want to use a GROUPBY function on each
> student map record and do some aggregations on scores. Here is my
> main function:
>
>     val Array(zkQuorum, group, topics, numThreads) = args
>     val sparkConf = new SparkConf().setAppName("KafkaWordCount")
>     val ssc = new StreamingContext(sparkConf, Seconds(10))
>     // ssc.checkpoint("checkpoint")
>
>     val topicpMap = topics.split(",").map((_, numThreads.toInt)).toMap
>     val lines = KafkaUtils.createStream(ssc, zkQuorum, group, topicpMap).map(_._2)
>     val jsonf = lines.map(JSON.parseFull(_))
>       .map(_.get.asInstanceOf[scala.collection.immutable.Map[String, Any]])
>
>     jsonf.print()
>
>     ssc.start()
>     ssc.awaitTermination()
>   }
>
> Can anyone please let me know how to use the groupby function..thanks
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-Json-file-groupby-function-tp9618.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
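Applied to the question above, one possible sketch: key each parsed record by test_type and aggregate the scores. The field names follow the example record in the question; the Double cast is an assumption based on JSON.parseFull returning JSON numbers as Double.

```scala
// Continuing from the question's jsonf: DStream[Map[String, Any]]
// Key each record by test_type and pull out the score.
val scoresByType = jsonf.map { m =>
  (m("test_type").toString, m("score").asInstanceOf[Double])
}                                              // DStream[(String, Double)]

// Option 1: collect all scores per test_type in each batch
val grouped = scoresByType.groupByKey()        // DStream[(String, Iterable[Double])]

// Option 2: aggregate directly, e.g. sum of scores per test_type
// (reduceByKey avoids shuffling the full value list when you only need the aggregate)
val summed = scoresByType.reduceByKey(_ + _)   // DStream[(String, Double)]
summed.print()
```

For simple aggregations like sums or averages, reduceByKey (or combineByKey) is usually preferable to groupByKey, since it combines values on the map side instead of shuffling every value.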