Please take a look at: examples/src/main/scala/org/apache/spark/examples/streaming/HdfsWordCount.scala
Cheers

On Tue, Oct 20, 2015 at 3:18 PM, ChengBo <[email protected]> wrote:

> Thanks, but I still don't get it.
>
> I have used groupBy to group data by userID, and for each ID, I need to
> get the statistic information.
>
> Best
> Frank
>
> *From:* Ted Yu [mailto:[email protected]]
> *Sent:* Tuesday, October 20, 2015 3:12 PM
> *To:* ChengBo
> *Cc:* user
> *Subject:* Re: Get statistic result from RDD
>
> Your mapValues can emit a tuple. If p(0) is between 0 and 5, the first
> component of the tuple would be 1, the second 0.
> If p(0) is 6 or 7, the first component would be 0, the second 1.
>
> You can then use reduceByKey to sum up the corresponding components.
>
> On Tue, Oct 20, 2015 at 1:33 PM, Shepherd <[email protected]> wrote:
>
> Hi all,
> I am really a newbie in Spark and Scala.
> I cannot get the statistic result from an RDD. Could someone help me with
> this?
> Current code is as follows:
>
> import org.apache.spark.SparkConf
> import org.apache.spark.SparkContext
> import org.apache.spark.SparkContext._
>
> val sqlContext = new org.apache.spark.sql.SQLContext(sc)
> import sqlContext.implicits._
>
> val webFile = sc.textFile("/home/Dataset/web_info.csv")
> webFile.cache()
> val webItem = webFile.map(line => line.split(","))
> val webEachRDD = webItem.map(p => (p(0).toLong, p(1).toLong, p(2).toLong,
>   p(3), p(5))) // Long, Long, Long, String, String; p(3) here is the user
>   // ID, and each user ID will have multiple rows.
>
> val webGroup = webEachRDD.groupBy(_._4)
>
> val res = webGroup.mapValues(v => {
>   ....
>   (wkd.count, wknd.count)
> })
>
> How can I write the webGroup.mapValues, so that I can get each user ID's
> statistic information?
> For example: p(0) is an int between 0 and 7.
> I wish to get, for each userID, how many values of p(0) fall in 0 to 5,
> and how many in 6 to 7.
> In the final result, each row represents one userID's statistic result.
>
> Thanks a lot. I really appreciate it.
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Get-statistic-result-from-RDD-tp25147.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
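Putting Ted's suggestion together, a minimal sketch might look like the following. It assumes the column layout Frank describes (p(0) is a value between 0 and 7, p(3) is the user ID); the object name `UserStats` and the output format are illustrative, and keying by user ID up front lets reduceByKey do the per-user sums without the wide groupBy.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object UserStats {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("UserStats"))

    val webFile = sc.textFile("/home/Dataset/web_info.csv")

    // Key each row by user ID (p(3)) and emit a count tuple:
    // (1, 0) if p(0) is in 0..5, (0, 1) if p(0) is 6 or 7.
    val perRow = webFile.map(_.split(",")).map { p =>
      val v = p(0).toLong
      val pair = if (v <= 5) (1L, 0L) else (0L, 1L)
      (p(3), pair)
    }

    // Sum the tuples component-wise per user ID.
    val res = perRow.reduceByKey { case ((a1, a2), (b1, b2)) =>
      (a1 + b1, a2 + b2)
    }

    res.collect().foreach { case (user, (low, high)) =>
      println(s"$user: $low rows with p(0) in 0-5, $high rows with p(0) in 6-7")
    }
    sc.stop()
  }
}
```

The original groupBy/mapValues approach would also work, but it materializes every row for a user on one executor; reducing pre-aggregates on the map side, which is the usual recommendation for simple per-key counts.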
