Thanks, but I still don't get it. I have used groupBy to group the data by userID, and for each ID I need to compute the statistics.
Best,
Frank

From: Ted Yu [mailto:yuzhih...@gmail.com]
Sent: Tuesday, October 20, 2015 3:12 PM
To: ChengBo
Cc: user
Subject: Re: Get statistic result from RDD

Your mapValues can emit a tuple. If p(0) is between 0 and 5, the first component of the tuple would be 1 and the second 0. If p(0) is 6 or 7, the first component would be 0 and the second 1. You can then use reduceByKey to sum up the corresponding components.

On Tue, Oct 20, 2015 at 1:33 PM, Shepherd <cheng...@huawei.com> wrote:

Hi all,
I am really a newbie in Spark and Scala, and I cannot get the statistic result from an RDD. Could someone help me with this? My current code is as follows:

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._

val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.implicits._

val webFile = sc.textFile("/home/Dataset/web_info.csv")
webFile.cache()
val webItem = webFile.map(line => line.split(","))
// Long, Long, Long, String, String; p(3) here is the user ID, and each user ID will have multiple rows.
val webEachRDD = webItem.map(p => (p(0).toLong, p(1).toLong, p(2).toLong, p(3), p(5)))
val webGroup = webEachRDD.groupBy(_._4)
val res = webGroup.mapValues(v => {
  ....
  (wkd.count, wknd.count)
})

How can I write webGroup.mapValues so that I get each user ID's statistics? For example: p(0) is an int between 0 and 7. For each userID, I wish to count how many values of p(0) are in 0 to 5, and how many are in 6 to 7. In the final result, each row should represent one userID's statistics.

Thanks a lot. I really appreciate it.

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Get-statistic-result-from-RDD-tp25147.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
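Putting Ted's suggestion together with the original code, a minimal sketch could look like the following. It assumes the CSV layout described in the thread (p(3) is the user ID, p(0) is an integer in 0..7); the names `pairs` and `counts` are illustrative, and it keys the RDD directly by user ID with reduceByKey instead of groupBy, which avoids materializing each user's full row list.

```scala
// Sketch only, under the assumptions above.
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf().setAppName("UserStats").setMaster("local[*]")
val sc = new SparkContext(conf)

val webFile = sc.textFile("/home/Dataset/web_info.csv")
val webItem = webFile.map(_.split(","))

// Key by user ID (p(3)); emit (1, 0) when p(0) is in 0..5, (0, 1) when it is 6 or 7.
val pairs = webItem.map { p =>
  val v = p(0).toLong
  val flags = if (v >= 0 && v <= 5) (1L, 0L) else (0L, 1L)
  (p(3), flags)
}

// Sum the two counters per user ID.
val counts = pairs.reduceByKey { case ((a1, b1), (a2, b2)) => (a1 + a2, b1 + b2) }

// One row per userID: (count of 0-5, count of 6-7).
counts.collect().foreach { case (user, (low, high)) =>
  println(s"$user: $low values in 0-5, $high values in 6-7")
}
```

If you prefer to keep the existing groupBy, the equivalent per-user computation would be `webGroup.mapValues(rows => (rows.count(_._1 <= 5), rows.count(_._1 >= 6)))`, but reduceByKey combines partial counts on each partition before shuffling and scales better for users with many rows.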
---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org