Thanks, but I still don’t get it.
I have used groupBy to group the data by userID, and for each ID I need to
get the statistics.

Best
Frank

From: Ted Yu [mailto:yuzhih...@gmail.com]
Sent: Tuesday, October 20, 2015 3:12 PM
To: ChengBo
Cc: user
Subject: Re: Get statistic result from RDD

Your mapValues can emit a tuple. If p(0) is between 0 and 5, the first
component of the tuple would be 1 and the second 0. If p(0) is 6 or 7, the
first component would be 0 and the second 1.

You can use reduceByKey to sum up the corresponding components.
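
A minimal sketch of that approach (untested; it reuses webEachRDD from the
code below, where the first tuple element is p(0) and the fourth is the user
ID), with no groupBy needed:

// key each row by user ID; emit (1, 0) for p(0) in 0-5, (0, 1) for 6-7
val counts = webEachRDD
  .map(t => (t._4, if (t._1 <= 5) (1L, 0L) else (0L, 1L)))
  .reduceByKey((a, b) => (a._1 + b._1, a._2 + b._2))
// counts is an RDD[(String, (Long, Long))]: one pair of counts per user ID

Since reduceByKey combines the per-row tuples on the map side, it also avoids
materializing each user's full group the way groupBy does.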

On Tue, Oct 20, 2015 at 1:33 PM, Shepherd <cheng...@huawei.com> wrote:
Hi all,

I am a real newbie in Spark and Scala, and I cannot get the statistics out
of an RDD. Could someone help me with this?
My current code is as follows:

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._

val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.implicits._

val webFile = sc.textFile("/home/Dataset/web_info.csv")
webFile.cache()
val webItem = webFile.map(line => line.split(","))
// tuple of (Long, Long, Long, String, String); p(3) is the user ID,
// and each user ID will have multiple rows
val webEachRDD = webItem.map(p => (p(0).toLong, p(1).toLong, p(2).toLong, p(3), p(5)))

val webGroup = webEachRDD.groupBy(_._4)

val res = webGroup.mapValues(v => {
        ....
        (wkd.count, wknd.count)
})

How can I write webGroup.mapValues so that I can get each user ID's
statistics?
For example, p(0) is an int between 0 and 7.
For each userID, I wish to know how many rows have p(0) between 0 and 5,
and how many have p(0) equal to 6 or 7.
In the final result, each row represents one userID's statistics.
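
For illustration, one possible shape of the mapValues body (untested;
assuming wkd and wknd are meant to be the counts of p(0) in 0-5 and 6-7,
respectively) would be:

val res = webGroup.mapValues { v =>
  // v is the Iterable of 5-field tuples for one user ID;
  // the first tuple element holds p(0)
  val wkd  = v.count(t => t._1 >= 0 && t._1 <= 5)
  val wknd = v.count(t => t._1 == 6 || t._1 == 7)
  (wkd, wknd)
}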

Thanks a lot. I really appreciate it.



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Get-statistic-result-from-RDD-tp25147.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
