Hi,

I'm trying to work out how to continuously update a percentile (say, the
95th) of a data set in Spark Streaming. I can compute percentiles in regular
Spark, but I'm not sure how to order the data set in a stream, so I can't see
how to carry that approach over to Spark Streaming. Each input line is a
key-value pair separated by a comma, and I only want to use the values to
compute the percentile. Can anyone help with this? Here's what I have for
finding a percentile in regular Spark:

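// Parse values as Double so the sort is numeric rather than lexicographic.
val sorted = textFile.map(line => line.split(",")).map(kvp =>
  (kvp(1).trim.toDouble -> kvp(0))).sortByKey(true)
val count = sorted.count()
val rank = 0.9 * count  // use 0.95 for the 95th percentile
// Only the first rank+1 values are needed, not the whole data set.
val flatten = sorted.keys.take(math.min(rank.toLong + 1, count).toInt)
// rank is a Double, so test for a whole number instead of isInstanceOf[Int].
val percentile = if (rank == math.floor(rank) && rank >= 1 && rank < count)
  (flatten(rank.toInt - 1) + flatten(rank.toInt)) / 2.0
else flatten(math.min(rank.toInt, flatten.length - 1))
println(percentile)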
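
To make the rank rule concrete without a cluster, here is a minimal sketch of the same calculation on a plain Scala collection. `PercentileSketch` and `percentileOf` are hypothetical names chosen for illustration, not part of Spark, and the sample values are made up. The rule is the one the snippet above is aiming for: average the two neighbouring values when the rank is a whole number, otherwise take the value at the truncated rank.

```scala
// Hypothetical helper (not a Spark API): the rank rule on a local collection.
object PercentileSketch {
  def percentileOf(values: Seq[Double], fraction: Double): Double = {
    require(values.nonEmpty, "need at least one value")
    require(fraction >= 0.0 && fraction <= 1.0, "fraction must be in [0, 1]")
    val sorted = values.sorted
    val rank = fraction * sorted.length
    val idx = rank.toInt
    if (rank == math.floor(rank) && idx >= 1 && idx < sorted.length)
      // Whole-number rank: average the two values on either side of it.
      (sorted(idx - 1) + sorted(idx)) / 2.0
    else
      // Otherwise take the value at the truncated rank, clamped to the last index.
      sorted(math.min(idx, sorted.length - 1))
  }

  def main(args: Array[String]): Unit = {
    // Made-up sample data: the values 1.0 to 10.0 in arbitrary order.
    val sample = Seq(5.0, 1.0, 4.0, 2.0, 3.0, 10.0, 9.0, 8.0, 7.0, 6.0)
    println(percentileOf(sample, 0.9))
    println(percentileOf(sample, 0.95))
  }
}
```

Something like this could presumably be applied to the values collected from each batch or window, though whether collecting to the driver is acceptable depends on the data volume.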

Thanks in advance.



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Getting-percentile-from-Spark-Streaming-tp12040.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
