Hi, I'm trying to figure out how to continuously update, say, the 95th percentile of a data set with Spark Streaming. I'm not sure how to sort the data set, though, and while I can compute percentiles in regular Spark, I can't figure out how to carry that over to Spark Streaming. My input is key-value pairs separated by a comma, and I only want to use the values to compute percentiles. Can anyone help with this? Here's what I have for finding percentiles in regular Spark:
// Keep only the value (second field), parsed as Double so the sort is numeric rather than lexicographic
val sorted = textFile.map(line => line.split(",")).map(kvp => kvp(1).trim.toDouble).sortBy(v => v, ascending = true)
val rank = 0.95 * sorted.count()
// collect() brings every value to the driver, so this only suits small data sets
val flatten = sorted.collect()
// Whole-number rank: average the rank-th and (rank+1)-th values; otherwise take the next value up
val percentile = if (rank == math.floor(rank)) (flatten(rank.toInt - 1) + flatten(rank.toInt)) / 2.0 else flatten(rank.toInt)
println(percentile)

Thanks in advance.

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Getting-percentile-from-Spark-Streaming-tp12040.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
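The percentile rule in the batch code can be isolated as a plain Scala function, which makes it easy to check on small inputs before wiring it into Spark. This is a minimal sketch, not a Spark API: the object name `PercentileSketch`, the function `percentile`, and the sample lines are hypothetical. It uses the same rule as the code above (rank = p * n; a whole-number rank averages the two neighboring values, otherwise the next value up is taken). For Spark Streaming, one possible approach, which is an assumption and not tested here, is to apply the same logic to each batch inside `DStream.foreachRDD`; that gives a per-batch percentile, not one over all data seen so far.

```scala
object PercentileSketch {
  // Percentile of `values` using the rule from the batch code:
  // rank = p * n; if rank is a whole number, average the rank-th and (rank+1)-th
  // smallest values (1-based); otherwise take the ceil(rank)-th smallest value.
  def percentile(values: Seq[Double], p: Double): Double = {
    require(values.nonEmpty, "values must not be empty")
    require(p > 0.0 && p < 1.0, "p must be strictly between 0 and 1")
    val sorted = values.sorted.toVector
    val rank = p * sorted.length
    if (rank == math.floor(rank)) (sorted(rank.toInt - 1) + sorted(rank.toInt)) / 2.0
    else sorted(rank.toInt)
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical "key,value" lines; only the value (second field) is used.
    val lines = Seq("a,3.0", "b,1.0", "c,4.0", "d,1.5", "e,5.0")
    val values = lines.map(_.split(",")(1).trim.toDouble)
    println(percentile(values, 0.95)) // prints 5.0
  }
}
```

The `require` checks keep every index in bounds: with 0 < p < 1 the rank is always positive and strictly less than n.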