GitHub user coderxiang opened a pull request:
https://github.com/apache/spark/pull/2667
SPARK-3568 [mllib] add ranking metrics
Add common metrics for ranking algorithms
(see http://www-nlp.stanford.edu/IR-book/), including:
- Mean Average Precision
- Precision@n: top-n precision
- Discounted cumulative gain (DCG) and NDCG
The following methods and the corresponding tests are implemented:
```
class RankingMetrics(predictionAndLabels: RDD[(Array[Double], Array[Double])]) {
  /* Returns the precision@k for each query */
  lazy val precAtK: RDD[Array[Double]]
  /* Returns the average precision for each query */
  lazy val avePrec: RDD[Double]
  /* Returns the mean average precision (MAP) of all the queries */
  lazy val meanAvePrec: Double
  /* Returns the normalized discounted cumulative gain for each query */
  lazy val ndcg: RDD[Double]
  /* Returns the mean NDCG of all the queries */
  lazy val meanNdcg: Double
}
```
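As a rough illustration of what the per-query values above mean, here is a minimal single-query sketch in plain Scala, without Spark. It is not the code in this pull request. The object name `RankingMetricsSketch` is hypothetical, and the sketch assumes that the first array holds item ids in ranked order, that the second array holds the ids of the relevant items, that relevance is binary, and that NDCG uses a log2(position + 1) discount. The patch itself may make different choices.

```scala
// Single-query sketch of the ranking metrics; plain Scala collections, no Spark.
// Assumptions (not taken from the patch): `pred` is ranked item ids, `labels` is
// the set of relevant ids, relevance is binary, pred contains no duplicates.
object RankingMetricsSketch {

  /** Precision at every cutoff k = 1..pred.length. */
  def precAtK(pred: Array[Double], labels: Array[Double]): Array[Double] = {
    val relevant = labels.toSet
    var hits = 0
    pred.zipWithIndex.map { case (item, i) =>
      if (relevant.contains(item)) hits += 1
      hits.toDouble / (i + 1)
    }
  }

  /** Average precision: sum of precision@k at the ranks of relevant items,
    * divided by the number of relevant items. */
  def avePrec(pred: Array[Double], labels: Array[Double]): Double = {
    val relevant = labels.toSet
    if (relevant.isEmpty) return 0.0
    val prec = precAtK(pred, labels)
    val sum = pred.indices.collect { case i if relevant.contains(pred(i)) => prec(i) }.sum
    sum / relevant.size
  }

  /** NDCG with binary gains: DCG of the ranking divided by the DCG of an ideal ranking. */
  def ndcg(pred: Array[Double], labels: Array[Double]): Double = {
    val relevant = labels.toSet
    // Discount for the item at 0-based index i, i.e. 1 / log2(rank + 1).
    def discount(i: Int): Double = 1.0 / (math.log(i + 2) / math.log(2))
    val dcg = pred.indices.filter(i => relevant.contains(pred(i))).map(discount).sum
    val ideal = (0 until math.min(relevant.size, pred.length)).map(discount).sum
    if (ideal == 0.0) 0.0 else dcg / ideal
  }
}
```

Under these assumptions, the mean values (MAP and mean NDCG) would be the average of the per-query results over all queries, for example by mapping these functions over the pairs in `predictionAndLabels` and taking the mean.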
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/coderxiang/spark rankingmetrics
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/2667.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #2667
----
commit 3a5a6ffdb036f8432911184920193a4b8a007084
Author: coderxiang <[email protected]>
Date: 2014-10-06T05:28:05Z
add ranking metrics
----