Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/7655#discussion_r35587655
--- Diff: docs/mllib-metrics.md ---
@@ -0,0 +1,1464 @@
+---
+layout: global
+title: Evaluation Metrics - MLlib
+displayTitle: <a href="mllib-guide.html">MLlib</a> - Evaluation Metrics
+---
+
+* Table of contents
+{:toc}
+
+
+## Algorithm Metrics
+
+Spark's MLlib comes with a number of machine learning algorithms that can be used to learn from and make
+predictions on data. When applying these algorithms, their performance needs to be evaluated against criteria
+that depend on the application and its requirements. Spark's MLlib also provides a suite of metrics for
+evaluating the performance of its algorithms.
+
+Specific machine learning algorithms fall under broader types of machine learning applications like classification,
+regression, clustering, etc. Each of these types has well-established metrics for performance evaluation, and this
+section details the metrics that are currently available in Spark's MLlib.
+
+## Binary Classification
+
+[Binary classifiers](https://en.wikipedia.org/wiki/Binary_classification) are used to separate the elements of a
+given dataset into one of two possible groups (e.g. fraud or not fraud). Binary classification is a special case
+of multiclass classification, and most binary classification metrics can be generalized to multiclass
+classification metrics.
+
+<table class="table">
+ <thead>
+ <tr><th>Metric</th><th>Definition</th></tr>
+ </thead>
+ <tbody>
+ <tr>
+      <td>Precision (Positive Predictive Value)</td>
+ <td>$PPV=\frac{TP}{TP + FP}$</td>
+ </tr>
+ <tr>
+ <td>Recall (True Positive Rate)</td>
+ <td>$TPR=\frac{TP}{P}=\frac{TP}{TP + FN}$</td>
+ </tr>
+ <tr>
+ <td>F-measure</td>
+      <td>$F(\beta) = \left(1 + \beta^2\right) \cdot \left(\frac{PPV \cdot TPR}
+          {\beta^2 \cdot PPV + TPR}\right)$</td>
+ </tr>
+ <tr>
+ <td>Receiver Operating Characteristic (ROC)</td>
+      <td>$FPR(T)=\int^\infty_{T} P_0(s)\,ds \\ TPR(T)=\int^\infty_{T} P_1(s)\,ds$</td>
+ </tr>
+ <tr>
+ <td>Area Under ROC Curve</td>
+ <td>$AUROC=\int^1_{0} \frac{TP}{P} d\left(\frac{FP}{N}\right)$</td>
+ </tr>
+ <tr>
+ <td>Area Under Precision-Recall Curve</td>
+      <td>$AUPRC=\int^1_{0} \frac{TP}{TP+FP} d\left(\frac{TP}{P}\right)$</td>
+ </tr>
+ </tbody>
+</table>
+
+
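+As a brief worked example of these formulas (using hypothetical counts, not output from the code below), suppose
+a classifier produces $TP = 8$, $FP = 2$, and $FN = 4$ at some threshold. Then
+
+$$PPV = \frac{8}{8 + 2} = 0.8, \qquad TPR = \frac{8}{8 + 4} \approx 0.67, \qquad
+F(1) = 2 \cdot \frac{0.8 \cdot 0.67}{0.8 + 0.67} \approx 0.73$$
+
+Sweeping the decision threshold traces out the precision-recall and ROC curves whose areas are reported below.
+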
+**Examples**
+
+<div class="codetabs">
+The following code snippets illustrate how to load a sample dataset, train a binary classification algorithm on
+the data, and evaluate its performance with several binary classification evaluation metrics.
+
+<div data-lang="scala" markdown="1">
+
+{% highlight scala %}
+import org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS
+import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics
+import org.apache.spark.mllib.regression.LabeledPoint
+import org.apache.spark.mllib.util.MLUtils
+
+// Load training data in LIBSVM format
+val data = MLUtils.loadLibSVMFile(sc, "data/mllib/sample_binary_classification_data.txt")
+
+// Split data into training (60%) and test (40%)
+val splits = data.randomSplit(Array(0.6, 0.4), seed = 11L)
+val training = splits(0).cache()
+val test = splits(1)
+
+// Run training algorithm to build the model
+val model = new LogisticRegressionWithLBFGS()
+ .setNumClasses(2)
+ .run(training)
+
+// Clear the prediction threshold so the model will return probabilities
+model.clearThreshold
+
+// Compute raw scores on the test set
+val predictionAndLabels = test.map { case LabeledPoint(label, features) =>
+ val prediction = model.predict(features)
+ (prediction, label)
+}
+
+// Instantiate metrics object
+val metrics = new BinaryClassificationMetrics(predictionAndLabels)
+
+// Precision by threshold
+val precision = metrics.precisionByThreshold
+precision.foreach(x => printf("Threshold: %1.2f, Precision: %1.2f\n", x._1, x._2))
+
+// Recall by threshold
+val recall = metrics.recallByThreshold
+recall.foreach(x => printf("Threshold: %1.2f, Recall: %1.2f\n", x._1, x._2))
+
+// Precision-Recall Curve
+val PRC = metrics.pr
+
+// F-measure
+val f1Score = metrics.fMeasureByThreshold
+f1Score.foreach(x => printf("Threshold: %1.2f, F-score: %1.2f, Beta = 1\n", x._1, x._2))
+
+val beta = 0.5
+val fScore = metrics.fMeasureByThreshold(beta)
+fScore.foreach(x => printf("Threshold: %1.2f, F-score: %1.2f, Beta = 0.5\n", x._1, x._2))
+
+// AUPRC
+val auPRC = metrics.areaUnderPR
+println("Area under precision-recall curve = " + auPRC)
+
+// Compute thresholds used in ROC and PR curves
+val thresholds = precision.map(_._1)
+
+// ROC Curve
+val roc = metrics.roc
+
+// AUROC
+val auROC = metrics.areaUnderROC
+println("Area under ROC = " + auROC)
+
+{% endhighlight %}
+
+</div>
+
+<div data-lang="java" markdown="1">
+
+{% highlight java %}
+import scala.Tuple2;
+
+import org.apache.spark.api.java.*;
+import org.apache.spark.rdd.RDD;
+import org.apache.spark.api.java.function.Function;
+import org.apache.spark.mllib.classification.LogisticRegressionModel;
+import org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS;
+import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics;
+import org.apache.spark.mllib.regression.LabeledPoint;
+import org.apache.spark.mllib.util.MLUtils;
+import org.apache.spark.SparkConf;
+import org.apache.spark.SparkContext;
+
+public class BinaryClassification {
+ public static void main(String[] args) {
+    SparkConf conf = new SparkConf().setAppName("Binary Classification Metrics");
+ SparkContext sc = new SparkContext(conf);
+ String path = "data/mllib/sample_binary_classification_data.txt";
+    JavaRDD<LabeledPoint> data = MLUtils.loadLibSVMFile(sc, path).toJavaRDD();
+
+ // Split initial RDD into two... [60% training data, 40% testing data].
+    JavaRDD<LabeledPoint>[] splits = data.randomSplit(new double[] {0.6, 0.4}, 11L);
+ JavaRDD<LabeledPoint> training = splits[0].cache();
+ JavaRDD<LabeledPoint> test = splits[1];
+
+ // Run training algorithm to build the model.
+ final LogisticRegressionModel model = new LogisticRegressionWithLBFGS()
+      .setNumClasses(2)
+ .run(training.rdd());
+
+ // Compute raw scores on the test set.
+ JavaRDD<Tuple2<Object, Object>> predictionAndLabels = test.map(
+ new Function<LabeledPoint, Tuple2<Object, Object>>() {
+ public Tuple2<Object, Object> call(LabeledPoint p) {
+ Double prediction = model.predict(p.features());
+ return new Tuple2<Object, Object>(prediction, p.label());
+ }
+ }
+ );
+
+ // Get evaluation metrics.
+    BinaryClassificationMetrics metrics = new BinaryClassificationMetrics(predictionAndLabels.rdd());
+
+ // Precision by threshold
+    JavaRDD<Tuple2<Object, Object>> precision = metrics.precisionByThreshold().toJavaRDD();
+    System.out.println("Precision by threshold: " + precision.collect());
+
+ // Recall by threshold
+    JavaRDD<Tuple2<Object, Object>> recall = metrics.recallByThreshold().toJavaRDD();
+    System.out.println("Recall by threshold: " + recall.collect());
+
+ // F Score by threshold
+    JavaRDD<Tuple2<Object, Object>> f1Score = metrics.fMeasureByThreshold().toJavaRDD();
+    System.out.println("F1 Score by threshold: " + f1Score.collect());
+
+    JavaRDD<Tuple2<Object, Object>> f2Score = metrics.fMeasureByThreshold(2.0).toJavaRDD();
+    System.out.println("F2 Score by threshold: " + f2Score.collect());
+
+ // Precision-recall curve
+ JavaRDD<Tuple2<Object, Object>> prc = metrics.pr().toJavaRDD();
+ System.out.println("Precision-recall curve: " + prc.toArray());
+
+ // Thresholds
+ JavaRDD<Double> thresholds = precision.map(
+ new Function<Tuple2<Object, Object>, Double>() {
+ public Double call (Tuple2<Object, Object> t) {
+ return new Double(t._1().toString());
+ }
+ }
+ );
+
+ // ROC Curve
+ JavaRDD<Tuple2<Object, Object>> roc = metrics.roc().toJavaRDD();
+ System.out.println("ROC curve: " + roc.toArray());
+
+ // AUPRC
+ System.out.println("Area under precision-recall curve = " +
metrics.areaUnderPR());
+
+ // AUROC
+ System.out.println("Area under ROC = " + metrics.areaUnderROC());
+
+ // Save and load model
+ model.save(sc, "myModelPath");
+    LogisticRegressionModel sameModel = LogisticRegressionModel.load(sc, "myModelPath");
+ }
+}
+
+{% endhighlight %}
+
+</div>
+
+<div data-lang="python" markdown="1">
+
+{% highlight python %}
+from pyspark.mllib.classification import LogisticRegressionWithLBFGS
+from pyspark.mllib.evaluation import BinaryClassificationMetrics
+from pyspark.mllib.regression import LabeledPoint
+from pyspark.mllib.util import MLUtils
+
+# Several of the methods available in Scala are currently missing from PySpark
+
+# Load training data in LIBSVM format
+data = MLUtils.loadLibSVMFile(sc, "data/mllib/sample_binary_classification_data.txt")
+
+# Split data into training (60%) and test (40%)
+splits = data.randomSplit([0.6, 0.4], seed = 11L)
+training = splits[0].cache()
+test = splits[1]
+
+# Run training algorithm to build the model
+model = LogisticRegressionWithLBFGS.train(training)
+
+# Compute raw scores on the test set
+predictionAndLabels = test.map(lambda lp: (float(model.predict(lp.features)), lp.label))
+
+# Instantiate metrics object
+metrics = BinaryClassificationMetrics(predictionAndLabels)
+
+# Area under precision-recall curve
+print "Area under PR = %1.2f" % metrics.areaUnderPR
+
+# Area under ROC curve
+print "Area under ROC = %1.2f" % metrics.areaUnderROC
+
+{% endhighlight %}
+
+</div>
+</div>
+
+
+## Multiclass Classification
+
+[Multiclass classification](https://en.wikipedia.org/wiki/Multiclass_classification) describes a classification
+problem where there are $M \gt 2$ possible labels for each data point (the case where $M=2$ is the binary
+classification problem). For example, classifying handwriting samples as the digits 0 through 9 is a multiclass
+classification problem with 10 possible classes.
+
+Define the class, or label, set as
+
+$$L = \{\ell_0, \ell_1, \ldots, \ell_{M-1} \} $$
+
+The true output vector $\mathbf{y}$ consists of $N$ elements
+
+$$\mathbf{y}_0, \mathbf{y}_1, \ldots, \mathbf{y}_{N-1} \in L $$
+
+A multiclass prediction algorithm generates a prediction vector $\hat{\mathbf{y}}$ of $N$ elements
+
+$$\hat{\mathbf{y}}_0, \hat{\mathbf{y}}_1, \ldots, \hat{\mathbf{y}}_{N-1} \in L $$
+
+For this section, a modified delta function $\hat{\delta}(x)$ will prove useful
+
+$$\hat{\delta}(x) = \begin{cases}1 & \text{if $x = 0$}, \\ 0 & \text{otherwise}.\end{cases}$$
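+
+For example, $\hat{\delta}(\mathbf{y}_k - \ell_i)$ equals $1$ exactly when sample $k$ has true label $\ell_i$, so
+sums of products of these terms count matching label-prediction pairs in the definitions below.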
+
+<table class="table">
+ <thead>
+ <tr><th>Metric</th><th>Definition</th></tr>
+ </thead>
+ <tbody>
+ <tr>
+ <td>Confusion Matrix</td>
+ <td>
+        $C_{ij} = \sum_{k=0}^{N-1} \hat{\delta}(\mathbf{y}_k-\ell_i) \cdot \hat{\delta}(\hat{\mathbf{y}}_k - \ell_j)\\ \\
+        \left( \begin{array}{ccc}
+          \sum_{k=0}^{N-1} \hat{\delta}(\mathbf{y}_k-\ell_0) \cdot \hat{\delta}(\hat{\mathbf{y}}_k - \ell_0) & \ldots &
+          \sum_{k=0}^{N-1} \hat{\delta}(\mathbf{y}_k-\ell_0) \cdot \hat{\delta}(\hat{\mathbf{y}}_k - \ell_{M-1}) \\
+          \vdots & \ddots & \vdots \\
+          \sum_{k=0}^{N-1} \hat{\delta}(\mathbf{y}_k-\ell_{M-1}) \cdot \hat{\delta}(\hat{\mathbf{y}}_k - \ell_0) & \ldots &
+          \sum_{k=0}^{N-1} \hat{\delta}(\mathbf{y}_k-\ell_{M-1}) \cdot \hat{\delta}(\hat{\mathbf{y}}_k - \ell_{M-1})
+        \end{array} \right)$
+ </td>
+ </tr>
+ <tr>
+ <td>Overall Precision</td>
+      <td>$PPV = \frac{TP}{TP + FP} = \frac{1}{N}\sum_{i=0}^{N-1} \hat{\delta}\left(\hat{\mathbf{y}}_i -
+        \mathbf{y}_i\right)$</td>
+ </tr>
+ <tr>
+ <td>Overall Recall</td>
+      <td>$TPR = \frac{TP}{TP + FN} = \frac{1}{N}\sum_{i=0}^{N-1} \hat{\delta}\left(\hat{\mathbf{y}}_i -
+        \mathbf{y}_i\right)$</td>
+ </tr>
+ <tr>
+ <td>Overall F1-measure</td>
+ <td>$F1 = 2 \cdot \left(\frac{PPV \cdot TPR}
+ {PPV + TPR}\right)$</td>
+ </tr>
+ <tr>
+ <td>Precision by label</td>
+ <td>$PPV(\ell) = \frac{TP}{TP + FP} =
+        \frac{\sum_{i=0}^{N-1} \hat{\delta}(\hat{\mathbf{y}}_i - \ell) \cdot \hat{\delta}(\mathbf{y}_i - \ell)}
+          {\sum_{i=0}^{N-1} \hat{\delta}(\hat{\mathbf{y}}_i - \ell)}$</td>
+ </tr>
+ <tr>
+ <td>Recall by label</td>
+ <td>$TPR(\ell)=\frac{TP}{P} =
+        \frac{\sum_{i=0}^{N-1} \hat{\delta}(\hat{\mathbf{y}}_i - \ell) \cdot \hat{\delta}(\mathbf{y}_i - \ell)}
+          {\sum_{i=0}^{N-1} \hat{\delta}(\mathbf{y}_i - \ell)}$</td>
+ </tr>
+ <tr>
+ <td>F-measure by label</td>
+      <td>$F(\beta, \ell) = \left(1 + \beta^2\right) \cdot \left(\frac{PPV(\ell) \cdot TPR(\ell)}
+          {\beta^2 \cdot PPV(\ell) + TPR(\ell)}\right)$</td>
+ </tr>
+ <tr>
+ <td>Weighted precision</td>
+ <td>$PPV_{w}= \frac{1}{N} \sum\nolimits_{\ell \in L} PPV(\ell)
+ \cdot \sum_{i=0}^{N-1} \hat{\delta}(\mathbf{y}_i-\ell)$</td>
+ </tr>
+ <tr>
+ <td>Weighted recall</td>
+ <td>$TPR_{w}= \frac{1}{N} \sum\nolimits_{\ell \in L} TPR(\ell)
+ \cdot \sum_{i=0}^{N-1} \hat{\delta}(\mathbf{y}_i-\ell)$</td>
+ </tr>
+ <tr>
+ <td>Weighted F-measure</td>
+      <td>$F_{w}(\beta)= \frac{1}{N} \sum\nolimits_{\ell \in L} F(\beta, \ell)
+          \cdot \sum_{i=0}^{N-1} \hat{\delta}(\mathbf{y}_i-\ell)$</td>
+ </tr>
+ </tbody>
+</table>
+
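+As a brief worked example (using hypothetical data, not the sample dataset below), take $N = 5$ points with label
+set $L = \{0, 1, 2\}$, true labels $\mathbf{y} = (0, 1, 2, 1, 0)$, and predictions
+$\hat{\mathbf{y}} = (0, 2, 2, 1, 0)$. Four of the five predictions are correct, so the overall precision (and
+recall) is $4/5 = 0.8$. For label $1$, one point is predicted as $1$ and that prediction is correct, so
+$PPV(1) = 1/1 = 1.0$, while two points are truly labeled $1$, so $TPR(1) = 1/2 = 0.5$.
+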
+**Examples**
+
+<div class="codetabs">
+The following code snippets illustrate how to load a sample dataset, train a multiclass classification algorithm
+on the data, and evaluate its performance with several multiclass classification evaluation metrics.
+
+<div data-lang="scala" markdown="1">
+
+{% highlight scala %}
+import org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS
+import org.apache.spark.mllib.evaluation.MulticlassMetrics
+import org.apache.spark.mllib.regression.LabeledPoint
+import org.apache.spark.mllib.util.MLUtils
+
+// Load training data in LIBSVM format
+val data = MLUtils.loadLibSVMFile(sc, "data/mllib/sample_multiclass_classification_data.txt")
+
+// Split data into training (60%) and test (40%)
+val splits = data.randomSplit(Array(0.6, 0.4), seed = 11L)
+val training = splits(0).cache()
+val test = splits(1)
+
+// Run training algorithm to build the model
+val model = new LogisticRegressionWithLBFGS()
+ .setNumClasses(3)
+ .run(training)
+
+// Compute raw scores on the test set
+val predictionAndLabels = test.map { case LabeledPoint(label, features) =>
+ val prediction = model.predict(features)
+ (prediction, label)
+}
+
+// Instantiate metrics object
+val metrics = new MulticlassMetrics(predictionAndLabels)
+
+// Confusion matrix
+println("Confusion matrix:")
+println(metrics.confusionMatrix)
+
+// Overall Statistics
+val precision = metrics.precision
+val recall = metrics.recall // same as true positive rate
+val f1Score = metrics.fMeasure
+println("Summary Statistics")
+printf("Precision = %1.2f\n", precision)
+printf("Recall = %1.2f\n", recall)
+printf("F1 Score = %1.2f\n", f1Score)
+
+// Precision by label
+val labels = metrics.labels
+labels.foreach(l => printf("Precision(%s): %1.2f\n", l,
metrics.precision(l)))
+
+// Recall by label
+labels.foreach(l => printf("Recall(%s): %1.2f\n", l, metrics.recall(l)))
+
+// False positive rate by label
+labels.foreach(l => printf("FPR(%s): %1.2f\n", l,
metrics.falsePositiveRate(l)))
+
+// F-measure by label
+labels.foreach(l => printf("F1 Score(%s): %1.2f\n", l,
metrics.fMeasure(l)))
+
+// Weighted stats
+printf("Weighted precision: %1.2f\n", metrics.weightedPrecision)
+printf("Weighted recall: %1.2f\n", metrics.weightedRecall)
+printf("Weighted F1 score: %1.2f\n", metrics.weightedFMeasure)
+printf("Weighted false positive rate: %1.2f\n",
metrics.weightedFalsePositiveRate)
+
+{% endhighlight %}
+
+</div>
+
+<div data-lang="java" markdown="1">
+
+{% highlight java %}
+import scala.Tuple2;
+
+import org.apache.spark.api.java.*;
+import org.apache.spark.rdd.RDD;
+import org.apache.spark.api.java.function.Function;
+import org.apache.spark.mllib.classification.LogisticRegressionModel;
+import org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS;
+import org.apache.spark.mllib.evaluation.MulticlassMetrics;
+import org.apache.spark.mllib.regression.LabeledPoint;
+import org.apache.spark.mllib.util.MLUtils;
+import org.apache.spark.mllib.linalg.Matrix;
+import org.apache.spark.SparkConf;
+import org.apache.spark.SparkContext;
+
+public class MulticlassClassification {
+ public static void main(String[] args) {
+    SparkConf conf = new SparkConf().setAppName("Multiclass Classification Metrics");
+ SparkContext sc = new SparkContext(conf);
+ String path = "data/mllib/sample_multiclass_classification_data.txt";
+    JavaRDD<LabeledPoint> data = MLUtils.loadLibSVMFile(sc, path).toJavaRDD();
+
+ // Split initial RDD into two... [60% training data, 40% testing data].
+    JavaRDD<LabeledPoint>[] splits = data.randomSplit(new double[] {0.6, 0.4}, 11L);
+ JavaRDD<LabeledPoint> training = splits[0].cache();
+ JavaRDD<LabeledPoint> test = splits[1];
+
+ // Run training algorithm to build the model.
+ final LogisticRegressionModel model = new LogisticRegressionWithLBFGS()
+ .setNumClasses(3)
+ .run(training.rdd());
+
+ // Compute raw scores on the test set.
+ JavaRDD<Tuple2<Object, Object>> predictionAndLabels = test.map(
+ new Function<LabeledPoint, Tuple2<Object, Object>>() {
+ public Tuple2<Object, Object> call(LabeledPoint p) {
+ Double prediction = model.predict(p.features());
+ return new Tuple2<Object, Object>(prediction, p.label());
+ }
+ }
+ );
+
+ // Get evaluation metrics.
+    MulticlassMetrics metrics = new MulticlassMetrics(predictionAndLabels.rdd());
+
+ // Confusion matrix
+ Matrix confusion = metrics.confusionMatrix();
+ System.out.println("Confusion matrix: \n" + confusion);
+
+ // Overall statistics
+ System.out.println("Precision = " + metrics.precision());
+ System.out.println("Recall = " + metrics.recall());
+ System.out.println("F1 Score = " + metrics.fMeasure());
+
+ // Stats by labels
+ for (int i = 0; i < metrics.labels().length; i++) {
+ System.out.format("Class %1.2f precision = %1.2f\n",
metrics.labels()[i], metrics.precision(metrics.labels()[i]));
+ System.out.format("Class %1.2f recall = %1.2f\n",
metrics.labels()[i], metrics.recall(metrics.labels()[i]));
+ System.out.format("Class %1.2f F1 score = %1.2f\n",
metrics.labels()[i], metrics.fMeasure(metrics.labels()[i]));
+ }
+
+    // Weighted stats
+    System.out.format("Weighted precision = %1.2f\n", metrics.weightedPrecision());
+    System.out.format("Weighted recall = %1.2f\n", metrics.weightedRecall());
+    System.out.format("Weighted F1 score = %1.2f\n", metrics.weightedFMeasure());
+    System.out.format("Weighted false positive rate = %1.2f\n", metrics.weightedFalsePositiveRate());
+
+ // Save and load model
+ model.save(sc, "myModelPath");
+    LogisticRegressionModel sameModel = LogisticRegressionModel.load(sc, "myModelPath");
+ }
+}
+
+{% endhighlight %}
+
+</div>
+
+<div data-lang="python" markdown="1">
+
+{% highlight python %}
+from pyspark.mllib.classification import LogisticRegressionWithLBFGS
+from pyspark.mllib.util import MLUtils
+from pyspark.mllib.evaluation import MulticlassMetrics
+
+# Load training data in LIBSVM format
+data = MLUtils.loadLibSVMFile(sc, "data/mllib/sample_multiclass_classification_data.txt")
+
+# Split data into training (60%) and test (40%)
+splits = data.randomSplit([0.6, 0.4], seed = 11L)
+training = splits[0].cache()
+test = splits[1]
+
+# Run training algorithm to build the model
+model = LogisticRegressionWithLBFGS.train(training, numClasses=3)
+
+# Compute raw scores on the test set
+predictionAndLabels = test.map(lambda lp: (float(model.predict(lp.features)), lp.label))
+
+# Instantiate metrics object
+metrics = MulticlassMetrics(predictionAndLabels)
+
+# Overall statistics
+precision = metrics.precision()
+recall = metrics.recall()
+f1Score = metrics.fMeasure()
+print "Summary Stats"
+print "Precision = %1.2f" % precision
+print "Recall = %1.2f" % recall
+print "F1 Score = %1.2f" % f1Score
+
+# Statistics by class
+labels = data.map(lambda lp: lp.label).distinct().collect()
+for label in sorted(labels):
+ print "Class %s precision = %1.2f" % (label, metrics.precision(label))
+ print "Class %s recall = %1.2f" % (label, metrics.recall(label))
+ print "Class %s F1 Measure = %1.2f" % (label, metrics.fMeasure(label,
beta=1.0))
+
+# Weighted stats
+print "Weighted recall = %1.2f" % metrics.weightedRecall
+print "Weighted precision = %1.2f" % metrics.weightedPrecision
+print "Weighted F(1) Score = %1.2f" % metrics.weightedFMeasure()
+print "Weighted F(0.5) Score = %1.2f" % metrics.weightedFMeasure(beta=0.5)
+print "Weighted false positive rate = %1.2f" %
metrics.weightedFalsePositiveRate
+{% endhighlight %}
+
+</div>
+</div>
+
+## Multilabel Classification
+
+A [multilabel classification](https://en.wikipedia.org/wiki/Multi-label_classification) problem involves mapping
+each sample in a dataset to a set of class labels. In this type of classification problem, the labels are not
+mutually exclusive. For example, when classifying a set of news articles into topics, a single article might
+belong to both science and politics.
+
+Here we define a set $D$ of $N$ documents
+
+$$D = \left\{d_0, d_1, ..., d_{N-1}\right\}$$
+
+Define $L_0, L_1, ..., L_{N-1}$ to be a family of label sets and $P_0, P_1, ..., P_{N-1}$ to be a family of
+prediction sets, where $L_i$ and $P_i$ are the label set and prediction set, respectively, that correspond to
+document $d_i$.
+
+The set of all unique labels is given by
+
+$$L = \bigcup_{k=0}^{N-1} L_k$$
+
+The following definition of indicator function $I_A(x)$ on a set $A$ will be necessary
+
+$$I_A(x) = \begin{cases}1 & \text{if $x \in A$}, \\ 0 & \text{otherwise}.\end{cases}$$
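+
+For example, $I_{P_i}(\ell) \cdot I_{L_i}(\ell)$ equals $1$ exactly when label $\ell$ is both predicted and true
+for document $d_i$, i.e. a true positive for that label.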
+
+<table class="table">
+ <thead>
+ <tr><th>Metric</th><th>Definition</th></tr>
+ </thead>
+ <tbody>
+ <tr>
+      <td>Precision</td><td>$\frac{1}{N} \sum_{i=0}^{N-1} \frac{\left|P_i \cap L_i\right|}{\left|P_i\right|}$</td>
+ </tr>
+ <tr>
+      <td>Recall</td><td>$\frac{1}{N} \sum_{i=0}^{N-1} \frac{\left|L_i \cap P_i\right|}{\left|L_i\right|}$</td>
+ </tr>
+ <tr>
+ <td>Accuracy</td>
+ <td>
+ $\frac{1}{N} \sum_{i=0}^{N - 1} \frac{\left|L_i \cap P_i \right|}
+ {\left|L_i\right| + \left|P_i\right| - \left|L_i \cap P_i \right|}$
+ </td>
+ </tr>
+ <tr>
+ <td>Precision by label</td><td>$PPV(\ell)=\frac{TP}{TP + FP}=
+ \frac{\sum_{i=0}^{N-1} I_{P_i}(\ell) \cdot I_{L_i}(\ell)}
+ {\sum_{i=0}^{N-1} I_{P_i}(\ell)}$</td>
+ </tr>
+ <tr>
+ <td>Recall by label</td><td>$TPR(\ell)=\frac{TP}{P}=
+ \frac{\sum_{i=0}^{N-1} I_{P_i}(\ell) \cdot I_{L_i}(\ell)}
+ {\sum_{i=0}^{N-1} I_{L_i}(\ell)}$</td>
+ </tr>
+ <tr>
+ <td>F1-measure by label</td><td>$F1(\ell) = 2
+ \cdot \left(\frac{PPV(\ell) \cdot TPR(\ell)}
+ {PPV(\ell) + TPR(\ell)}\right)$</td>
+ </tr>
+ <tr>
+ <td>Hamming Loss</td>
+ <td>
+        $\frac{1}{N \cdot \left|L\right|} \sum_{i=0}^{N - 1} \left( \left|L_i\right| + \left|P_i\right|
+          - 2 \left|L_i \cap P_i\right| \right)$
+ </td>
+ </tr>
+ <tr>
+ <td>Subset Accuracy</td>
+ <td>$\frac{1}{N} \sum_{i=0}^{N-1} I_{\{L_i\}}(P_i)$</td>
+ </tr>
+ <tr>
+ <td>F1 Measure</td>
+      <td>$\frac{1}{N} \sum_{i=0}^{N-1} 2 \frac{\left|P_i \cap L_i\right|}{\left|P_i\right| + \left|L_i\right|}$</td>
+ </tr>
+ <tr>
+ <td>Micro precision</td>
+      <td>$\frac{TP}{TP + FP}=\frac{\sum_{i=0}^{N-1} \left|P_i \cap L_i\right|}
+          {\sum_{i=0}^{N-1} \left|P_i \cap L_i\right| + \sum_{i=0}^{N-1} \left|P_i - L_i\right|}$</td>
+ </tr>
+ <tr>
+ <td>Micro recall</td>
+      <td>$\frac{TP}{TP + FN}=\frac{\sum_{i=0}^{N-1} \left|P_i \cap L_i\right|}
+          {\sum_{i=0}^{N-1} \left|P_i \cap L_i\right| + \sum_{i=0}^{N-1} \left|L_i - P_i\right|}$</td>
+ </tr>
+ <tr>
+ <td>Micro F1 Measure</td>
+ <td>
+        $2 \cdot \frac{TP}{2 \cdot TP + FP + FN}=2 \cdot \frac{\sum_{i=0}^{N-1} \left|P_i \cap L_i\right|}
+          {2 \cdot \sum_{i=0}^{N-1} \left|P_i \cap L_i\right| + \sum_{i=0}^{N-1} \left|L_i - P_i\right|
+          + \sum_{i=0}^{N-1} \left|P_i - L_i\right|}$
+ </td>
+ </tr>
+ </tbody>
+</table>
+
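+As a brief worked example (using hypothetical data), take $N = 2$ documents with label sets $L_0 = \{0, 1\}$ and
+$L_1 = \{1, 2\}$, and prediction sets $P_0 = \{0, 2\}$ and $P_1 = \{1, 2\}$, so that $L = \{0, 1, 2\}$. Precision
+is then $\frac{1}{2}\left(\frac{1}{2} + \frac{2}{2}\right) = 0.75$ and recall is likewise $0.75$; accuracy is
+$\frac{1}{2}\left(\frac{1}{3} + \frac{2}{2}\right) \approx 0.67$; Hamming loss is
+$\frac{1}{2 \cdot 3}\left((2 + 2 - 2) + (2 + 2 - 4)\right) = \frac{1}{3}$; and subset accuracy is
+$\frac{1}{2}(0 + 1) = 0.5$, since only the second prediction set matches its label set exactly.
+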
+**Examples**
+
+<div class="codetabs">
+The following code snippets illustrate how to evaluate the performance of a multilabel classifier.
+
+<div data-lang="scala" markdown="1">
+
+{% highlight scala %}
+import org.apache.spark.mllib.evaluation.MultilabelMetrics
+import org.apache.spark.rdd.RDD;
+
+/**
--- End diff --
Moved duplicated comments to the markdown text.