Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/7655#discussion_r35720926
--- Diff: docs/mllib-evaluation-metrics.md ---
@@ -0,0 +1,1476 @@
+---
+layout: global
+title: Evaluation Metrics - MLlib
+displayTitle: <a href="mllib-guide.html">MLlib</a> - Evaluation Metrics
+---
+
+* Table of contents
+{:toc}
+
+
+## Algorithm Metrics
+
+Spark's MLlib comes with a number of machine learning algorithms that can
be used to learn from and make predictions
+on data. When these algorithms are applied to build machine learning
models, there is a need to evaluate the performance
+of the model on some criteria, which depends on the application and its
requirements. Spark's MLlib also provides a
+suite of metrics for the purpose of evaluating the performance of machine
learning models.
+
+Specific machine learning algorithms fall under broader types of machine
learning applications like classification,
+regression, clustering, etc. Each of these types have well established
metrics for performance evaluation and those
+metrics that are currently available in Spark's MLlib are detailed in this
section.
+
+## Classification Model Evaluation
+
+While there are many different types of classification algorithms, the
evaluation of classification models all share
+similar principles. In a [supervised classification
problem](https://en.wikipedia.org/wiki/Statistical_classification),
+there exists a true output and a model-generated predicted output for each
data point. For this reason, the results for
+each data point can be assigned to one of four categories:
+
+* True Positive (TP) - class predicted by model and class in true output
--- End diff --
This is confusing to me: `class predicted by model and class in true
output`. I think the main issue is the term `true output`, which has not been
used in the MLlib docs before. The following is easier to understand (at least
to me):

* TP: label is positive and prediction is also positive
* TN: label is negative and prediction is also negative
* FP: label is negative but prediction is positive
* FN: label is positive but prediction is negative

For multiclass problems, we can define what `positive` and `negative` mean
in the context of a particular class.
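
To make the label/prediction wording concrete, here is a minimal sketch in plain Python. It is not MLlib API: the helper name `confusion_counts`, its `positive` parameter, and the sample data are made up for illustration. It counts the four outcomes for one class treated as positive, then repeats that per class for a multiclass example.

```python
def confusion_counts(labels, predictions, positive=1.0):
    """Count TP, TN, FP, FN, treating `positive` as the positive class.

    Hypothetical helper for illustration only; not part of MLlib.
    """
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for label, prediction in zip(labels, predictions):
        label_is_pos = label == positive
        pred_is_pos = prediction == positive
        if label_is_pos and pred_is_pos:
            counts["TP"] += 1   # label positive, prediction positive
        elif not label_is_pos and not pred_is_pos:
            counts["TN"] += 1   # label negative, prediction negative
        elif pred_is_pos:
            counts["FP"] += 1   # label negative, prediction positive
        else:
            counts["FN"] += 1   # label positive, prediction negative
    return counts


# Made-up multiclass data: each class in turn plays the "positive" role,
# and every other class counts as "negative".
labels = [0.0, 1.0, 2.0, 1.0, 0.0, 2.0]
predictions = [0.0, 2.0, 2.0, 1.0, 1.0, 2.0]
per_class = {
    c: confusion_counts(labels, predictions, positive=c)
    for c in sorted(set(labels))
}
```

With this definition, a true negative for a given class is any point where neither the label nor the prediction is that class, even if the prediction is otherwise wrong.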