[GitHub] spark pull request: [SPARK-6129][MLLIB][DOCS] Added user guide for...

mengxr Tue, 28 Jul 2015 18:32:33 -0700

Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/7655#discussion_r35720926
  
    --- Diff: docs/mllib-evaluation-metrics.md ---
    @@ -0,0 +1,1476 @@
    +---
    +layout: global
    +title: Evaluation Metrics - MLlib
    +displayTitle: <a href="mllib-guide.html">MLlib</a> - Evaluation Metrics
    +---
    +
    +* Table of contents
    +{:toc}
    +
    +
    +## Algorithm Metrics
    +
    +Spark's MLlib comes with a number of machine learning algorithms that can be used to learn from and make predictions
    +on data. When these algorithms are applied to build machine learning models, there is a need to evaluate the performance
    +of the model on some criteria, which depends on the application and its requirements. Spark's MLlib also provides a
    +suite of metrics for the purpose of evaluating the performance of machine learning models.
    +
    +Specific machine learning algorithms fall under broader types of machine learning applications like classification,
    +regression, clustering, etc. Each of these types have well established metrics for performance evaluation and those
    +metrics that are currently available in Spark's MLlib are detailed in this section.
    +
    +## Classification Model Evaluation
    +
    +While there are many different types of classification algorithms, the evaluation of classification models all share
    +similar principles. In a [supervised classification problem](https://en.wikipedia.org/wiki/Statistical_classification),
    +there exists a true output and a model-generated predicted output for each data point. For this reason, the results for
    +each data point can be assigned to one of four categories:
    +
    +* True Positive (TP) - class predicted by model and class in true output
    --- End diff --
    
    This is confusing to me: `class predicted by model and class in true output`.
    I think the main issue is the term `true output`, which the MLlib docs have
    not used before. The following is easier to understand (at least to me):
    
    * TP: label is positive and prediction is also positive
    * TN: label is negative and prediction is also negative
    * FP: label is negative but prediction is positive
    * FN: label is positive but prediction is negative
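
To make the four definitions above concrete, here is a minimal sketch in plain Python. It is not the MLlib API: the function name `confusion_counts` and the 1.0/0.0 label encoding are assumptions chosen for illustration only.

```python
def confusion_counts(labels, predictions, positive=1.0):
    """Count TP, TN, FP, FN for a binary problem.

    `positive` is the label value treated as the positive class
    (assumed to be 1.0 here; any other value counts as negative).
    """
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for label, prediction in zip(labels, predictions):
        if label == positive:
            # Label is positive: the prediction either agrees (TP) or misses (FN).
            counts["TP" if prediction == positive else "FN"] += 1
        else:
            # Label is negative: the prediction either agrees (TN) or raises a false alarm (FP).
            counts["FP" if prediction == positive else "TN"] += 1
    return counts


# Hypothetical labels and predictions, made up for this example.
labels = [1.0, 0.0, 1.0, 0.0, 1.0]
predictions = [1.0, 0.0, 0.0, 1.0, 1.0]
counts = confusion_counts(labels, predictions)
print(counts)
```

Every data point falls into exactly one of the four buckets, so the counts always sum to the number of points.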
    
    For multiclass problems, we can define what `positive` and `negative` mean
    in the context of a particular class.
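
One way to read the multiclass remark is one-vs-rest: for each class `c`, treat `c` as positive and every other class as negative. The sketch below assumes that reading; `per_class_counts` is a hypothetical helper in plain Python, not an MLlib function.

```python
def per_class_counts(labels, predictions):
    """Return TP/TN/FP/FN for each class, treating that class as
    `positive` and all other classes as `negative` (one-vs-rest)."""
    pairs = list(zip(labels, predictions))
    classes = sorted(set(labels) | set(predictions))
    result = {}
    for c in classes:
        tp = sum(1 for label, pred in pairs if label == c and pred == c)
        fp = sum(1 for label, pred in pairs if label != c and pred == c)
        fn = sum(1 for label, pred in pairs if label == c and pred != c)
        # Everything else has neither label nor prediction equal to c.
        tn = len(pairs) - tp - fp - fn
        result[c] = {"TP": tp, "TN": tn, "FP": fp, "FN": fn}
    return result


# Hypothetical three-class data, made up for this example.
multi_labels = [0, 1, 2, 2, 1, 0]
multi_predictions = [0, 2, 2, 1, 1, 0]
by_class = per_class_counts(multi_labels, multi_predictions)
for cls, class_counts in by_class.items():
    print(cls, class_counts)
```

Note that a single misclassified point counts as an FN for its labeled class and as an FP for the predicted class.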


