[GitHub] spark pull request #15435: [SPARK-17139][ML] Add model summary for Multinomi...

sethah Fri, 17 Mar 2017 12:29:02 -0700

Github user sethah commented on a diff in the pull request:

    https://github.com/apache/spark/pull/15435#discussion_r106727670
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala ---
    @@ -1086,83 +1115,124 @@ private[classification] class MultiClassSummarizer extends Serializable {
     }
     
     /**
    - * Abstraction for multinomial Logistic Regression Training results.
    - * Currently, the training summary ignores the training weights except
    - * for the objective trace.
    - */
    -sealed trait LogisticRegressionTrainingSummary extends LogisticRegressionSummary {
    -
    -  /** objective function (scaled loss + regularization) at each iteration. */
    -  def objectiveHistory: Array[Double]
    -
    -  /** Number of training iterations until termination */
    -  def totalIterations: Int = objectiveHistory.length
    -
    -}
    -
    -/**
      * Abstraction for Logistic Regression Results for a given model.
      */
     sealed trait LogisticRegressionSummary extends Serializable {
     
       /**
        * Dataframe output by the model's `transform` method.
        */
    +  @Since("2.2.0")
       def predictions: DataFrame
     
       /** Field in "predictions" which gives the probability of each class as a vector. */
    +  @Since("2.2.0")
       def probabilityCol: String
     
    +  /** Field in "predictions" which gives the prediction of each class as a vector. */
    +  @Since("2.2.0")
    +  def predictionCol: String
    +
       /** Field in "predictions" which gives the true label of each instance (if available). */
    +  @Since("2.2.0")
       def labelCol: String
     
       /** Field in "predictions" which gives the features of each instance as a vector. */
    +  @Since("2.2.0")
       def featuresCol: String
     
    +  @transient private val multiclassMetrics = {
    +    new MulticlassMetrics(
    +      predictions.select(
    +        col(predictionCol),
    +        col(labelCol).cast(DoubleType))
    +        .rdd.map { case Row(prediction: Double, label: Double) => (prediction, label) })
    +  }
    +
    +  /** Returns true positive rate for each label. */
    +  @Since("2.2.0")
    +  def truePositiveRateByLabel: Array[Double] = recallByLabel
    +
    +  /** Returns false positive rate for each label. */
    +  @Since("2.2.0")
    +  def falsePositiveRateByLabel: Array[Double] = {
    +    multiclassMetrics.labels.map(label => multiclassMetrics.falsePositiveRate(label))
    +  }
    +
    +  /** Returns precision for each label. */
    +  @Since("2.2.0")
    +  def precisionByLabel: Array[Double] = {
    +    multiclassMetrics.labels.map(label => multiclassMetrics.precision(label))
    +  }
    +
    +  /** Returns recall for each label. */
    +  @Since("2.2.0")
    +  def recallByLabel: Array[Double] = {
    +    multiclassMetrics.labels.map(label => multiclassMetrics.recall(label))
    +  }
    +
    +  /**
    +   * Returns f-measure for each label.
    +   */
    +  @Since("2.2.0")
    +  def fMeasureByLabel(beta: Double): Array[Double] = {
    +    multiclassMetrics.labels.map(label => multiclassMetrics.fMeasure(label, beta))
    +  }
    +
    +  /** Returns f1-measure for each label. */
    +  @Since("2.2.0")
    +  def fMeasureByLabel: Array[Double] = fMeasureByLabel(1.0)
    +
    +  /** Returns accuracy. */
    +  @Since("2.2.0")
    +  def accuracy: Double = multiclassMetrics.accuracy
    +
    +  /** Returns weighted true positive rate. */
    +  @Since("2.2.0")
    +  def weightedTruePositiveRate: Double = weightedRecall
    +
    +  /** Returns weighted false positive rate. */
    +  @Since("2.2.0")
    +  def weightedFalsePositiveRate: Double = multiclassMetrics.weightedFalsePositiveRate
    +
    +  /** Returns weighted averaged recall. */
    +  @Since("2.2.0")
    +  def weightedRecall: Double = multiclassMetrics.weightedRecall
    +
    +  /** Returns weighted averaged precision. */
    +  @Since("2.2.0")
    +  def weightedPrecision: Double = multiclassMetrics.weightedPrecision
    +
    +  /**
    +   * Returns weighted averaged f-measure.
    +   */
    +  @Since("2.2.0")
    +  def weightedFMeasure(beta: Double): Double = multiclassMetrics.weightedFMeasure(beta)
    +
    +  /** Returns weighted averaged f1-measure. */
    +  @Since("2.2.0")
    +  def weightedFMeasure: Double = multiclassMetrics.weightedFMeasure(1.0)
     }
     
     /**
    - * :: Experimental ::
    - * Logistic regression training results.
    - *
    - * @param predictions dataframe output by the model's `transform` method.
    - * @param probabilityCol field in "predictions" which gives the probability of
    - *                       each class as a vector.
    - * @param labelCol field in "predictions" which gives the true label of each instance.
    - * @param featuresCol field in "predictions" which gives the features of each instance as a vector.
    - * @param objectiveHistory objective function (scaled loss + regularization) at each iteration.
    + * Abstraction for multinomial Logistic Regression Training results.
    --- End diff --
    
    Don't use capital letters mid-sentence here or elsewhere, only at the
    beginning of a sentence.
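
For readers following the diff, here is a rough sketch of what the per-label metrics delegated to `multiclassMetrics` amount to. It is not the pull request's code: plain Scala collections stand in for the `predictions` DataFrame, and `LabelMetricsSketch`, `Metrics`, and the zero-denominator conventions are assumptions made for illustration only.

```scala
// Illustrative sketch, not the pull request's code: plain Scala collections stand in
// for the DataFrame and for mllib's MulticlassMetrics. All names here are hypothetical.
object LabelMetricsSketch {

  /** Metrics computed from (prediction, label) pairs, mirroring the shape of the diff. */
  final class Metrics(pairs: Seq[(Double, Double)]) {
    require(pairs.nonEmpty, "need at least one (prediction, label) pair")

    private val total: Double = pairs.size.toDouble

    /** Distinct true labels in ascending order. */
    val labels: Array[Double] = pairs.map(_._2).distinct.sorted.toArray

    // Number of pairs satisfying a predicate, as a Double to avoid integer division.
    private def count(f: ((Double, Double)) => Boolean): Double = pairs.count(f).toDouble

    private def truePositives(label: Double): Double =
      count { case (pred, actual) => pred == label && actual == label }

    private def actualCount(label: Double): Double =
      count { case (_, actual) => actual == label }

    /** TP / (TP + FN); this is also the true positive rate. */
    def recall(label: Double): Double = {
      val n = actualCount(label)
      if (n == 0.0) 0.0 else truePositives(label) / n
    }

    /** TP / (TP + FP); taken as 0.0 here when the label is never predicted (assumption). */
    def precision(label: Double): Double = {
      val predicted = count { case (pred, _) => pred == label }
      if (predicted == 0.0) 0.0 else truePositives(label) / predicted
    }

    /** FP / (FP + TN); taken as 0.0 here when every instance has this label (assumption). */
    def falsePositiveRate(label: Double): Double = {
      val fp = count { case (pred, actual) => pred == label && actual != label }
      val negatives = total - actualCount(label)
      if (negatives == 0.0) 0.0 else fp / negatives
    }

    /** F-beta score; beta = 1.0 gives the F1 measure. */
    def fMeasure(label: Double, beta: Double = 1.0): Double = {
      val p = precision(label)
      val r = recall(label)
      val b2 = beta * beta
      if (p + r == 0.0) 0.0 else (1 + b2) * p * r / (b2 * p + r)
    }

    /** Fraction of instances whose prediction equals the true label. */
    def accuracy: Double = {
      val correct = count { case (pred, actual) => pred == actual }
      correct / total
    }

    /** Per-label recall, weighted by each label's share of the instances. */
    def weightedRecall: Double =
      labels.map(l => recall(l) * actualCount(l) / total).sum
  }
}
```

Mapping `labels` over `recall`, `precision`, or `falsePositiveRate` gives arrays analogous to `recallByLabel`, `precisionByLabel`, and `falsePositiveRateByLabel` in the diff. Note that weighting recall by label frequency reduces algebraically to overall accuracy, so in this sketch `weightedRecall` and `accuracy` agree up to floating-point rounding.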


