Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/7538#discussion_r36110231
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala ---
    @@ -407,6 +449,103 @@ private[classification] class MultiClassSummarizer extends Serializable {
       }
     }
     
    +/**
    + * :: Experimental ::
    + * Logistic regression training results.
    + * @param predictions DataFrame produced by the model's `transform` method.
    + * @param probabilityCol field in "predictions" which gives the calibrated probability of
    + *                       each sample as a vector.
    + * @param labelCol field in "predictions" which gives the true label of each sample.
    + * @param objectiveHistory objective function (scaled loss + regularization) at each iteration.
    + */
    +@Experimental
    +class LogisticRegressionTrainingSummary private[classification] (
    +    predictions: DataFrame,
    +    probabilityCol: String,
    +    labelCol: String,
    +    val objectiveHistory: Array[Double])
    +  extends LogisticRegressionSummary(predictions, probabilityCol, labelCol) {
    +
    +  /** Number of training iterations until termination */
    +  val totalIterations = objectiveHistory.length
    +
    +}
    +
    +/**
    + * :: Experimental ::
    + * Logistic regression results for a given model.
    + * @param predictions DataFrame produced by the model's `transform` method.
    + * @param probabilityCol field in "predictions" which gives the calibrated probability of
    + *                       each sample.
    + * @param labelCol field in "predictions" which gives the true label of each sample.
    + */
    +@Experimental
    +class LogisticRegressionSummary private[classification] (
    +    @transient val predictions: DataFrame,
    +    val probabilityCol: String,
    +    val labelCol: String) extends Serializable {
    +
    +  private val sqlContext = predictions.sqlContext
    +  import sqlContext.implicits._
    +
    +  // TODO: Allow the user to vary the number of bins using a setBins method in
    +  // BinaryClassificationMetrics. For now the default is set to 100.
    +  /** BinaryClassificationMetrics computed on (probability of class 1, label) pairs. */
    +  @transient private val metrics = new BinaryClassificationMetrics(
    +    predictions.select(probabilityCol, labelCol).map {
    +      case Row(score: Vector, label: Double) => (score(1), label)
    +    }, 100
    +  )
    +
    +  /**
    +   * Returns the receiver operating characteristic (ROC) curve,
    +   * which is a DataFrame having two fields (false positive rate, true positive rate)
    +   * with (0.0, 0.0) prepended and (1.0, 1.0) appended to it.
    +   * Every distinct probability produced in transforming the dataset is used
    +   * as a threshold when computing the FPR and TPR.
    +   */
    +  def roc(): DataFrame = metrics.roc().toDF("FalsePositiveRate", "TruePositiveRate")
    --- End diff --
    
    How about using names "FPR" and "TPR" instead?
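
    For context on what those two columns hold (whatever they end up being named), here is a hypothetical plain-Scala sketch, with no Spark dependency, of the curve that `BinaryClassificationMetrics.roc()` computes. `RocSketch` and `rocPoints` are names made up for illustration, not part of the patch.

```scala
// Hypothetical sketch (no Spark): for each distinct score threshold, emit the
// (FPR, TPR) pair, with (0.0, 0.0) prepended and (1.0, 1.0) appended, as the
// doc comment on roc() describes.
object RocSketch {
  // scored: (predicted probability of the positive class, true label in {0.0, 1.0})
  def rocPoints(scored: Seq[(Double, Double)]): Seq[(Double, Double)] = {
    val positives = scored.count(_._2 == 1.0).toDouble
    val negatives = scored.size - positives
    // Sweep thresholds from high to low so FPR and TPR grow monotonically.
    val thresholds = scored.map(_._1).distinct.sorted(Ordering[Double].reverse)
    val points = thresholds.map { t =>
      val predictedPos = scored.filter(_._1 >= t)
      val tp = predictedPos.count(_._2 == 1.0).toDouble
      val fp = predictedPos.size - tp
      (fp / negatives, tp / positives) // (false positive rate, true positive rate)
    }
    ((0.0, 0.0) +: points) :+ ((1.0, 1.0))
  }
}
```

    Under this sketch, short names like "FPR" and "TPR" would label the first and second element of each pair, respectively.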

