[GitHub] spark pull request: [SPARK-9112] [ML] Implement Stats for Logistic...

feynmanliang Tue, 28 Jul 2015 10:31:46 -0700

Github user feynmanliang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/7538#discussion_r35675386
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala ---
    @@ -407,6 +449,98 @@ private[classification] class MultiClassSummarizer extends Serializable {
       }
     }
     
    +@Experimental
    +/**
    + * :: Experimental ::
    + * Logistic regression training results.
    + * @param predictions dataframe outputted by the model's `transform` method.
    + * @param probabilityCol field in "predictions" which gives the calibrated probability of
    + *                       each sample as a vector.
    + * @param labelCol field in "predictions" which gives the true label of each sample.
    + * @param objectiveHistory objective function (scaled loss + regularization) at each iteration.
    + */
    +class LogisticRegressionTrainingSummary private[classification] (
    +    predictions: DataFrame,
    +    probabilityCol: String,
    +    labelCol: String,
    +    val objectiveHistory: Array[Double])
    +  extends LogisticRegressionSummary(predictions, probabilityCol, labelCol) {
    +
    +  /** Number of training iterations until termination */
    +  val totalIterations = objectiveHistory.length
    +
    +}
    +
    +@Experimental
    +/**
    + * :: Experimental ::
    + * Logistic regression results for a given model.
    + * @param predictions dataframe outputted by the model's `transform` method.
    + * @param probabilityCol field in "predictions" which gives the calibrated probability of
    + *                       each sample.
    + * @param labelCol field in "predictions" which gives the true label of each sample.
    + */
    +class LogisticRegressionSummary private[classification] (
    +  @transient val predictions: DataFrame,
    +  val probabilityCol: String,
    +  val labelCol: String) extends Serializable {
    +
    +  private val sqlContext = predictions.sqlContext
    +  import sqlContext.implicits._
    +
    +  /** Returns a BinaryClassificationMetrics object.
    +  */
    +  @transient private val metrics = new BinaryClassificationMetrics(
    +    predictions.select(probabilityCol, labelCol).map {
    +      case Row(score: Vector, label: Double) => (score(1), label)
    +    }
    +  )
    +
    +  /**
    +   * Returns the receiver operating characteristic (ROC) curve,
    +   * which is an Dataframe having two fields (false positive rate, true positive rate)
    +   * with (0.0, 0.0) prepended and (1.0, 1.0) appended to it.
    +   * Every possible probability obtained in transforming the dataset are used
    +   * as thresholds used in calculating the FPR and TPR.
    +   */
    +  def roc(): DataFrame = metrics.roc().toDF("FalsePositiveRate", "TruePositiveRate")
    +
    +  /**
    +   * Computes the area under the receiver operating characteristic (ROC) curve.
    +   */
    +  def areaUnderROC(): Double = metrics.areaUnderROC()
    +
    +  /**
    +   * Returns the precision-recall curve, which is an Dataframe containing
    +   * two fields (recall, precision) NOT (precision, recall), with (0.0, 1.0) prepended to it.
    +   * Every possible probability obtained in transforming the dataset are used
    +   * as thresholds used in calculating the precision and recall.
    +   */
    +  def pr(): DataFrame = metrics.pr().toDF("recall", "precision")
    +
    +  /** Returns a dataframe with two fields (threshold, F-Measure) curve with beta = 1.0.
    +   * Every possible probability obtained in transforming the dataset are used
    +   * as thresholds used in calculating the F-measure.
    +   */
    +  def fMeasureByThreshold(): DataFrame = {
    +    metrics.fMeasureByThreshold().toDF("threshold", "F-Measure")
    +  }
    +
    +  /** Returns a dataframe with two fields (threshold, precision) curve.
    --- End diff --
    
    Use JavaDoc style; see the [style guide](https://cwiki.apache.org/confluence/display/SPARK/Spark+Code+Style+Guide#SparkCodeStyleGuide-Codedocumentationstyle). Ditto for the other doc comments.
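
For illustration, a minimal sketch of the JavaDoc-style layout the style guide asks for: the opening `/**` sits on its own line and every following line starts with an aligned ` * `. The object `DocStyleExample` and the helper `f1` are hypothetical stand-ins written for this example, not code from the PR.

```scala
// Hypothetical example; only the comment layout matters here.
object DocStyleExample {

  /**
   * Returns the F-measure with beta = 1.0 for the given precision and recall.
   * Returns 0.0 when both inputs are 0.0 to avoid dividing by zero.
   */
  def f1(precision: Double, recall: Double): Double = {
    if (precision + recall == 0.0) 0.0
    else 2.0 * precision * recall / (precision + recall)
  }
}
```

By contrast, the diff above starts the comment text on the same line as `/**`, which is the form this comment asks to change.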



