GitHub user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/7099#discussion_r33967439
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala ---
    @@ -212,12 +226,110 @@ class LinearRegressionModel private[ml] (
       extends RegressionModel[Vector, LinearRegressionModel]
       with LinearRegressionParams {
     
    +  @transient private var trainingResults: Option[LinearRegressionTrainingResults] = None
    +
    +  /**
    +   * Gets results (e.g. residuals, mse, r^2) of model on training set. This method should only
    +   * be called on the driver (it is not available on workers).
    +   */
    +  def getTrainingResults: Option[LinearRegressionTrainingResults] = trainingResults
    +
    +  def setTrainingResults(trainingResults: LinearRegressionTrainingResults): this.type = {
    +    this.trainingResults = Some(trainingResults)
    +    this
    +  }
    +
    +  /**
    +   * Evaluates the model on a test-set.
    +   * @param testset Test dataset to evaluate model on.
    +   * @return
    +   */
    +  def evaluate(testset: DataFrame): LinearRegressionResults = {
    +    val t = udf { features: Vector => predict(features) }
    +    val predictionAndObservations = testset
    +      .select(col($(labelCol)), t(col($(featuresCol))).as($(predictionCol)))
    +
    +    new LinearRegressionResults(predictionAndObservations)
    +  }
    +
       override protected def predict(features: Vector): Double = {
         dot(features, weights) + intercept
       }
     
       override def copy(extra: ParamMap): LinearRegressionModel = {
    -    copyValues(new LinearRegressionModel(uid, weights, intercept), extra)
    +    val newModel = new LinearRegressionModel(uid, weights, intercept)
    +    if (trainingResults.isDefined) newModel.setTrainingResults(trainingResults.get)
    +    copyValues(newModel, extra)
    +  }
    +}
    +
    +/**
    + * :: Experimental ::
    + * Linear regression training results.
    + * @param predictionAndLabel dataframe with columns prediction (0) and label (1).
    + * @param objectiveTrace objective function value at each iteration.
    + */
    +@Experimental
    +class LinearRegressionTrainingResults private[ml] (
    +    predictionAndLabel: DataFrame,
    +    val objectiveTrace: Array[Double])
    +  extends LinearRegressionResults(predictionAndLabel) {
    +
    +  /** Number of training iterations until termination */
    +  val totalIterations = objectiveTrace.length
    +
    +}
    +
    +/**
    + * :: Experimental ::
    + * Linear regression results evaluated on a dataset.
    + * @param predictionAndLabel dataframe with columns prediction(0) and label (1).
    --- End diff --
    
    If we're providing part of the transformed data, let's provide the whole transformed dataset with all of its columns.  We can call it ```predictions: DataFrame``` and have it be the output of ```model.transform(data)```.
    
    Also: in general, it's best to refer to columns by name.  I'm not sure there are strong guarantees about column ordering, except when it is specified via a select statement.  I.e., if you call ```df.select("a", "b")```, then you know the column ordering of the selected data.  Otherwise, I would not make assumptions.
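
    A minimal sketch of the name-based approach, assuming the results class receives the full output of ```model.transform(data)```.  The object ```NamedColumnMetrics```, the method ```meanSquaredError```, and the default column names are hypothetical and not part of this pull request; they only illustrate looking up columns by name instead of by position.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{avg, col, pow}

// Hypothetical helper (not from the pull request): computes a metric from a
// transformed dataset while referring to its columns by name only.
object NamedColumnMetrics {

  /**
   * Mean squared error between the prediction and label columns.
   * `predictions` may contain any number of other columns, in any order.
   */
  def meanSquaredError(
      predictions: DataFrame,
      predictionCol: String = "prediction",
      labelCol: String = "label"): Double = {
    predictions
      // Columns are resolved by name, so their position in `predictions` is irrelevant.
      .select(avg(pow(col(predictionCol) - col(labelCol), 2.0)).as("mse"))
      .first()
      // Index 0 is safe here: the select above fixes the output to a single column.
      .getDouble(0)
  }
}
```

    The only positional access is on the result of an explicit select, which is the one case where the ordering is known.  Note that this sketch would fail on an empty dataset, because the average would then be null.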

