Github user feynmanliang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/7538#discussion_r35047786
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala ---
    @@ -407,6 +449,60 @@ private[classification] class MultiClassSummarizer extends Serializable {
       }
     }
     
    +@Experimental
    +class LogisticRegressionTrainingSummary private[classification] (
    +    predictions: DataFrame,
    +    probabilityCol: String,
    +    labelCol: String,
    +    val objectiveHistory: Array[Double])
    +  extends LogisticRegressionSummary(predictions, probabilityCol, labelCol) {
    +
    +  /** Number of training iterations until termination */
    +  val totalIterations = objectiveHistory.length
    +
    +}
    +
    +@Experimental
    +class LogisticRegressionSummary private[classification] (
    +  @transient val predictions: DataFrame,
    --- End diff --
    
    Yes.
    
    Since models are serialized to disk (model save/load) and sent to
    executors (when a closure captures the model, e.g. during `model.predict`),
    we wanted to avoid moving large, expensive data.
    
    Small, cheap summary statistics (e.g. scalars such as precision, recall,
    and AUROC) should be available in the `*Model` instance, but large ones
    (e.g. residuals in LinearRegressionSummary) should only be available on
    the driver and should not be serialized (i.e. they should be marked
    transient).
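
    To illustrate the point about transient fields, here is a minimal
    sketch that does not depend on Spark. `SummarySketch`, `TransientDemo`
    and `roundTrip` are hypothetical names made up for this example, and
    plain Java serialization stands in for what happens on model save/load
    or closure capture. It is not the code in this PR.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// Hypothetical stand-in for a summary class: one cheap scalar and one
// potentially large per-row array.
class SummarySketch(
    val areaUnderROC: Double,                 // small: serialized with the instance
    @transient val residuals: Array[Double])  // large: skipped by Java serialization
  extends Serializable

object TransientDemo {
  // Round-trip an object through Java serialization to mimic shipping it
  // from the driver to an executor.
  def roundTrip[T <: AnyRef](obj: T): T = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    out.writeObject(obj)
    out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
    try in.readObject().asInstanceOf[T] finally in.close()
  }

  def main(args: Array[String]): Unit = {
    val original = new SummarySketch(0.93, Array.fill(1000)(0.0))
    val copy = roundTrip(original)
    println(s"areaUnderROC survives: ${copy.areaUnderROC == original.areaUnderROC}")
    println(s"residuals dropped: ${copy.residuals == null}")
  }
}
```

    The scalar travels with the serialized copy, while the transient field
    comes back as null, so anything that needs it has to run where the
    original instance lives (the driver, in the Spark case).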

