Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/14519#discussion_r73922053
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/regression/AFTSurvivalRegression.scala
---
@@ -478,21 +482,23 @@ object AFTSurvivalRegressionModel extends MLReadable[AFTSurvivalRegressionModel]
* $$
* </blockquote></p>
*
- * @param parameters including three part: The log of scale parameter, the intercept and
- * regression coefficients corresponding to the features.
+ * @param bcParameters The broadcasted value includes three part: The log of scale parameter,
+ * the intercept and regression coefficients corresponding to the features.
* @param fitIntercept Whether to fit an intercept term.
- * @param featuresStd The standard deviation values of the features.
+ * @param bcFeaturesStd The broadcast standard deviation values of the features.
*/
private class AFTAggregator(
- parameters: BDV[Double],
+ bcParameters: Broadcast[BDV[Double]],
fitIntercept: Boolean,
- featuresStd: Array[Double]) extends Serializable {
+ bcFeaturesStd: Broadcast[Array[Double]]) extends Serializable {
+ // make transient so we do not serialize between aggregation stages
+ @transient private lazy val parameters = bcParameters.value
// the regression coefficients to the covariates
- private val coefficients = parameters.slice(2, parameters.length)
- private val intercept = parameters(1)
+ @transient private lazy val coefficients = parameters.slice(2, parameters.length)
+ @transient private lazy val intercept = parameters(1)
// sigma is the scale parameter of the AFT model
- private val sigma = math.exp(parameters(0))
+ @transient private lazy val sigma = math.exp(parameters(0))
--- End diff --
You are right. In Scala, a `@transient private val` is evaluated only once,
when the object is constructed; it is not re-evaluated after a
serialization/deserialization cycle. As a result, after the `AFTAggregator`
is serialized and sent to the executors, the variable will not be evaluated
again and will default to null. A `@transient private lazy val`, by contrast,
is recomputed on first access after deserialization.
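
To make the difference concrete, here is a minimal sketch using plain Java serialization, assuming Scala 2 semantics (what Spark was built against at the time). `Holder` and `TransientDemo` are hypothetical names for illustration only; `Holder` stands in for `AFTAggregator`, and its `params` constructor argument stands in for the value read from the broadcast.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// Hypothetical stand-in for AFTAggregator (not the Spark class).
// `params` is used inside a lazy initializer, so it is kept as a
// regular (non-transient) field and survives serialization.
class Holder(params: Array[Double]) extends Serializable {
  // Computed once in the constructor and skipped by serialization,
  // so a deserialized copy sees the JVM default (null).
  @transient private val eager: Array[Double] = params.drop(2)

  // Computed on first access; a deserialized copy recomputes it
  // from `params` the first time it is read.
  @transient private lazy val deferred: Array[Double] = params.drop(2)

  def eagerIsNull: Boolean = eager == null
  def deferredLength: Int = deferred.length
}

object TransientDemo {
  // Serialize an object to a byte array and read it back,
  // mimicking what happens when a task closure is shipped.
  def roundTrip[T <: AnyRef](obj: T): T = {
    val buffer = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(buffer)
    out.writeObject(obj)
    out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray))
    try in.readObject().asInstanceOf[T] finally in.close()
  }
}
```

Calling `TransientDemo.roundTrip(new Holder(Array(0.1, 0.2, 0.3, 0.4)))` should give a copy whose `eagerIsNull` is true while `deferredLength` still works, which is why the diff above uses `@transient private lazy val` for the fields derived from the broadcast.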