Andrew-Crosby commented on a change in pull request #24880: [SPARK-28062][ML]
Avoid unnecessary copy of coefficients vector in HuberAggregator
URL: https://github.com/apache/spark/pull/24880#discussion_r294984755
##########
File path:
mllib/src/main/scala/org/apache/spark/ml/optim/aggregator/HuberAggregator.scala
##########
@@ -81,6 +81,9 @@ private[ml] class HuberAggregator(
} else {
0.0
}
+ // make transient so we do not serialize between aggregation stages
+ @transient private lazy val featuresStd = bcFeaturesStd.value
Review comment:
Thanks for the feedback. I've removed the unnecessary change to `featuresStd`.
@srowen I tried removing the `lazy` modifier, but that causes both the unit
tests and my own test case to fail with the following NPE. I don't understand why.
```
org.apache.spark.SparkException: Job aborted due to stage failure: Task 2 in stage 3.0 failed 1 times, most recent failure: Lost task 2.0 in stage 3.0 (TID 11, localhost, executor driver): java.lang.NullPointerException
    at org.apache.spark.ml.optim.aggregator.HuberAggregator.$anonfun$add$3(HuberAggregator.scala:109)
    at org.apache.spark.ml.linalg.SparseVector.foreachActive(Vectors.scala:613)
    at org.apache.spark.ml.optim.aggregator.HuberAggregator.add(HuberAggregator.scala:107)
```
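One possible explanation, offered as an assumption rather than something confirmed in this thread: a `@transient` field is skipped by Java serialization, so an eager `@transient val` comes back as `null` on the deserialized copy of the aggregator, whereas a `@transient lazy val` is recomputed on first access after deserialization. The sketch below reproduces that behaviour outside Spark, assuming Scala 2.12 or 2.13. `FakeBroadcast`, `EagerAgg`, `LazyAgg`, and `TransientDemo` are hypothetical names used only for illustration; they are not Spark classes.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// Hypothetical stand-in for a broadcast variable; NOT Spark's Broadcast class.
class FakeBroadcast(val value: Array[Double]) extends Serializable

// Eager @transient val: assigned once in the constructor and skipped by
// Java serialization, so the field is null on a deserialized copy.
class EagerAgg(bc: FakeBroadcast) extends Serializable {
  @transient private val coefficients: Array[Double] = bc.value
  def first: Double = coefficients(0)
}

// Lazy @transient val: also skipped by serialization, but re-evaluated from
// the (serialized) bc field on first access after deserialization.
class LazyAgg(bc: FakeBroadcast) extends Serializable {
  @transient private lazy val coefficients: Array[Double] = bc.value
  def first: Double = coefficients(0)
}

object TransientDemo {
  // Serialize and deserialize an object in memory, roughly imitating what
  // happens when an aggregator is shipped between stages (an assumption).
  def roundTrip[T <: AnyRef](obj: T): T = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    out.writeObject(obj)
    out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
    try in.readObject().asInstanceOf[T] finally in.close()
  }

  def main(args: Array[String]): Unit = {
    val bc = new FakeBroadcast(Array(1.5, 2.5))
    // The lazy variant recomputes its field after the round trip.
    println(roundTrip(new LazyAgg(bc)).first)
    // The eager variant dereferences a null array after the round trip.
    val eagerFails =
      try { roundTrip(new EagerAgg(bc)).first; false }
      catch { case _: NullPointerException => true }
    println(s"eager variant threw NPE: $eagerFails")
  }
}
```

If the same mechanism applies here, dropping `lazy` while keeping `@transient` would leave the field `null` on executors, which would be consistent with the NPE inside `add` shown above.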