[GitHub] [spark] Andrew-Crosby commented on a change in pull request #24880: [SPARK-28062][ML] Avoid unnecessary copy of coefficients vector in HuberAggregator

GitBox Tue, 18 Jun 2019 12:19:40 -0700

Andrew-Crosby commented on a change in pull request #24880: [SPARK-28062][ML] Avoid unnecessary copy of coefficients vector in HuberAggregator
URL: https://github.com/apache/spark/pull/24880#discussion_r294984755


 ##########
 File path: mllib/src/main/scala/org/apache/spark/ml/optim/aggregator/HuberAggregator.scala
 ##########
 @@ -81,6 +81,9 @@ private[ml] class HuberAggregator(
   } else {
     0.0
   }
+  // make transient so we do not serialize between aggregation stages
+  @transient private lazy val featuresStd = bcFeaturesStd.value
 
 Review comment:
   Thanks for the feedback. I've removed the unnecessary change to featuresStd.
   
   @srowen I tried removing the `lazy` modifier, but that causes both the unit tests and my test case to fail with the following NPE. I don't understand why.
   
   ```
   org.apache.spark.SparkException: Job aborted due to stage failure: Task 2 in stage 3.0 failed 1 times, most recent failure: Lost task 2.0 in stage 3.0 (TID 11, localhost, executor driver): java.lang.NullPointerException
           at org.apache.spark.ml.optim.aggregator.HuberAggregator.$anonfun$add$3(HuberAggregator.scala:109)
           at org.apache.spark.ml.linalg.SparseVector.foreachActive(Vectors.scala:613)
           at org.apache.spark.ml.optim.aggregator.HuberAggregator.add(HuberAggregator.scala:107)
   ```
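
One possible explanation for the NPE, sketched outside Spark: a `@transient val` that is not `lazy` is assigned once in the constructor and then skipped by Java serialization, so it comes back as `null` on the deserialized copy. A `@transient lazy val` is also skipped, but it is recomputed on first access after deserialization. The sketch below uses plain Java serialization as a stand-in for shipping the aggregator to executors. `FakeBroadcast`, `EagerHolder`, `LazyHolder` and `TransientDemo` are hypothetical names for illustration, not Spark classes, and this has not been checked against HuberAggregator itself.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// Hypothetical stand-in for a broadcast variable; NOT Spark's Broadcast class.
class FakeBroadcast(val value: Array[Double]) extends Serializable

// Hypothetical stand-in for an aggregator with an eager transient field.
class EagerHolder(bc: FakeBroadcast) extends Serializable {
  // Assigned in the constructor and skipped by serialization,
  // so the field is null on a deserialized copy.
  @transient private val std: Array[Double] = bc.value
  def first: Double = std(0)
}

// Same shape, but with a lazy transient field.
class LazyHolder(bc: FakeBroadcast) extends Serializable {
  // Also skipped by serialization, but re-evaluated from bc on first access
  // after deserialization (bc itself is kept as a serialized field).
  @transient private lazy val std: Array[Double] = bc.value
  def first: Double = std(0)
}

object TransientDemo {
  // Serialize and deserialize an object in memory, roughly imitating a
  // driver-to-executor round trip.
  def roundTrip[T <: AnyRef](obj: T): T = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    out.writeObject(obj)
    out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
    try in.readObject().asInstanceOf[T] finally in.close()
  }
}
```

When compiled as a standalone file, calling `first` on `TransientDemo.roundTrip(new EagerHolder(...))` should fail with a `NullPointerException`, while the same call on a round-tripped `LazyHolder` should return the original value. If the same mechanism applies here, it would be consistent with the NPE appearing only inside `add` on the executor side.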
