[GitHub] spark issue #14520: [SPARK-16934][ML][MLLib] Improve LogisticCostFun to avoi...

sethah Mon, 08 Aug 2016 20:34:05 -0700

Github user sethah commented on the issue:

    https://github.com/apache/spark/pull/14520
  
    The change here does not really affect serialization. Spark already 
broadcasts the coefficients automatically each time `calculate` is called, so 
marking them as a broadcast variable explicitly is unlikely to have much of a 
performance effect (based on my own testing and the description 
[here](http://spark.apache.org/docs/latest/programming-guide.html#broadcast-variables)).
 What we need to do here is change the structure of the aggregator to match 
the fix for `LeastSquaresAggregator`: pass `featuresStd` and `coefficients` 
as constructor args, but mark them as `@transient lazy val`. 
    
    I'm in favor of explicitly broadcasting the coefficients too, as was done 
in `LeastSquaresAggregator`, but we should explicitly destroy them as well. 
Thanks for working on this!
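
    To make the suggested structure concrete, here is a rough sketch, assuming 
Spark 2.x. `SketchAggregator`, `SketchDriver`, and `scaledCoefficients` are 
hypothetical names for illustration only; this is not the actual 
`LogisticAggregator` or `LeastSquaresAggregator` code, and the loss is a plain 
binary logistic loss with labels in {0, 1} and no numerical safeguards.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.broadcast.Broadcast
import org.apache.spark.rdd.RDD

// Hypothetical aggregator for illustration; not the real Spark class.
class SketchAggregator(
    bcCoefficients: Broadcast[Array[Double]],
    bcFeaturesStd: Broadcast[Array[Double]]) extends Serializable {

  // Computed on first use on each executor and skipped during serialization,
  // so only the small broadcast handles travel with the aggregator.
  @transient private lazy val scaledCoefficients: Array[Double] = {
    val coef = bcCoefficients.value
    val std = bcFeaturesStd.value
    Array.tabulate(coef.length)(i => if (std(i) != 0.0) coef(i) / std(i) else 0.0)
  }

  private var lossSum = 0.0
  private var count = 0L

  // Accumulate the logistic loss of one (label, features) pair.
  def add(label: Double, features: Array[Double]): this.type = {
    var margin = 0.0
    var i = 0
    while (i < features.length) {
      margin += scaledCoefficients(i) * features(i)
      i += 1
    }
    lossSum += math.log1p(math.exp(margin)) - label * margin
    count += 1
    this
  }

  // Combine partial results from two partitions.
  def merge(other: SketchAggregator): this.type = {
    lossSum += other.lossSum
    count += other.count
    this
  }

  def loss: Double = if (count > 0) lossSum / count else 0.0
}

object SketchDriver {
  // Stand-in for one `calculate` call: broadcast the current coefficients,
  // aggregate, then destroy the broadcast once the result is back.
  def calculate(
      sc: SparkContext,
      data: RDD[(Double, Array[Double])],
      coefficients: Array[Double],
      bcFeaturesStd: Broadcast[Array[Double]]): Double = {
    val bcCoefficients = sc.broadcast(coefficients)
    try {
      data.treeAggregate(new SketchAggregator(bcCoefficients, bcFeaturesStd))(
        (agg, point) => agg.add(point._1, point._2),
        (a, b) => a.merge(b)
      ).loss
    } finally {
      // Explicit cleanup; the coefficients change on every iteration.
      bcCoefficients.destroy()
    }
  }
}
```

    In this sketch `featuresStd` is broadcast once by the caller and reused 
across calls, while the coefficients are broadcast and destroyed on every 
call, since they change each iteration.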


