Github user MLnick commented on a diff in the pull request:
https://github.com/apache/spark/pull/15593#discussion_r86497058
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala
---
@@ -1486,57 +1489,75 @@ private class LogisticAggregator(
var marginOfLabel = 0.0
var maxMargin = Double.NegativeInfinity
- val margins = Array.tabulate(numClasses) { i =>
- var margin = 0.0
- features.foreachActive { (index, value) =>
- if (localFeaturesStd(index) != 0.0 && value != 0.0) {
- margin += localCoefficients(i * numFeaturesPlusIntercept +
index) *
- value / localFeaturesStd(index)
- }
+ val margins = new Array[Double](numClasses)
--- End diff --
I think a slightly more detailed comment would be good for the aggregator,
something like the following (please feel free to make it more clear):
```
In order to avoid unnecessary computation during calculation of the
gradient updates,
we lay out the coefficients in column major order during training. This
allows us to
perform feature standardization once, while still retaining sequential
memory access
for speed. We convert back to row-major order when we create the model,
since this form is optimal for the matrix operations used for prediction.
```
Here we can amend slightly to say:
```
Additionally, since the coefficients were laid out in column major order
during training to
avoid extra computation, we convert them back to row major before passing
them to the model.
```
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]