GitHub user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/11247#issuecomment-185826603
@yanboliang In #7080, it was intentional that `standardization = false` runs
through the same code path as `standardization = true`; without
regularization, it can be proven mathematically that both converge to the
same solution.
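As a sketch of that argument (my notation; it ignores the intercept and mean centering), training on standardized features is just a reparameterization of the original problem, so the two objectives take the same values and share minimizers:

```latex
% Sketch (my notation; ignores the intercept and mean centering).
% \sigma_j is the standard deviation of feature j, and
% \tilde{x}_{ij} = x_{ij} / \sigma_j are the standardized features.
% Substituting w_j = \tilde{w}_j / \sigma_j gives
% \tilde{w}^{\top} \tilde{x}_i = w^{\top} x_i, hence
\[
  \min_{\tilde{w}} \sum_{i=1}^{n} \ell\bigl(y_i,\ \tilde{w}^{\top}\tilde{x}_i\bigr)
  \;=\;
  \min_{w} \sum_{i=1}^{n} \ell\bigl(y_i,\ w^{\top}x_i\bigr),
\]
% so without a penalty term the two routes share the same minimizers,
% up to this rescaling of the coefficients.
```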
With your change, it seems that when `standardization = false`, the
features are not standardized at all, and this will cause convergence issues
when the features have very different scales. To avoid this, the features
are always standardized internally no matter what, and then each component is
penalized differently to correct for this effect.
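Here is a minimal sketch of that correction (illustrative names, simplified from the actual cost function): the optimizer always works on coefficients in the standardized space, and when `standardization = false` the L2 term on each component is divided by the squared feature standard deviation, so the effective penalty matches one in the original feature space.

```scala
// Simplified sketch, not the exact Spark internals: L2 penalty computed on
// coefficients in the standardized space, with a per-component correction
// when the user asked for standardization = false.
def l2Penalty(
    coefficients: Array[Double],  // coefficients in the standardized space
    featuresStd: Array[Double],
    regParamL2: Double,
    standardization: Boolean): Double = {
  var sum = 0.0
  var j = 0
  while (j < coefficients.length) {
    val v = coefficients(j)
    sum += {
      if (standardization) {
        v * v
      } else if (featuresStd(j) != 0.0) {
        // Penalize each component differently: dividing by featuresStd(j)^2
        // makes the penalty equivalent to one in the original feature space.
        v * (v / (featuresStd(j) * featuresStd(j)))
      } else {
        0.0  // constant feature: no penalty contribution
      }
    }
    j += 1
  }
  0.5 * regParamL2 * sum
}
```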
I wonder if your tests that converge in fewer iterations are a special case,
since without standardization some of the datasets did not converge when I
tested this. It may help some datasets, but I don't know whether it will
break someone's working training.
BTW, can you submit the bug fix for `LogisticRegression.scala` in a
separate PR?
```scala
optimizer.getUpdater() match {
  // elasticNetParam = 0.0 is pure L2, 1.0 is pure L1
  case _: SquaredL2Updater => runWithMlLogisticRegression(0.0)
  case _: L1Updater => runWithMlLogisticRegression(1.0)
}
```
Thanks.