Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/11247#issuecomment-188428112
@yanboliang I share the same concern with you. However, users may set
`standardization = false` but still want good convergence when the feature
scales are quite different. For example, you can verify that if you scale one
column by 100x, the corresponding coefficient should shrink by 100x; this
property cannot be achieved without this trick. R's GLMNET applies the same
trick to guarantee this property. Although it may sound confusing, the trick
is transparent to users, so I think it's still okay.
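The scale-invariance property described above can be sketched outside of Spark. This is a hypothetical numpy illustration using ordinary least squares rather than the actual `LogisticRegression` code, but the same invariance is what the standardization trick preserves for an unregularized linear model:

```python
import numpy as np

# Hypothetical illustration (not Spark code): in an unregularized linear
# model, scaling a feature column by 100x shrinks its fitted coefficient
# by 100x, leaving predictions unchanged.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

# Fit on the raw features.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

# Scale the first column by 100x and refit.
X_scaled = X.copy()
X_scaled[:, 0] *= 100.0
coef_scaled, *_ = np.linalg.lstsq(X_scaled, y, rcond=None)

# The first coefficient shrinks by exactly 100x; the second is unchanged.
print(coef[0] / coef_scaled[0])
```

With regularization in the picture, this invariance only holds if the penalty is applied in the standardized space, which is exactly what the trick does even when `standardization = false`.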
As for your second point, in `LogisticRegressionWithLBFGS`, the solution with
`standardization = false` and `regParam = 0.0` will be identical to the one
with `standardization = true` and `regParam = 0.0`, so users still get the
correct answer. The breaking change in `LogisticRegressionWithLBFGS` mainly
addresses the issue of regularizing the intercept, and
https://github.com/apache/spark/pull/10788/files#diff-c78e117e05337bd8f7151ddf9450047dL402
is just a side effect of handling standardization better, which improves
convergence for problems with different feature scales when
`standardization = false`.
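The claim that the unregularized solutions coincide can also be sketched with a small numpy example. This is a hypothetical illustration, not the Spark implementation: it fits unregularized logistic regression by Newton's method (which converges tightly regardless of scaling) once on raw features and once on standardized features, then maps the standardized coefficients back:

```python
import numpy as np

def fit_logreg(X, y, iters=25):
    """Unregularized logistic regression via Newton's method (illustration only)."""
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))   # predicted probabilities
        grad = X.T @ (y - p)                 # gradient of log-likelihood
        W = p * (1.0 - p)                    # IRLS weights
        H = X.T @ (X * W[:, None])           # Hessian
        w += np.linalg.solve(H, grad)
    return w

# Features with very different scales (second column ~50x larger).
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 2)) * np.array([1.0, 50.0])
z = X[:, 0] + 0.02 * X[:, 1]
y = (rng.random(300) < 1.0 / (1.0 + np.exp(-z))).astype(float)

# Fit on raw features.
w_raw = fit_logreg(X, y)

# Fit on standardized features (scale only, no centering), then map back.
sigma = X.std(axis=0)
w_back = fit_logreg(X / sigma, y) / sigma

print(np.allclose(w_raw, w_back, rtol=1e-6))
```

With `regParam = 0.0` the penalty term vanishes, so standardization is just a change of variables and both fits minimize the same loss; the coefficients agree after mapping back. The practical difference Spark cares about is that a first-order solver like L-BFGS converges much faster in the standardized space.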