zhengruifeng commented on pull request #31693: URL: https://github.com/apache/spark/pull/31693#issuecomment-792496074
@srowen I think it is ok. @dbtsai @mengxr Could I ping you here? The existing behavior of standardization without removing the centers originated from the [previous works](https://github.com/apache/spark/pull/5967/files#diff-ed184a69391654654c5a418abe036f159a350814500780d2f50ec17fc2feb2dcR423-R460).

In short, as reported by @ykerzhner in [SPARK-34448](https://issues.apache.org/jira/browse/SPARK-34448), the existing standardization scales nearly-constant features (whose std is small but not zero) to large values. In the SPARK-34448 case, a feature named `const_feature` (std 0.03) is scaled to 30.0 and 33.3. Unfortunately, the underlying solvers (OWLQN/LBFGS/LBFGSB) cannot handle a feature with such large (>30) values. Here is @ykerzhner's analysis of these solvers:

> I took a look over the weekend. It seems good, and somewhat matches what I did in my test example where I centered before running the fitting. Unfortunately, I am not very well versed in Scala, so actually reviewing the code is a bit hard. I appreciate the printouts for the test case in the PR, and I now understand why Spark was returning the log(odds) for the intercept: dividing a non-centered vector by a small std dev creates a vector with very large entries that looks roughly like a constant vector. When the minimizer computes the gradient, it assigns far more weight to this big vector than to the intercept, as the magnitude appears more important than the fact that it isn't exactly constant. When the optimizer then moves in the direction of the gradient, it finds that the value of the objective function actually increased (because this big vector isn't exactly constant), and backtracks several times. By the time it has backtracked enough to actually get a lower value on the objective function, the movement of the intercept is nearly 0. So essentially, the intercept never moves during the entire calibration.
> This is also why it takes so much longer (because of all the backtracking). Once things are centered, the entries in the gradient for the intercept become dominant compared to the vector that is nearly constant, so the minimizer begins adjusting the intercept and moves it to the correct spot.

To address this issue, I found that we can center the vectors; the LoR then converges much faster, and the final solutions are much closer to those of `GLMNET` (see the suite added in this PR). The impact may be broad, since the same issue likely also exists in mLoR/LiR/SVC/AFT, with and without intercept. So I would also appreciate your thoughts on this, thanks.
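To make the magnitude problem concrete, here is a minimal NumPy sketch. The data is a hypothetical reconstruction of `const_feature` (the actual SPARK-34448 dataset is not shown in this thread): nine samples at 1.0 and one at 0.9 happen to give a population std of 0.03, reproducing the 30.0 and 33.3 values quoted above.

```python
import numpy as np

# Hypothetical reconstruction of a nearly-constant feature with std 0.03.
x = np.array([1.0] * 9 + [0.9])
mean, std = x.mean(), x.std()   # mean = 0.99, population std = 0.03

# Standardization WITHOUT centering (the existing behavior):
scaled = x / std                # entries become ~33.33 and 30.0
# The feature now looks like a huge, almost-constant vector, which
# dominates the gradient and starves the intercept update.

# Standardization WITH centering (what this PR proposes):
centered = (x - mean) / std     # entries become ~0.33 and -3.0
# Magnitudes are now O(1), so the solver can adjust the intercept normally.

print(scaled.min(), scaled.max())
print(centered.min(), centered.max())
```

This is only an illustration of the scaling arithmetic, not of the solver itself, but it shows why dividing by a tiny std without subtracting the mean produces the >30 entries that OWLQN/LBFGS/LBFGSB struggle with.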
