zhengruifeng commented on pull request #31693: URL: https://github.com/apache/spark/pull/31693#issuecomment-792496074
@srowen I think it is ok. @dbtsai @mengxr Could I ping you here? The existing behavior of standardization without removing the centers originated from the [previous works](https://github.com/apache/spark/pull/5967/files#diff-ed184a69391654654c5a418abe036f159a350814500780d2f50ec17fc2feb2dcR423-R460).

In short, as reported by @ykerzhner in [SPARK-34448](https://issues.apache.org/jira/browse/SPARK-34448), the existing standardization scales nearly-constant features (whose std is small but not zero) to large values. In the SPARK-34448 case, a feature named `const_feature` (std 0.03) is scaled to 30.0 and 33.3. Unfortunately, the underlying solvers (OWLQN/LBFGS/LBFGSB) cannot handle a feature with such large (>30) values. Here is @ykerzhner's analysis of these solvers:

> I took a look over the weekend. It seems good, and somewhat matches what I did in my test example where I centered before running the fitting. Unfortunately, I am not very well versed in Scala, so actually reviewing the code is a bit hard. I appreciate the printouts for the test case in the PR, and I now understand why Spark was returning the log(odds) for the intercept: dividing a non-centered vector by a small std dev creates a vector with very large entries that looks roughly like a constant vector. When the minimizer computes the gradient, it assigns far more weight to this big vector than to the intercept, as the magnitude appears more important than the fact that it isn't exactly constant. When the optimizer then moves in the direction of the gradient, it finds that the value of the objective function actually increased (because this big vector isn't exactly constant), and backtracks several times. By the time it has backtracked enough to actually get a lower value on the objective function, the movement of the intercept is nearly 0. So essentially, the intercept never moves during the entire calibration.
> This is also why it takes so much longer (because of all the backtracking). Once things are centered, the entries in the gradient for the intercept become dominant compared to the vector that is nearly constant, so the minimizer begins adjusting the intercept and moves it to the correct spot.

To address this issue, I found that we can center the vectors; the LoR then converges much faster, and the final solutions are much closer to those of `GLMNET` (see the suite added in this PR). The impact may be broad, since the same issue likely also exists in mLoR/LiR/SVC/AFT, with and without intercept. So I would also appreciate your thoughts on this, thanks.
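To make the magnitude problem concrete, here is a minimal NumPy sketch. The data is a hypothetical reconstruction of `const_feature` (the actual SPARK-34448 dataset is not shown in this thread): nine samples at 1.0 and one at 0.9 happen to give a population std of 0.03, reproducing the 30.0 and 33.3 values quoted above.

```python
import numpy as np

# Hypothetical reconstruction of a nearly-constant feature with std 0.03.
x = np.array([1.0] * 9 + [0.9])
mean, std = x.mean(), x.std()   # mean = 0.99, population std = 0.03

# Standardization WITHOUT centering (the existing behavior):
scaled = x / std                # entries become ~33.33 and 30.0
# The feature now looks like a huge, almost-constant vector, which
# dominates the gradient and starves the intercept update.

# Standardization WITH centering (what this PR proposes):
centered = (x - mean) / std     # entries become ~0.33 and -3.0
# Magnitudes are now O(1), so the solver can adjust the intercept normally.

print(scaled.min(), scaled.max())
print(centered.min(), centered.max())
```

This is only an illustration of the scaling arithmetic, not of the solver itself, but it shows why dividing by a tiny std without subtracting the mean produces the >30 entries that OWLQN/LBFGS/LBFGSB struggle with.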
