zhengruifeng commented on pull request #31693:
URL: https://github.com/apache/spark/pull/31693#issuecomment-792496074


   @srowen I think it is ok.
   
   @dbtsai @mengxr  Could I ping you here? It seems that the existing behavior of standardizing without centering originated from the [previous work](https://github.com/apache/spark/pull/5967/files#diff-ed184a69391654654c5a418abe036f159a350814500780d2f50ec17fc2feb2dcR423-R460).
   
   In short, as reported by @ykerzhner in [SPARK-34448](https://issues.apache.org/jira/browse/SPARK-34448), we found that the existing standardization scales nearly-constant features (whose std is small but not exactly zero) to large values. In the case in SPARK-34448, a feature named `const_feature` (std 0.03) is scaled to values around 30.0 and 33.3.
   Unfortunately, the underlying solvers (OWLQN/LBFGS/LBFGSB) cannot handle a feature with such large (>30) values.
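   To make the failure mode concrete, here is a small pure-Python sketch (the numbers are illustrative, chosen to mimic a std of 0.03, and are not taken from the actual SPARK-34448 dataset): dividing a near-constant feature by its tiny std without subtracting the mean produces entries above 30 that look like a huge constant column, while centering first yields entries of ±1.

```python
import statistics

# Hypothetical near-constant feature, mimicking `const_feature`
# from SPARK-34448: mean ~1.0, population std exactly 0.03.
x = [0.97, 1.03, 0.97, 1.03]

mu = statistics.mean(x)    # ~1.0
sd = statistics.pstdev(x)  # 0.03

# Standardization WITHOUT centering: divide by std only.
# Every entry lands above 30 -- a near-constant, huge column.
scaled_no_center = [v / sd for v in x]          # ~[32.33, 34.33, 32.33, 34.33]

# Standardization WITH centering: subtract the mean first.
# Entries become +/-1, a well-conditioned column.
scaled_centered = [(v - mu) / sd for v in x]    # ~[-1.0, 1.0, -1.0, 1.0]
```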
   
   
   Here is @ykerzhner's analysis of these solvers:
   
   > I took a look over the weekend.  It seems good, and somewhat matches what I did in my test example where I centered before running the fitting.  Unfortunately, I am not very well versed in Scala, so actually reviewing the code is a bit hard.  I appreciate the printouts for the test case in the PR, and I now understand why Spark was returning the log(odds) for the intercept:  The division of a non-centered vector by a small std dev creates a vector with very large entries that looks roughly like a constant vector.  When the minimizer computes the gradient, it assigns far more weight to this big vector than it does the intercept, as the magnitude appears more important than the fact that it isn't exactly constant.  When the optimizer then moves in the direction of the gradient, it finds that the value of the objective function actually increased (because of the fact that this big vector isn't exactly constant), and backtracks several times.  By the time it has backtracked enough to actually get a lower value on the objective function, the movement of the intercept is nearly 0.  So essentially, the intercept never moves during the entire calibration.  This is also why it takes so much longer (because of all the backtracking).  Once things are centered, the entries in the gradient for the intercept become dominant compared to the vector that is sort of constant, and so the minimizer begins adjusting the intercept, and moves it to the correct spot.
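   The gradient-domination effect described in the quote can be sketched numerically. This is a toy example with made-up labels and feature values (not the PR's test data): at the zero starting point, the logistic-loss gradient with respect to a huge uncentered column dwarfs the intercept gradient by roughly the magnitude of the column's entries, while after centering the two components are comparable and the intercept can actually move.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Labels with imbalance, so the intercept "wants" to move toward
# log(odds) = log(0.75 / 0.25) = log(3).
y = [1, 1, 1, 0]

# Near-constant feature divided by its small std WITHOUT centering:
# entries land around 33, looking like a huge constant column.
x_uncentered = [32.3, 34.3, 32.3, 34.3]
# The same feature centered before scaling: entries are +/-1.
x_centered = [-1.0, 1.0, -1.0, 1.0]

def grads_at_zero(x, y):
    """Gradient of the mean logistic loss at w = 0, b = 0."""
    n = len(y)
    r = [sigmoid(0.0) - yi for yi in y]          # residuals: 0.5 - y_i
    g_w = sum(ri * xi for ri, xi in zip(r, x)) / n   # d(loss)/dw
    g_b = sum(r) / n                                 # d(loss)/db
    return g_w, g_b

g_w_u, g_b_u = grads_at_zero(x_uncentered, y)  # |g_w| >> |g_b|
g_w_c, g_b_c = grads_at_zero(x_centered, y)    # |g_w| comparable to |g_b|
```

   With the uncentered column the feature gradient is ~32x the intercept gradient, so a line search along the gradient direction barely moves the intercept at all; with the centered column the ratio is ~1.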
   
   
   To address this issue, I found that we can center the vectors; the LoR then converges much faster, and the final solutions are much closer to the solutions of `GLMNET` (see the added suite in this PR).
   This issue may have a broad impact, since it likely also exists in mLoR/LiR/SVC/AFT, with and without intercept. So I would also appreciate your thoughts on this, thanks.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
