GitHub user WeichenXu123 commented on the issue:

https://github.com/apache/spark/pull/14326

@yanboliang I went through the code, and there are several problems that need to be solved.

First, robust regression has a parameter `sigma` that must be > 0, so this is a bound-constrained optimization problem and should use LBFGS-B. But in my tests, the current Breeze LBFGS-B has bugs: during iteration it sometimes generates NaN values, which corrupts the computation. I added some log printing to help debug; here is a small fragment showing how LBFGS-B breaks down:

**(robust regression w/o intercept w/ regularization test)**

costFun: sigma param: 1.0 huberAggrLoss + reg: 18262.68068379334 cost grad- sigma: -630.1789355384457
costFun: sigma param: 631.1789355384457 huberAggrLoss + reg: 1.256602668595641E7 cost grad- sigma: -466.0711286869664
costFun: sigma param: 64.01789355384457 huberAggrLoss + reg: 483796.45119015244 cost grad- sigma: -448.1113667824356
costFun: sigma param: 9.849995439060637 huberAggrLoss + reg: 44154.79971484518 cost grad- sigma: -275.5999029061156
costFun: sigma param: 3.2447088269560513 huberAggrLoss + reg: 8513.171279631315 cost grad- sigma: -5.737776191290681
**costFun: sigma param: NaN huberAggrLoss + reg: NaN** cost grad- sigma: -822.4999999999944

As shown above, once the sigma param becomes NaN during iteration, LBFGS-B is corrupted and there is no point in continuing. When I traced LBFGS-B, I found that the `LBFGSB.subspaceMinimization` method can produce an output point of (NaN, NaN...) even when its input is fine, so I think this is a bug in `LBFGSB.subspaceMinimization`. I don't think there is a workaround for this problem; it needs the Breeze community to fix it.

The second problem is whether the loss should be divided by N and whether the L2 regularization term should be divided by 2. I think this should stay consistent with the other GLM algorithms in mllib.
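To make the two points above concrete, here is a minimal Python sketch, not the PR's Scala code. It assumes a scale-parameterized Huber objective of the form `sigma + H(r / sigma) * sigma` per sample; the exact formula used in the PR is not shown in this comment. The names `huber_term`, `objective`, `reg_param`, and `epsilon` are hypothetical. The sketch averages the loss over N, halves the L2 term, and fails fast when `sigma` stops being a finite positive number, which is the condition seen in the last log line.

```python
import math
from typing import Sequence


def huber_term(residual: float, sigma: float, epsilon: float = 1.35) -> float:
    """Scale-parameterized Huber loss for one residual (assumed form)."""
    z = abs(residual) / sigma
    if z <= epsilon:
        h = z * z  # quadratic region for small scaled residuals
    else:
        h = 2.0 * epsilon * z - epsilon * epsilon  # linear region for outliers
    return sigma + h * sigma


def objective(
    weights: Sequence[float],
    sigma: float,
    xs: Sequence[Sequence[float]],
    ys: Sequence[float],
    reg_param: float,
    epsilon: float = 1.35,
) -> float:
    """Averaged Huber loss plus a halved L2 penalty (assumed convention)."""
    # Fail fast: once sigma or a weight is NaN, every later value is NaN too,
    # so there is no point in letting the optimizer keep iterating.
    if not math.isfinite(sigma) or sigma <= 0.0:
        raise ValueError(f"sigma must be finite and > 0, got {sigma}")
    if not all(math.isfinite(w) for w in weights):
        raise ValueError("weights contain NaN or infinity")

    n = len(ys)
    data_loss = sum(
        huber_term(y - sum(w * xi for w, xi in zip(weights, x)), sigma, epsilon)
        for x, y in zip(xs, ys)
    ) / n  # loss divided by N
    l2_penalty = 0.5 * reg_param * sum(w * w for w in weights)  # L2 divided by 2
    return data_loss + l2_penalty
```

A guard like this does not fix the optimizer. It only turns a silent NaN into an immediate error at the first bad iterate, which makes it easier to point at the step that produced it.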