Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/10702#issuecomment-177385145
Commenting on your issues.
Issue 1:
With `WeightedLeastSquares`, we have the option to standardize the label and
features separately. As a result, if the label is not standardized, the problem
can be solved even when `yStd == 0`.
So in your case 4, where the label is not standardized and the features are
standardized, the problem is well defined, and users should get a result.
For case 3, can you elaborate on why an analytical solution exists even when
the label is standardized?
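
To make the `yStd == 0` point concrete, here is a minimal Python sketch. It is not Spark's `WeightedLeastSquares` code; the helper names `fit_no_intercept` and `standardize`, the single-feature no-intercept setup, and the toy data are all assumptions chosen for illustration. It shows that a constant label cannot be standardized, while the fit on the raw label (with or without standardized features) still has a solution.

```python
import statistics


def fit_no_intercept(xs, ys):
    """Least-squares slope for y ~ b * x (one feature, no intercept)."""
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    return sxy / sxx


def standardize(values):
    """Scale values by their sample standard deviation (hypothetical helper)."""
    std = statistics.stdev(values)
    if std == 0.0:
        raise ZeroDivisionError("standard deviation is 0; cannot standardize")
    return [v / std for v in values], std


xs = [1.0, 2.0, 3.0, 4.0]
ys = [5.0, 5.0, 5.0, 5.0]  # constant label, so yStd == 0

# Raw label, raw features: the slope is well defined.
slope_raw = fit_no_intercept(xs, ys)

# Raw label, standardized features: solve in the scaled space,
# then map the coefficient back to the original feature scale.
xs_scaled, x_std = standardize(xs)
slope_back = fit_no_intercept(xs_scaled, ys) / x_std

# Standardizing the label itself is impossible here.
try:
    standardize(ys)
    label_standardizable = True
except ZeroDivisionError:
    label_standardizable = False
```

In this toy setup `slope_raw` and `slope_back` agree up to floating-point error, and only the attempt to standardize the label fails.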
Issue 2:
In my opinion, even case 1 and case 2 are ill-defined: GLMNET standardizes the
label by default, and it will not return any result at all. It just happens
that, without regularization, standardizing the label or not does not change
the solution, so we treat these cases as if the label were not standardized.
This can explain your case 3.
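
The claim that label standardization only matters once regularization is present can be checked on a toy problem. The sketch below is an assumption-laden illustration, not GLMNET's or Spark's implementation: it uses the closed-form one-feature lasso (no intercept) with objective `1/(2n) * sum((y - b*x)^2) + lam * |b|`, and the function names and data are hypothetical.

```python
import statistics


def soft_threshold(z, lam):
    """Soft-thresholding operator used by the one-feature lasso solution."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_1d(xs, ys, lam):
    """Minimizer of 1/(2n) * sum((y - b*x)^2) + lam * |b| for one feature."""
    n = len(xs)
    sxx = sum(x * x for x in xs) / n
    sxy = sum(x * y for x, y in zip(xs, ys)) / n
    return soft_threshold(sxy, lam) / sxx


def lasso_1d_standardized_label(xs, ys, lam):
    """Fit on the label divided by its std, then rescale the coefficient."""
    y_std = statistics.stdev(ys)
    ys_scaled = [y / y_std for y in ys]
    return lasso_1d(xs, ys_scaled, lam) * y_std


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 3.0, 5.0, 8.0]  # non-constant label, so its std is non-zero

# No regularization: both routes give the same coefficient.
gap_unregularized = abs(
    lasso_1d(xs, ys, 0.0) - lasso_1d_standardized_label(xs, ys, 0.0)
)

# With an L1 penalty: the two routes disagree.
gap_regularized = abs(
    lasso_1d(xs, ys, 1.0) - lasso_1d_standardized_label(xs, ys, 1.0)
)
```

With `lam = 0.0` the gap is zero up to rounding. With `lam = 1.0` the two fits differ, because scaling the label effectively rescales the penalty.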
Issue 3:
I think the discrepancies occur because your normal equation solver doesn't
standardize the label.