Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/10639#issuecomment-173327510
@yanboliang The proposal looks okay to me. Some minor comments:
1. `WeightedLeastSquares` model contains extra content that we don't need
in IRLS iterations, e.g., `diagInvAWA`. This could be a follow-up work because
the API is currently private.
2. `reweightedFunction` -> `reweightFunc`. Does it need to operate on RDDs?
We can simplify the API to `(Instance, Model) => (Double, Double)`.
3. `standardizeFeatures` and `standardizeLabel` only affect the
regularization term. We don't need it to match `glm` in R. So let's remove them
from the API. Turning them on would change the convergence of IRLS algorithm,
which needs further discussion.
QR is more stable than the normal equation solver for WLS. We should switch
to it in the future. For general Lp regression, there is an interior-point
method implemented in R's `quantreg` package.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]