GitHub user mengxr commented on the pull request:
https://github.com/apache/spark/pull/10940#issuecomment-190553302
@coderxiang @dbtsai Sorry for the late response! I actually thought this PR
had already been merged ... Anyway, I tested `glmnet` and found that `glmnet`
outputs zero coefficients for constant columns regardless of intercept,
regularization, and standardization settings. I thought about it today and I
feel it actually makes sense. If we have a constant column in our training
data, do we expect it to change or stay constant in test data? If its value
might change, we should set its coefficient to zero because we cannot estimate
how big the change would be. If its value stays constant (or maybe users
created this column to add bias manually), it shouldn't be regularized and
users should really turn on `fitIntercept` instead. So my suggestion is to
follow glmnet and set the coefficients of constant columns to zero regardless
of other settings. If there are constant columns and `fitIntercept` is false,
we should output a warning message. Does that sound good to you?
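
To make the proposal concrete, here is a rough sketch in plain Python of the behavior I have in mind. It is only an illustration: the helper names (`constant_columns`, `zero_constant_coefficients`) and the `fit_intercept` argument are hypothetical, and it zeroes the coefficients as a post-processing step, which is not necessarily how `glmnet` or our optimizer would do it internally.

```python
import warnings


def constant_columns(rows):
    """Return the indices of columns whose value is identical in every row."""
    if not rows:
        return []
    first = rows[0]
    return [j for j in range(len(first)) if all(r[j] == first[j] for r in rows)]


def zero_constant_coefficients(coefficients, rows, fit_intercept):
    """Force the coefficients of constant columns to zero (hypothetical helper).

    Warns when constant columns are present and no intercept is fitted,
    since such a column may have been added as a manual bias term.
    """
    const = set(constant_columns(rows))
    if const and not fit_intercept:
        warnings.warn(
            "Constant columns %s found while fitIntercept is false; their "
            "coefficients are set to zero. Turn on fitIntercept to fit a bias."
            % sorted(const)
        )
    return [0.0 if j in const else c for j, c in enumerate(coefficients)]


# Column 0 is constant (all 1.0), so its coefficient is forced to zero.
rows = [[1.0, 2.0, 5.0], [1.0, 3.0, 5.5], [1.0, 4.0, 7.0]]
coefs = [0.7, 1.2, -0.4]
result = zero_constant_coefficients(coefs, rows, fit_intercept=True)
```

With `fit_intercept=False` the same call would return the same coefficients but also emit the warning.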