Github user mengxr commented on the pull request:

    https://github.com/apache/spark/pull/11610#issuecomment-196486264
  
    @iyounus @dbtsai The normal equation approach fails when the matrix A is 
rank-deficient. Constant columns are one cause, but more generally it happens 
whenever the training dataset contains linearly dependent columns, because AtA 
is then singular. So this PR handles one case but is not a general solution. 
We can try two approaches:
    1. Use a distributed QR or SVD to deal with rank deficiency. QR requires 
pivoting, so SVD might be the easier approach for us, though it is a little 
more expensive.
    2. Add a small positive number, e.g., `1e-8`, to the diagonal when AtA is 
rank-deficient. This should solve the numerical issue, at the cost of a 
slightly less accurate solution. However, it is always good to apply 
regularization, and I believe this won't make the model worse.
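
    A minimal sketch of approach 2, in plain Python rather than Spark's 
distributed solver. All function names here (`gram`, `cholesky`, 
`solve_cholesky`, `fit_normal_equation`) are hypothetical and exist only for 
illustration. The toy data has an intercept column plus a constant column, so 
AtA is singular and an unregularized Cholesky factorization hits a zero pivot; 
adding `1e-8` to the diagonal makes the factorization go through.

```python
import math


def gram(A):
    """Return AtA for a row-major matrix A (list of rows)."""
    n = len(A[0])
    return [[sum(r[i] * r[j] for r in A) for j in range(n)] for i in range(n)]


def cholesky(M):
    """Lower-triangular Cholesky factor of M; raises if M is not positive definite."""
    n = len(M)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        d = M[j][j] - sum(L[j][k] ** 2 for k in range(j))
        if d <= 0.0:
            raise ValueError("matrix is not positive definite")
        L[j][j] = math.sqrt(d)
        for i in range(j + 1, n):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = (M[i][j] - s) / L[j][j]
    return L


def solve_cholesky(L, b):
    """Solve (L L^T) w = b by forward and then backward substitution."""
    n = len(L)
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    w = [0.0] * n
    for i in reversed(range(n)):
        w[i] = (z[i] - sum(L[k][i] * w[k] for k in range(i + 1, n))) / L[i][i]
    return w


def fit_normal_equation(A, y, lam=0.0):
    """Solve (AtA + lam * I) w = Aty; lam > 0 is the small diagonal shift."""
    M = gram(A)
    for i in range(len(M)):
        M[i][i] += lam
    Aty = [sum(r[i] * yi for r, yi in zip(A, y)) for i in range(len(M))]
    return solve_cholesky(cholesky(M), Aty)


# Columns: intercept (all 1s), a real feature x, and a constant column (all 2s).
# The first and third columns are linearly dependent, so A has rank 2.
A = [[1.0, 1.0, 2.0], [1.0, 2.0, 2.0], [1.0, 3.0, 2.0], [1.0, 4.0, 2.0]]
y = [5.0, 7.0, 9.0, 11.0]  # generated as y = 3 + 2 * x

try:
    fit_normal_equation(A, y)  # no shift: the factorization fails
except ValueError as err:
    print("unregularized solve failed:", err)

w = fit_normal_equation(A, y, lam=1e-8)  # shifted diagonal: the solve succeeds
print("predictions:", [sum(wi * ai for wi, ai in zip(w, row)) for row in A])
```

    The individual coefficients on the two dependent columns are not uniquely 
determined, so only the predictions are meaningful to compare against `y` 
here. The shift also leaves AtA badly conditioned, which is why a QR- or 
SVD-based solver (approach 1) remains the more robust option.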

