GitHub user dbtsai commented on the pull request:

    https://github.com/apache/spark/pull/4259#issuecomment-86731157
  
    @jkbradley I think we should support only basic regularization in spark.ml 
first, which is what Python's scikit-learn does. If users need a different 
type of regularization, they can implement it on top of the code we have. 
    
    It will be hard to implement GeneralizedLinearAlgorithm with regularization 
without a lot of if-else statements to handle the special cases. I 
implemented logistic regression, linear regression, and Cox 
proportional-hazards regression with elastic-net regularization at Alpine, and 
our customers ask for results that closely match R's glmnet package. 
As a result, I spent some time studying the original glmnet code in R, and I 
found that there is no generic way to handle the different linear models. There 
are special cases here and there.
    
    For example, in logistic regression, the intercept is computed by adding 
one extra dimension to the data with a constant value of one, but in linear 
regression, the intercept is computed by 
`val intercept = yMean - dot(weights, scaler.mean)`.
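    
    The two intercept strategies can be sketched roughly as follows. This is 
only an illustration on plain arrays, not the code in this PR: `InterceptSketch`, 
`dot`, `interceptFromMeans`, and `appendBias` are hypothetical names, and 
`featureMeans` stands in for `scaler.mean`.

```scala
object InterceptSketch {
  // Hypothetical helper: dense dot product of two equal-length arrays.
  def dot(a: Array[Double], b: Array[Double]): Double = {
    require(a.length == b.length, "dimension mismatch")
    a.zip(b).map { case (x, y) => x * y }.sum
  }

  // Linear regression case: the intercept is recovered in closed form
  // from the label mean and the feature means after the weights are fit.
  def interceptFromMeans(
      yMean: Double,
      weights: Array[Double],
      featureMeans: Array[Double]): Double =
    yMean - dot(weights, featureMeans)

  // Logistic regression case: append a constant 1.0 to every feature
  // vector so the optimizer learns the intercept as one more weight.
  def appendBias(features: Array[Double]): Array[Double] =
    features :+ 1.0
}
```

    With the first approach the intercept never enters the optimization, while 
with the second it is fit like any other coefficient, so a shared code path 
would need to branch on the model type.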
    
    As a result, I would like to implement them separately, first making sure 
with proper tests that we match R's accuracy, and then we can abstract out the 
common parts. I have another PR that tries to do this, #1518, and I will 
continue with it after this PR is merged.

