Github user hhbyyh commented on the issue:
https://github.com/apache/spark/pull/17862
On many large datasets, LinearSVC does not produce results similar to
sklearn's. For example, sklearn may return coefficients (5, 10, 15, 20) while
Spark's LinearSVC returns (10, 20, 30, 40). The values differ, but in most
cases they differ only by a proportional scaling. This may be partly due to
the different handling of intercept scaling in sklearn and Spark. But on
larger datasets, I found that LBFGS and OWLQN also exhibit a similar issue.
I wonder whether it is normal to get different (but proportional) coefficients
from different optimization methods for the same dataset and parameter settings.
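To make "proportional" concrete, here is a minimal sketch in plain Python that
measures how close two coefficient vectors are to being scalar multiples of
each other. The helper names (`proportionality`, `predict`), the intercept
value, and the sample points are hypothetical and only for illustration; the
coefficient values are the ones from the example above. It says nothing about
why the two libraries differ.

```python
import math

def proportionality(a, b):
    """Return (scale, cosine): scale is the least-squares k in b ~ k * a,
    cosine is the cosine similarity (1.0 means exactly proportional, k > 0)."""
    dot = sum(x * y for x, y in zip(a, b))
    sq_a = sum(x * x for x in a)
    sq_b = sum(y * y for y in b)
    scale = dot / sq_a
    cosine = dot / math.sqrt(sq_a * sq_b)
    return scale, cosine

def predict(coef, intercept, x):
    """Linear decision rule: class 1 if w.x + b > 0, else class 0."""
    margin = sum(w * v for w, v in zip(coef, x)) + intercept
    return 1 if margin > 0 else 0

# Coefficients from the example in the comment.
sklearn_coef = [5.0, 10.0, 15.0, 20.0]
spark_coef = [10.0, 20.0, 30.0, 40.0]

scale, cosine = proportionality(sklearn_coef, spark_coef)
print("scale:", scale, "cosine:", cosine)

# Hypothetical intercept and sample points, for illustration only.
intercept = -30.0
points = [[1.0, 1.0, 1.0, 1.0], [0.0, 0.0, 0.0, 1.0], [2.0, 0.0, 1.0, 0.0]]

# If the intercept is scaled by the same positive factor as the
# coefficients, the sign of every margin is unchanged.
preds_a = [predict(sklearn_coef, intercept, p) for p in points]
preds_b = [predict(spark_coef, scale * intercept, p) for p in points]
print(preds_a, preds_b)
```

A cosine close to 1 together with a stable scale factor suggests the two
models describe the same separating direction. Whether the predictions also
agree depends on the intercepts being scaled by the same factor, which would
need to be checked separately for each model pair.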