Github user hhbyyh commented on the issue:
https://github.com/apache/spark/pull/17862
Tested on several larger datasets with the hinge loss function, to compare
the l-bfgs and owlqn solvers.
Each run continued until convergence or until maxIter (2000) was exceeded.
dataset | numRecords | numFeatures | l-bfgs iterations (time) | owlqn iterations (time) | l-bfgs final loss | owlqn final loss
-------- | ---------------|---------------|---------------|---------------|---------------|---------------
url_combined | 2396130 | 3231961 | 317 (952 sec) | 287 (1661 sec) | 9.71E-5 | 1.64E-4
kdda | 8407752 | 20216830 | 2000+ (29729 sec) | 288 (13664 sec) | 0.0068 | 0.0135
webspam | 350000 | 254 | 344 (67 sec) | 1502 (714 sec) | 0.18273 | 0.18273
SUSY | 5000000 | 18 | 152 (145 sec) | 1242 (3357 sec) | 0.499 | 0.499
l-bfgs does not always take fewer iterations, but its final loss is never
larger than owlqn's: it is smaller on url_combined and kdda, and equal on
webspam and SUSY.
Per iteration, owlqn takes roughly 2 to 3 times as long as l-bfgs.
Logistic Regression exhibits similar behavior.
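
The per-iteration comparison can be checked from the table with a few lines of
arithmetic. The Python sketch below is illustrative only and is not part of the
PR: the helper name `per_iteration_ratio` is made up, and kdda's "2000+" is
treated as exactly 2000 iterations, which is an assumption.

```python
# (dataset, l-bfgs iterations, l-bfgs seconds, owlqn iterations, owlqn seconds)
# Values copied from the results table. The kdda l-bfgs run is listed as
# "2000+", so using 2000 here is an assumption.
runs = [
    ("url_combined", 317, 952, 287, 1661),
    ("kdda", 2000, 29729, 288, 13664),
    ("webspam", 344, 67, 1502, 714),
    ("SUSY", 152, 145, 1242, 3357),
]


def per_iteration_ratio(lbfgs_iters, lbfgs_sec, owlqn_iters, owlqn_sec):
    """Return owlqn seconds per iteration divided by l-bfgs seconds per iteration."""
    return (owlqn_sec / owlqn_iters) / (lbfgs_sec / lbfgs_iters)


# Map each dataset name to its per-iteration slowdown factor.
ratios = {name: per_iteration_ratio(*timings) for name, *timings in runs}

for name, ratio in ratios.items():
    print(f"{name}: owlqn takes {ratio:.2f}x as long per iteration")
```

With these numbers the ratio falls between about 1.9 and 3.2 on all four
datasets.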