yuhao yang created SPARK-20602:
----------------------------------

             Summary: Adding LBFGS as optimizer for LinearSVC
                 Key: SPARK-20602
                 URL: https://issues.apache.org/jira/browse/SPARK-20602
             Project: Spark
          Issue Type: Improvement
          Components: ML
    Affects Versions: 2.2.0
            Reporter: yuhao yang

Currently, LinearSVC in Spark supports only OWLQN as its optimizer (see https://issues.apache.org/jira/browse/SPARK-14709). I compared LBFGS and OWLQN on several public datasets and found that LBFGS converges in fewer iterations, often far fewer, for LinearSVC in most cases. The following table lists the number of training iterations each optimizer needed to converge, with the resulting F1 score in parentheses:

||Dataset||LBFGS||OWLQN||
|news20.binary|31 (0.99)|413 (0.99)|
|mushroom|28 (1.0)|170 (1.0)|
|madelon|143 (0.75)|8129 (0.70)|
|breast-cancer-scale|15 (1.0)|16 (1.0)|
|phishing|329 (0.94)|231 (0.94)|
|a1a (adult)|466 (0.87)|282 (0.87)|
|a7a|237 (0.84)|372 (0.84)|

Data source: https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary.html

Training code: new LinearSVC().setMaxIter(10000).setTol(1e-6)

LBFGS requires fewer iterations in most cases (all except phishing and a1a) and reaches an equal or better F1 score on every dataset, so it is probably a better default optimizer.
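To make the iteration comparison in the table easier to check, here is a minimal, self-contained Scala sketch that tallies the reported counts. It does not use Spark or rerun any training. The object name IterationTally and its field names are hypothetical and chosen only for illustration; the numbers are copied from the table.

```scala
object IterationTally {
  // Iteration counts copied from the table in this issue: (dataset, LBFGS, OWLQN).
  val iterations: Seq[(String, Int, Int)] = Seq(
    ("news20.binary", 31, 413),
    ("mushroom", 28, 170),
    ("madelon", 143, 8129),
    ("breast-cancer-scale", 15, 16),
    ("phishing", 329, 231),
    ("a1a", 466, 282),
    ("a7a", 237, 372)
  )

  // Split the datasets into those where LBFGS needed strictly fewer
  // iterations than OWLQN, and the remaining ones.
  val (lbfgsFewer, owlqnFewer) =
    iterations.partition { case (_, lbfgs, owlqn) => lbfgs < owlqn }

  def main(args: Array[String]): Unit = {
    // OWLQN/LBFGS ratio per dataset; a value above 1 means LBFGS needed fewer iterations.
    iterations.foreach { case (name, lbfgs, owlqn) =>
      println(f"$name%-20s OWLQN/LBFGS = ${owlqn.toDouble / lbfgs}%.2f")
    }
    println(s"LBFGS needed fewer iterations on ${lbfgsFewer.size} of ${iterations.size} datasets")
    println(s"OWLQN needed fewer iterations on: ${owlqnFewer.map(_._1).mkString(", ")}")
  }
}
```

With the counts above, LBFGS needs fewer iterations on 5 of the 7 datasets; phishing and a1a are the two where OWLQN converges in fewer.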