Hello everyone,

I need an SVM classifier with class probability estimates, and I find it very difficult to decide between scikit-learn and Weka, since I do not really understand the differences between their implementations.
The scikit-learn implementation is based on [1]. From the scikit-learn documentation: "In the binary case, the probabilities are calibrated using Platt scaling: logistic regression on the SVM’s scores, fit by an additional cross-validation on the training data. In the multiclass case, this is extended as per Wu et al. (2004) [1]. [...] Platt’s method is also known to have theoretical issues." (http://scikit-learn.org/stable/modules/svm.html#scores-and-probabilities)

Weka refers to [2], [3] and [4]. From the Weka documentation: "Implements John Platt's sequential minimal optimization algorithm for training a support vector classifier. [...] Multi-class problems are solved using pairwise classification (1-vs-1 and if logistic models are built pairwise coupling according to Hastie and Tibshirani, 1998) [4]. To obtain proper probability estimates, use the option that fits logistic regression models to the outputs of the support vector machine. In the multi-class case the predicted probabilities are coupled using Hastie and Tibshirani's pairwise coupling method." (http://weka.sourceforge.net/doc.dev/weka/classifiers/functions/SMO.html)

Both seem to be based on Platt's idea (logistic regression on the SVM outputs/scores). However, Weka implements an improvement of Platt's SMO training algorithm [3] (which, as far as I can tell, concerns training the SVM rather than fitting the probabilities), while Wu et al. [1] use their own improvement of Platt's probabilistic outputs [5] (I don't know whether that is implemented in scikit-learn, too). For multiclass classification, scikit-learn implements the approach of Wu et al. [1], who claim that it is more stable than the method of Hastie and Tibshirani [4], which is the one Weka uses.

Before I read all the papers in detail: is anyone here familiar with SVM probability estimates? Is it possible to say which of the implementations is "better", or does it depend on the problem it is applied to?
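To make the binary step concrete, here is a minimal NumPy sketch of Platt's sigmoid fit: given SVM decision values f and labels y in {0, 1}, it fits P(y=1 | f) = 1 / (1 + exp(a*f + b)). It assumes the decision values were already obtained on held-out data (the scikit-learn documentation quoted above says an additional cross-validation is used for this), uses Platt's smoothed targets instead of hard 0/1 labels, and minimizes the negative log-likelihood with Newton's method plus a backtracking line search, roughly in the spirit of the note by Lin, Lin and Weng [5]. The names platt_fit and platt_predict are hypothetical; this is not the code of scikit-learn or Weka.

```python
import numpy as np


def _sigmoid_neg(z):
    """Return 1 / (1 + exp(z)) without overflow, via tanh."""
    return 0.5 * (1.0 - np.tanh(0.5 * z))


def platt_fit(f, y, max_iter=100, tol=1e-8):
    """Fit P(y=1 | f) = 1 / (1 + exp(a*f + b)); f are decision values, y in {0, 1}."""
    f = np.asarray(f, dtype=float)
    y = np.asarray(y)
    n_pos = int(np.sum(y == 1))
    n_neg = len(y) - n_pos
    # Platt's smoothed targets replace the hard 0/1 labels
    t = np.where(y == 1, (n_pos + 1.0) / (n_pos + 2.0), 1.0 / (n_neg + 2.0))

    def nll(a, b):
        z = a * f + b
        # negative log-likelihood: sum(log(1 + e^z) - (1 - t) * z), computed stably
        return np.sum(np.logaddexp(0.0, z) - (1.0 - t) * z)

    a, b = 0.0, np.log((n_neg + 1.0) / (n_pos + 1.0))  # start at the class prior
    for _ in range(max_iter):
        p = _sigmoid_neg(a * f + b)
        d = t - p          # derivative of the loss w.r.t. z = a*f + b
        w = p * (1.0 - p)  # second derivative w.r.t. z
        grad = np.array([np.sum(d * f), np.sum(d)])
        if np.max(np.abs(grad)) < tol:
            break
        # small ridge on the diagonal keeps the Hessian invertible
        hess = np.array([[np.sum(w * f * f) + 1e-12, np.sum(w * f)],
                         [np.sum(w * f), np.sum(w) + 1e-12]])
        step_dir = -np.linalg.solve(hess, grad)  # Newton direction
        old, slope, step = nll(a, b), grad @ step_dir, 1.0
        # backtracking line search with the Armijo sufficient-decrease condition
        while step >= 1e-10:
            a_new, b_new = a + step * step_dir[0], b + step * step_dir[1]
            if nll(a_new, b_new) < old + 1e-4 * step * slope:
                break
            step *= 0.5
        else:
            break  # no sufficient decrease found; keep the current (a, b)
        a, b = a_new, b_new
    return a, b


def platt_predict(f, a, b):
    """Calibrated probability of the positive class for decision values f."""
    return _sigmoid_neg(a * np.asarray(f, dtype=float) + b)


# Made-up decision values and labels, for illustration only
f = np.array([-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
a, b = platt_fit(f, y)
print(a, b, platt_predict([-1.0, 0.0, 1.0], a, b))
```

The smoothed targets keep a and b finite even when the decision values separate the classes perfectly, as in this toy data; with hard 0/1 targets the slope would grow without bound.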
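For the multiclass step, the two coupling rules can be compared directly on a matrix r of pairwise estimates, where r[i, j] approximates P(y = i | y in {i, j}, x) and r[i, j] + r[j, i] = 1. The sketch below implements Hastie and Tibshirani's iterative scheme [4], assuming all pair weights n_ij are 1, and the second of the two methods proposed by Wu, Lin and Weng [1], which minimizes sum_i sum_{j != i} (r[j, i] p[i] - r[i, j] p[j])^2 subject to sum(p) = 1 by solving one linear system. It is a NumPy sketch under those assumptions, with hypothetical function names, not either library's implementation.

```python
import numpy as np


def pairwise_from_probs(p):
    """Build consistent pairwise estimates r[i, j] = p[i] / (p[i] + p[j])."""
    r = p[:, None] / (p[:, None] + p[None, :])
    np.fill_diagonal(r, 0.0)  # the diagonal is unused
    return r


def couple_hastie_tibshirani(r, max_iter=1000, tol=1e-10):
    """Hastie & Tibshirani pairwise coupling with all pair weights n_ij = 1 (assumption)."""
    k = r.shape[0]
    p = np.full(k, 1.0 / k)  # start from the uniform distribution
    for _ in range(max_iter):
        p_old = p.copy()
        for i in range(k):
            others = [j for j in range(k) if j != i]
            mu = p[i] / (p[i] + p[others])  # pairwise probs implied by the current p
            p[i] *= r[i, others].sum() / mu.sum()
            p /= p.sum()  # renormalize after each update
        if np.max(np.abs(p - p_old)) < tol:
            break
    return p


def couple_wu_lin_weng(r):
    """Second coupling method of Wu, Lin and Weng: minimize
    sum_i sum_{j != i} (r[j, i] * p[i] - r[i, j] * p[j])**2 subject to sum(p) = 1.
    """
    k = r.shape[0]
    q = -r.T * r  # off-diagonal entries: q[i, j] = -r[j, i] * r[i, j]
    # diagonal entries: q[i, i] = sum over s != i of r[s, i]**2
    np.fill_diagonal(q, (r ** 2).sum(axis=0) - np.diag(r) ** 2)
    # optimality (KKT) system of the equality-constrained quadratic problem
    kkt = np.zeros((k + 1, k + 1))
    kkt[:k, :k] = q
    kkt[:k, k] = 1.0
    kkt[k, :k] = 1.0
    rhs = np.zeros(k + 1)
    rhs[k] = 1.0
    return np.linalg.solve(kkt, rhs)[:k]


# Consistent estimates: both rules should recover p_true
p_true = np.array([0.5, 0.3, 0.2])
r_consistent = pairwise_from_probs(p_true)
print(couple_hastie_tibshirani(r_consistent))
print(couple_wu_lin_weng(r_consistent))

# Cyclic, mutually inconsistent estimates (made-up numbers for illustration)
r_cyclic = np.array([[0.0, 0.9, 0.4],
                     [0.1, 0.0, 0.7],
                     [0.6, 0.3, 0.0]])
print(couple_hastie_tibshirani(r_cyclic))
print(couple_wu_lin_weng(r_cyclic))
```

With consistent r both functions return p_true up to numerical tolerance. The cyclic r (class 0 beats 1, 1 beats 2, 2 beats 0) is the kind of input where the choice of coupling rule can change the resulting distribution.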
REFERENCES:
-----------
[1] T.-F. Wu, C.-J. Lin and R. C. Weng. Probability estimates for multi-class classification by pairwise coupling. Journal of Machine Learning Research 5:975-1005, 2004.
[2] J. Platt. Fast Training of Support Vector Machines using Sequential Minimal Optimization. In B. Schoelkopf, C. Burges and A. Smola, editors, Advances in Kernel Methods - Support Vector Learning, 1998.
[3] S. S. Keerthi, S. K. Shevade, C. Bhattacharyya and K. R. K. Murthy. Improvements to Platt's SMO Algorithm for SVM Classifier Design. Neural Computation 13(3):637-649, 2001.
[4] T. Hastie and R. Tibshirani. Classification by Pairwise Coupling. In Advances in Neural Information Processing Systems, 1998.
[5] H.-T. Lin, C.-J. Lin and R. C. Weng. A note on Platt's probabilistic outputs for support vector machines. Technical report, Department of Computer Science, National Taiwan University, 2003.

Kind regards
Richard