Hello everyone,

I need an SVM classifier with class probability estimates. Now it is
very difficult for me to decide between Scikit and Weka, since I do not
really understand how their implementations differ.

The implementation from scikit is based on [1].

From the Scikit documentation: "In the binary case, the probabilities
are calibrated using Platt scaling: logistic regression on the SVM’s 
scores, fit by an additional cross-validation on the training data. In 
the multiclass case, this is extended as per Wu et al. (2004) [1]. [...]
Platt’s method is also known to have theoretical issues."
(http://scikit-learn.org/stable/modules/svm.html#scores-and-probabilities)
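
For reference, here is a minimal sketch of the binary Platt step quoted
above. It fits P(y=+1 | f) = 1 / (1 + exp(A*f + B)) to SVM decision
values f by minimising cross-entropy against Platt's smoothed targets,
using Newton's method with backtracking as in the pseudo-code of Lin,
Lin and Weng [5]. The function names are my own, it is plain Python
rather than either library's code, and it assumes the decision values
already come from held-out folds (the "additional cross-validation"
step itself is not shown).

```python
import math


def _platt_objective(deci, targets, a, b):
    """Cross-entropy of the sigmoid 1 / (1 + exp(a*f + b)) against the
    targets, in the numerically stable form used by Lin, Lin and Weng."""
    total = 0.0
    for f, t in zip(deci, targets):
        z = f * a + b
        if z >= 0:
            total += t * z + math.log1p(math.exp(-z))
        else:
            total += (t - 1) * z + math.log1p(math.exp(z))
    return total


def fit_platt_sigmoid(deci, labels, max_iter=100, min_step=1e-10,
                      sigma=1e-12, eps=1e-5):
    """Fit A, B so that P(y=+1 | f) = 1 / (1 + exp(A*f + B)).

    deci: SVM decision values (assumed to come from held-out folds).
    labels: +1 / -1 class labels.
    """
    n_pos = sum(1 for y in labels if y > 0)
    n_neg = len(labels) - n_pos
    # Platt's smoothed targets instead of hard 0/1 labels.
    hi_target = (n_pos + 1.0) / (n_pos + 2.0)
    lo_target = 1.0 / (n_neg + 2.0)
    targets = [hi_target if y > 0 else lo_target for y in labels]

    a, b = 0.0, math.log((n_neg + 1.0) / (n_pos + 1.0))
    fval = _platt_objective(deci, targets, a, b)
    for _ in range(max_iter):
        # Gradient and (slightly regularised) Hessian of the objective.
        h11 = h22 = sigma
        h21 = g1 = g2 = 0.0
        for f, t in zip(deci, targets):
            z = f * a + b
            if z >= 0:
                p = math.exp(-z) / (1.0 + math.exp(-z))
                q = 1.0 / (1.0 + math.exp(-z))
            else:
                p = 1.0 / (1.0 + math.exp(z))
                q = math.exp(z) / (1.0 + math.exp(z))
            d2 = p * q
            h11 += f * f * d2
            h22 += d2
            h21 += f * d2
            d1 = t - p
            g1 += f * d1
            g2 += d1
        if abs(g1) < eps and abs(g2) < eps:
            break
        # Newton direction: solve H * d = -g for the 2x2 Hessian.
        det = h11 * h22 - h21 * h21
        da = -(h22 * g1 - h21 * g2) / det
        db = -(-h21 * g1 + h11 * g2) / det
        gd = g1 * da + g2 * db
        # Backtracking line search with an Armijo condition.
        step = 1.0
        while step >= min_step:
            new_a, new_b = a + step * da, b + step * db
            new_f = _platt_objective(deci, targets, new_a, new_b)
            if new_f < fval + 1e-4 * step * gd:
                a, b, fval = new_a, new_b, new_f
                break
            step /= 2.0
        if step < min_step:
            break
    return a, b


def platt_probability(a, b, f):
    """P(y=+1 | f) under the fitted sigmoid, evaluated stably."""
    z = a * f + b
    if z >= 0:
        return math.exp(-z) / (1.0 + math.exp(-z))
    return 1.0 / (1.0 + math.exp(z))


if __name__ == "__main__":
    deci = [-2.0, -1.0, -0.5, 0.3, -0.2, 0.5, 1.0, 2.0]
    labels = [-1, -1, -1, -1, 1, 1, 1, 1]
    A, B = fit_platt_sigmoid(deci, labels)
    print(A, B, platt_probability(A, B, 1.5))
```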

Weka refers to [2], [3] and [4].

From the Weka documentation: "Implements John Platt's sequential
minimal optimization algorithm for training a support vector classifier.
[...] Multi-class problems are solved using pairwise classification 
(1-vs-1 and if logistic models are built pairwise coupling according to 
Hastie and Tibshirani, 1998) [4].
To obtain proper probability estimates, use the option that fits 
logistic regression models to the outputs of the support vector machine. 
In the multi-class case the predicted probabilities are coupled using 
Hastie and Tibshirani's pairwise coupling method."
(http://weka.sourceforge.net/doc.dev/weka/classifiers/functions/SMO.html)
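
To make the Weka side concrete, here is a sketch of Hastie and
Tibshirani's pairwise coupling [4] under assumptions of my own: the
pairwise models have already produced estimates r[i][j] of
P(y = i | y in {i, j}) with r[j][i] = 1 - r[i][j] and 0 < r[i][j] < 1,
and every pair gets the same weight (the paper weights pairs by their
sample counts n_ij). It repeatedly rescales each p_i until, for every
class, the sum of the pairwise probabilities implied by p,
mu_ij = p_i / (p_i + p_j), matches the sum of the observed r_ij. The
names are hypothetical; this is not Weka's code.

```python
def pairwise_from_probs(p):
    """Consistent pairwise estimates r[i][j] = p_i / (p_i + p_j) (test helper)."""
    k = len(p)
    return [[p[i] / (p[i] + p[j]) if i != j else 0.0 for j in range(k)]
            for i in range(k)]


def couple_hastie_tibshirani(r, max_iter=1000, tol=1e-12):
    """Combine pairwise estimates r[i][j] ~ P(y=i | y in {i, j}) into class
    probabilities by iterative scaling, assuming equal pair weights n_ij = 1."""
    k = len(r)
    p = [1.0 / k] * k
    for _ in range(max_iter):
        old = p[:]
        for i in range(k):
            # Observed pairwise probabilities for class i versus those
            # implied by the current p (mu_ij = p_i / (p_i + p_j)).
            observed = sum(r[i][j] for j in range(k) if j != i)
            implied = sum(p[i] / (p[i] + p[j]) for j in range(k) if j != i)
            p[i] *= observed / implied
            total = sum(p)
            p = [x / total for x in p]
        if max(abs(a - b) for a, b in zip(p, old)) < tol:
            break
    return p


if __name__ == "__main__":
    print(couple_hastie_tibshirani(pairwise_from_probs([0.5, 0.3, 0.2])))
```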

Both seem to be based on Platt's idea (logistic regression on the SVM
outputs/scores). Weka implements an improvement of Platt's SMO training
algorithm [3], while Wu et al. [1] use their own improvement of Platt's
probabilistic outputs [5] (I don't know whether that is implemented in
Scikit, too).

Regarding multiclass classification, Scikit implements the approach of
Wu et al. [1], who claim that for multiclass problems their approach is
more stable than the method of Hastie and Tibshirani (which is used by
Weka [4]).
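
For comparison, here is a sketch of the second coupling method of Wu et
al. [1], which (as I understand it, not verified against the source) is
the variant LIBSVM uses and therefore the one behind Scikit's SVC. It
minimises sum_i sum_{j != i} (r_ji p_i - r_ij p_j)^2 subject to
sum(p) = 1 and, unlike the iterative scheme of [4], reduces to a single
linear system. The sketch solves that system directly by Gaussian
elimination (as far as I know LIBSVM uses an iterative solver instead);
all names and helpers are hypothetical.

```python
def pairwise_from_probs(p):
    """Consistent pairwise estimates r[i][j] = p_i / (p_i + p_j) (test helper)."""
    k = len(p)
    return [[p[i] / (p[i] + p[j]) if i != j else 0.0 for j in range(k)]
            for i in range(k)]


def solve_linear(a, rhs):
    """Gaussian elimination with partial pivoting (small dense systems only)."""
    n = len(rhs)
    m = [list(row) + [rhs[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda row: abs(m[row][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for c in range(col, n + 1):
                m[row][c] -= factor * m[col][c]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        acc = m[row][n] - sum(m[row][c] * x[c] for c in range(row + 1, n))
        x[row] = acc / m[row][row]
    return x


def couple_wu_lin_weng(r):
    """Minimise sum_i sum_{j != i} (r[j][i]*p_i - r[i][j]*p_j)^2 s.t. sum(p) = 1."""
    k = len(r)
    # Q[i][i] = sum_{s != i} r_si^2 and Q[i][j] = -r_ji * r_ij.
    q = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(k):
            if i == j:
                q[i][i] = sum(r[s][i] ** 2 for s in range(k) if s != i)
            else:
                q[i][j] = -r[j][i] * r[i][j]
    # Bordered system from the Lagrangian: Q p + b e = 0 and e^T p = 1.
    a = [q[i] + [1.0] for i in range(k)] + [[1.0] * k + [0.0]]
    rhs = [0.0] * k + [1.0]
    return solve_linear(a, rhs)[:k]


if __name__ == "__main__":
    noisy = [[0.0, 0.6, 0.8], [0.4, 0.0, 0.55], [0.2, 0.45, 0.0]]
    print(couple_wu_lin_weng(noisy))
```

With consistent pairwise estimates (r_ij = p_i / (p_i + p_j)) both this
method and Hastie and Tibshirani's return the true p; they only differ
when the pairwise estimates contradict each other, which is the
situation Wu et al.'s stability comparison is about.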

Before I read all the papers in detail: is anyone here familiar with
SVM probability estimates? Is it possible to say which of the
implementations is "better", or does it depend on the problem it is
applied to?


REFERENCES:
-----------
[1] T.-F. Wu, C.-J. Lin, R. C. Weng (2004). Probability Estimates for
Multi-class Classification by Pairwise Coupling. Journal of Machine
Learning Research, 5:975-1005.
[2] J. Platt (1998). Fast Training of Support Vector Machines using
Sequential Minimal Optimization. In: B. Schoelkopf, C. Burges, A. Smola
(eds.), Advances in Kernel Methods - Support Vector Learning.
[3] S. S. Keerthi, S. K. Shevade, C. Bhattacharyya, K. R. K. Murthy
(2001). Improvements to Platt's SMO Algorithm for SVM Classifier Design.
Neural Computation, 13(3):637-649.
[4] T. Hastie, R. Tibshirani (1998). Classification by Pairwise Coupling.
In: Advances in Neural Information Processing Systems.
[5] H.-T. Lin, C.-J. Lin, R. C. Weng (2003). A Note on Platt's
Probabilistic Outputs for Support Vector Machines. Technical report,
Department of Computer Science, National Taiwan University.


Kind regards

Richard

_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
