Have you tried comparing the results given by the two packages to one
another on an example dataset?
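
For the scikit-learn half of such a comparison, a minimal sketch could look like the one below. The dataset (iris), the RBF kernel with default parameters, the 70/30 split, and log loss as the score are all assumptions of mine and not taken from the original question. The Weka half would mean running SMO with its logistic-model option on the same split and exporting the predicted probabilities so they can be scored with the same metric.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Example data and split (assumed; any shared dataset would do)
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# probability=True enables the internal probability calibration
clf = SVC(kernel="rbf", probability=True, random_state=0)
clf.fit(X_train, y_train)

# One row per test sample, one column per class
proba = clf.predict_proba(X_test)

# Each row should sum to 1 across the classes
row_sums = proba.sum(axis=1)
assert np.allclose(row_sums, 1.0)

# Lower log loss means better probability estimates on this split
score = log_loss(y_test, proba, labels=clf.classes_)
print("log loss:", score)
```

Log loss is only one possible choice here. Whichever score is used, it has to be computed on identical test samples for both packages for the numbers to be comparable.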

-Robert


On Fri, May 16, 2014 at 8:03 AM, Richard Cubek <
richard.cu...@hs-weingarten.de> wrote:

> Hello everyone,
>
> I need an SVM classifier with class probability estimates. It is
> very difficult for me to decide between Scikit and Weka, since I do not
> really understand the differences between the implementations.
>
> The implementation from scikit is based on [1].
>
> From the Scikit documentation: "In the binary case, the probabilities
> are calibrated using Platt scaling: logistic regression on the SVM’s
> scores, fit by an additional cross-validation on the training data. In
> the multiclass case, this is extended as per Wu et al. (2004) [1]. [...]
> Platt’s method is also known to have theoretical issues."
> (http://scikit-learn.org/stable/modules/svm.html#scores-and-probabilities)
>
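
To make the quoted description concrete, here is a rough sketch of the binary case written out by hand: cross-validated SVM scores fed into a one-dimensional logistic regression. This is not the code that scikit-learn or LIBSVM actually runs. The synthetic dataset, the linear kernel, the 5 folds, and the large C (used to approximate an unregularised fit) are my assumptions, and the adjusted target values from Platt's original method are left out.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from sklearn.svm import SVC

# Synthetic binary problem (assumed, for illustration only)
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

svm = SVC(kernel="linear")

# Out-of-fold decision values: each score comes from a model that
# did not see that sample during training
scores = cross_val_predict(svm, X, y, cv=5, method="decision_function")

# The sigmoid fit: logistic regression with the SVM score as the
# single input feature
platt = LogisticRegression(C=1e6)
platt.fit(scores.reshape(-1, 1), y)

# Final SVM trained on all data; its scores are mapped through
# the fitted sigmoid to get class probabilities
svm.fit(X, y)
final_scores = svm.decision_function(X).reshape(-1, 1)
proba = platt.predict_proba(final_scores)

print(np.round(proba[:5], 3))
```

The "additional cross-validation" mentioned in the documentation corresponds to the out-of-fold step: fitting the sigmoid on scores the SVM produced for its own training points would tend to give overconfident probabilities.
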
> Weka refers to [2],[3] and [4].
>
> From the Weka documentation: "Implements John Platt's sequential
> minimal optimization algorithm for training a support vector classifier.
> [...] Multi-class problems are solved using pairwise classification
> (1-vs-1 and if logistic models are built pairwise coupling according to
> Hastie and Tibshirani, 1998) [4].
> To obtain proper probability estimates, use the option that fits
> logistic regression models to the outputs of the support vector machine.
> In the multi-class case the predicted probabilities are coupled using
> Hastie and Tibshirani's pairwise coupling method."
> (http://weka.sourceforge.net/doc.dev/weka/classifiers/functions/SMO.html)
>
> Both seem to be based on Platt's idea (logistic regression on the SVM
> outputs/scores). Weka implements an improvement of Platt's SMO training
> algorithm [3], while Wu et al. [1] use their own improvement of Platt's
> probability fitting [5] (I don't know whether that is implemented in
> Scikit, too).
>
> Regarding multiclass classification, Scikit implements the approach
> of Wu et al. [1], who claim that their method is more stable on
> multiclass problems than the method of Hastie and Tibshirani [4]
> (which is the one Weka uses).
>
> Before I read all the papers in detail: maybe someone here is
> familiar with SVM probabilities. Is it possible to say which of the
> implementations is "better", or does it depend on the problem it is
> applied to?
>
>
> REFERENCES:
> -----------
> [1] Wu, Lin and Weng, "Probability estimates for multi-class
> classification by pairwise coupling". JMLR 5:975-1005, 2004.
> [2] J. Platt: Fast Training of Support Vector Machines using Sequential
> Minimal Optimization. In B. Schoelkopf and C. Burges and A. Smola,
> editors, Advances in Kernel Methods - Support Vector Learning, 1998.
> [3] S.S. Keerthi, S.K. Shevade, C. Bhattacharyya, K.R.K. Murthy (2001).
> Improvements to Platt's SMO Algorithm for SVM Classifier Design. Neural
> Computation. 13(3):637-649.
> [4] Trevor Hastie, Robert Tibshirani: Classification by Pairwise
> Coupling. In: Advances in Neural Information Processing Systems, 1998.
> [5] Hsuan-Tien Lin, Chih-Jen Lin, and Ruby C. Weng. A note on Platt’s
> probabilistic outputs for support vector machines. Technical report,
> Department of Computer Science, National Taiwan University, 2003.
>
>
> Kind regards
>
> Richard
>
>
> ------------------------------------------------------------------------------
> "Accelerate Dev Cycles with Automated Cross-Browser Testing - For FREE
> Instantly run your Selenium tests across 300+ browser/OS combos.
> Get unparalleled scalability from the best Selenium testing platform
> available
> Simple to use. Nothing to install. Get started now for free."
> http://p.sf.net/sfu/SauceLabs
> _______________________________________________
> Scikit-learn-general mailing list
> Scikit-learn-general@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>