On 05/17/2014 02:50 AM, Robert McGibbon wrote:
> Have you tried comparing the results given by the two packages to one
> another on an example dataset?
>
> -Robert

No, because that won't tell me anything about qualitative differences
between the implementations. I could take a simple 2D example, as I did
with Scikit (http://imgur.com/txhgLTE), and I'm sure it would perform
"pretty well" with Weka, too. My real problem is high-dimensional and
needs a lot of feature engineering, so I can't easily compare Scikit and
Weka on it.

Isn't scikit-learn (regarding the core SVM algorithms) just a wrapper
around libsvm? In that case, the libsvm documentation
(http://www.csie.ntu.edu.tw/~cjlin/papers/libsvm.pdf) gives me some
additional information: libsvm itself uses Wu et al. (2004) and an
improved version of Platt's algorithm (Lin et al., 2003/2007), which
seems to match what Scikit says it uses for the multiclass case.
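
For reference, the improved sigmoid fit from the Lin, Lin and Weng note ([5] in the quoted message below) can be sketched in pure Python roughly as follows. This is a minimal sketch, not libsvm's code: `fit_platt_sigmoid` and `platt_probability` are hypothetical names, labels are assumed to be +1/-1, and the decision values are assumed to be held-out (cross-validated) SVM scores. It fits P(y = +1 | f) = 1 / (1 + exp(A*f + B)) against Platt's smoothed targets, using Newton's method with a backtracking line search and a numerically stable form of the log-likelihood, which is the change the note proposes over Platt's original optimizer.

```python
import math


def _log1p_exp(x):
    """Numerically stable log(1 + exp(x))."""
    if x >= 0:
        return x + math.log1p(math.exp(-x))
    return math.log1p(math.exp(x))


def fit_platt_sigmoid(decision_values, labels, max_iter=100, min_step=1e-10,
                      sigma=1e-12, eps=1e-5):
    """Fit A, B so that P(y=+1 | f) = 1 / (1 + exp(A*f + B)).

    Hypothetical helper. decision_values: SVM scores f_i (assumed to come
    from cross-validation); labels: +1 / -1 class labels.
    """
    prior1 = sum(1 for y in labels if y > 0)
    prior0 = len(labels) - prior1
    # Platt's smoothed targets instead of hard 0/1 labels.
    hi_target = (prior1 + 1.0) / (prior1 + 2.0)
    lo_target = 1.0 / (prior0 + 2.0)
    t = [hi_target if y > 0 else lo_target for y in labels]

    def objective(a, b):
        # Negative log-likelihood, written as log(1 + exp(z)) + (t - 1) * z.
        return sum(_log1p_exp(a * f + b) + (ti - 1.0) * (a * f + b)
                   for f, ti in zip(decision_values, t))

    a, b = 0.0, math.log((prior0 + 1.0) / (prior1 + 1.0))
    fval = objective(a, b)
    for _ in range(max_iter):
        # Gradient (g1, g2) and Hessian (h11, h21, h22) w.r.t. (A, B).
        h11 = h22 = sigma  # small ridge keeps the Hessian positive definite
        h21 = g1 = g2 = 0.0
        for f, ti in zip(decision_values, t):
            z = a * f + b
            p = math.exp(-_log1p_exp(z))      # 1 / (1 + exp(z))
            q = math.exp(z - _log1p_exp(z))   # exp(z) / (1 + exp(z))
            d2 = p * q
            h11 += f * f * d2
            h22 += d2
            h21 += f * d2
            d1 = ti - p
            g1 += f * d1
            g2 += d1
        if abs(g1) < eps and abs(g2) < eps:
            break
        # Newton direction: solve H * d = -g for the 2x2 system.
        det = h11 * h22 - h21 * h21
        da = -(h22 * g1 - h21 * g2) / det
        db = -(-h21 * g1 + h11 * g2) / det
        gd = g1 * da + g2 * db
        # Backtracking line search with a sufficient-decrease test.
        step = 1.0
        while step >= min_step:
            new_a, new_b = a + step * da, b + step * db
            new_f = objective(new_a, new_b)
            if new_f < fval + 1e-4 * step * gd:
                a, b, fval = new_a, new_b, new_f
                break
            step /= 2.0
        if step < min_step:
            break  # line search failed; keep the last accepted (A, B)
    return a, b


def platt_probability(f, a, b):
    """P(y=+1 | f) under the fitted sigmoid."""
    return math.exp(-_log1p_exp(a * f + b))
```

The smoothed targets keep the fitted A and B finite even when the scores separate the two classes perfectly.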

One of the main developers of Weka answered that he is not familiar
with [1] and [5], and it seems Weka uses Platt's original algorithm as
well as the method from Hastie and Tibshirani, both of which have since
been improved upon by Lin and Wu et al. If scikit simply wraps libsvm,
it should be my choice.
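
To make the multiclass difference concrete, here is a rough pure-Python sketch of both coupling schemes. It assumes a k x k matrix `r` in which `r[i][j]` estimates P(y = i | y in {i, j}, x) from the 1-vs-1 classifiers, with r[i][j] + r[j][i] = 1. `couple_hastie_tibshirani` follows the iterative scheme of [4] with every pair weight set to 1 (the paper weights pairs by their training counts), and `couple_wu_method2` solves the second method of Wu et al. [1], minimising sum_i sum_{j != i} (r[j][i]*p_i - r[i][j]*p_j)^2 subject to sum(p) = 1, directly through its KKT linear system rather than with the iterative solver used in libsvm. For brevity it drops the p >= 0 constraint. All function names are hypothetical.

```python
def _solve_linear(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    # Augmented copy so the caller's lists are not modified.
    m = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for c in range(col, n + 1):
                m[row][c] -= factor * m[col][c]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        s = sum(m[row][c] * x[c] for c in range(row + 1, n))
        x[row] = (m[row][n] - s) / m[row][row]
    return x


def couple_hastie_tibshirani(r, max_iter=10000, tol=1e-12):
    """Hastie-Tibshirani pairwise coupling with all pair weights n_ij = 1."""
    k = len(r)
    p = [1.0 / k] * k
    for _ in range(max_iter):
        previous = p[:]
        for i in range(k):
            # Scale p_i by the ratio of observed pairwise probabilities r[i][j]
            # to implied ones mu_ij = p_i / (p_i + p_j), summed over j != i.
            observed = sum(r[i][j] for j in range(k) if j != i)
            implied = sum(p[i] / (p[i] + p[j]) for j in range(k) if j != i)
            p[i] *= observed / implied
            total = sum(p)
            p = [x / total for x in p]
        if max(abs(a - b) for a, b in zip(p, previous)) < tol:
            break
    return p


def couple_wu_method2(r):
    """Wu, Lin and Weng's second coupling method (p >= 0 constraint omitted)."""
    k = len(r)
    q = [[0.0] * k for _ in range(k)]
    for t in range(k):
        for j in range(k):
            if j != t:
                q[t][t] += r[j][t] ** 2
                q[t][j] = -r[j][t] * r[t][j]
    # KKT conditions of min p^T Q p subject to sum(p) = 1:
    #   Q p + b * e = 0  and  e^T p = 1, with Lagrange multiplier b.
    kkt = [q[t] + [1.0] for t in range(k)] + [[1.0] * k + [0.0]]
    rhs = [0.0] * k + [1.0]
    return _solve_linear(kkt, rhs)[:k]
```

When the pairwise estimates are mutually consistent, i.e. r[i][j] = p_i / (p_i + p_j) for some distribution p, both functions return that p; the two schemes only give different answers when the pairwise estimates conflict, which is the situation in which Wu et al. claim their method is more stable.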

Kind regards

Richard

> On Fri, May 16, 2014 at 8:03 AM, Richard Cubek
> <richard.cu...@hs-weingarten.de> wrote:
>
>> Hello everyone,
>>
>> I need an SVM classifier with class probability estimates, and I
>> find it very difficult to decide between Scikit and Weka because I
>> do not really understand how their implementations differ.
>>
>> The implementation from scikit is based on [1].
>>
>> From the Scikit documentation: "In the binary case, the
>> probabilities are calibrated using Platt scaling: logistic
>> regression on the SVM’s scores, fit by an additional
>> cross-validation on the training data. In the multiclass case, this
>> is extended as per Wu et al. (2004) [1]. [...] Platt’s method is
>> also known to have theoretical issues."
>>
>> (http://scikit-learn.org/stable/modules/svm.html#scores-and-probabilities)
>>
>> Weka refers to [2],[3] and [4].
>>
>> From the Weka documentation: "Implements John Platt's sequential
>> minimal optimization algorithm for training a support vector
>> classifier. [...] Multi-class problems are solved using pairwise
>> classification (1-vs-1 and if logistic models are built pairwise
>> coupling according to Hastie and Tibshirani, 1998) [4]. To obtain
>> proper probability estimates, use the option that fits logistic
>> regression models to the outputs of the support vector machine. In
>> the multi-class case the predicted probabilities are coupled using
>> Hastie and Tibshirani's pairwise coupling method."
>>
>> (http://weka.sourceforge.net/doc.dev/weka/classifiers/functions/SMO.html)
>>
>> Both seem to be based on Platt's idea (logistic regression on SVM
>> outputs/scores), but Weka implements an improvement of Platt's SMO
>> training algorithm [3], while Wu et al. [1] use their own
>> improvement of Platt's probability fit [5] (I don't know whether
>> that is implemented in Scikit, too).
>>
>> Regarding multi class classification, Scikit implements the
>> approach from Wu et al. [1], where they claim that for the
>> multiclass problem, their approach is more stable than the method
>> from Hastie and Tibshirani (which is used by Weka [4]).
>>
>> Before I read all the papers in detail: is anyone here familiar
>> with SVM probability estimates? Is it possible to say which of the
>> implementations is "better", or does it depend on the problem it
>> is applied to?
>>
>> REFERENCES:
>> -----------
>> [1] Wu, Lin and Weng. "Probability estimates for multi-class
>>     classification by pairwise coupling". JMLR 5:975-1005, 2004.
>> [2] J. Platt. "Fast Training of Support Vector Machines using
>>     Sequential Minimal Optimization". In B. Schoelkopf, C. Burges
>>     and A. Smola, editors, Advances in Kernel Methods - Support
>>     Vector Learning, 1998.
>> [3] S.S. Keerthi, S.K. Shevade, C. Bhattacharyya, K.R.K. Murthy.
>>     "Improvements to Platt's SMO Algorithm for SVM Classifier
>>     Design". Neural Computation 13(3):637-649, 2001.
>> [4] Trevor Hastie, Robert Tibshirani. "Classification by Pairwise
>>     Coupling". In: Advances in Neural Information Processing
>>     Systems, 1998.
>> [5] Hsuan-Tien Lin, Chih-Jen Lin, and Ruby C. Weng. "A note on
>>     Platt's probabilistic outputs for support vector machines".
>>     Technical report, Department of Computer Science, National
>>     Taiwan University, 2003.
>>
>> Kind regards
>>
>> Richard
>>
>>
>> _______________________________________________
>> Scikit-learn-general mailing list
>> Scikit-learn-general@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>


