On 05/17/2014 02:50 AM, Robert McGibbon wrote:
> Have you tried comparing the results given by the two packages to one
> another on an example dataset?
>
> -Robert
No, because that won't show me anything about qualitative differences between
the implementations. I could take a simple 2D example, as I did with Scikit
(http://imgur.com/txhgLTE), and I'm sure it would perform "pretty nicely" with
Weka, too. My real problem is high-dimensional, there is a lot of feature
engineering to do, and I can't easily compare Scikit and Weka on it.

Isn't scikit-learn (regarding the core SVM algorithms) just a wrapper around
libsvm? In that case, the libsvm documentation
(http://www.csie.ntu.edu.tw/~cjlin/papers/libsvm.pdf) gives me some additional
information: libsvm itself uses Wu et al. (2004) and an improvement of Platt's
algorithm by Lin et al. (2003/2007), which seems to match what the Scikit
documentation says it uses for the multiclass case.

One of the main developers of Weka answered that he is not familiar with [1]
and [5], and it seems that Weka uses Platt's original algorithm as well as the
method of Hastie and Tibshirani, both of which have since been improved upon
by Lin/Wu. If Scikit simply wraps libsvm, it should be my choice.

Kind regards
Richard

> On Fri, May 16, 2014 at 8:03 AM, Richard Cubek
> <richard.cu...@hs-weingarten.de> wrote:
>
>> Hello everyone,
>>
>> I need an SVM classifier with class probability estimates. Now, it is very
>> difficult for me to decide between Scikit and Weka, since I do not really
>> understand the differences between the implementations.
>>
>> The implementation in Scikit is based on [1].
>>
>> From the Scikit documentation: "In the binary case, the probabilities are
>> calibrated using Platt scaling: logistic regression on the SVM’s scores,
>> fit by an additional cross-validation on the training data. In the
>> multiclass case, this is extended as per Wu et al. (2004) [1]. [...]
>> Platt’s method is also known to have theoretical issues."
>> (http://scikit-learn.org/stable/modules/svm.html#scores-and-probabilities)
>>
>> Weka refers to [2], [3] and [4].
>>
>> From the Weka documentation: "Implements John Platt's sequential minimal
>> optimization algorithm for training a support vector classifier. [...]
>> Multi-class problems are solved using pairwise classification (1-vs-1 and
>> if logistic models are built pairwise coupling according to Hastie and
>> Tibshirani, 1998) [4]. To obtain proper probability estimates, use the
>> option that fits logistic regression models to the outputs of the support
>> vector machine. In the multi-class case the predicted probabilities are
>> coupled using Hastie and Tibshirani's pairwise coupling method."
>> (http://weka.sourceforge.net/doc.dev/weka/classifiers/functions/SMO.html)
>>
>> Both seem to be based on Platt's idea (logistic regression on the SVM
>> outputs/scores). Weka implements an improvement of Platt's SMO training
>> algorithm [3], while Wu et al. [1] use their own improvement of Platt's
>> probability-fitting method [5] (I don't know whether that one is
>> implemented in Scikit, too).
>>
>> Regarding multiclass classification, Scikit implements the approach of Wu
>> et al. [1], who claim that for the multiclass problem their approach is
>> more stable than the method of Hastie and Tibshirani (which Weka uses [4]).
>>
>> Before I read all the papers in detail: is there someone here familiar
>> with SVM probabilities? Is it possible to say which of the implementations
>> is "better", or does it depend on the problem it is applied to?
>>
>> REFERENCES:
>> -----------
>> [1] Wu, Lin and Weng (2004). Probability estimates for multi-class
>>     classification by pairwise coupling. JMLR 5:975-1005.
>> [2] J. Platt (1998). Fast Training of Support Vector Machines using
>>     Sequential Minimal Optimization. In B. Schoelkopf, C. Burges and
>>     A. Smola, editors, Advances in Kernel Methods - Support Vector
>>     Learning.
>> [3] S.S. Keerthi, S.K. Shevade, C. Bhattacharyya, K.R.K. Murthy (2001).
>>     Improvements to Platt's SMO Algorithm for SVM Classifier Design.
>>     Neural Computation 13(3):637-649.
>> [4] Trevor Hastie, Robert Tibshirani (1998). Classification by Pairwise
>>     Coupling. In: Advances in Neural Information Processing Systems.
>> [5] Hsuan-Tien Lin, Chih-Jen Lin, Ruby C. Weng (2003). A note on Platt’s
>>     probabilistic outputs for support vector machines. Technical report,
>>     Department of Computer Science, National Taiwan University.
>>
>> Kind regards
>>
>> Richard
>>
>> _______________________________________________
>> Scikit-learn-general mailing list
>> Scikit-learn-general@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
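The multiclass part of the question comes down to the coupling step. According to the documentation quoted above, both packages first produce pairwise estimates r_ij ~ P(y = i | y in {i, j}, x) from 1-vs-1 models and then combine them into one probability vector. They differ in the combination rule. Below is a minimal pure-Python sketch of both rules, for comparison only:

- `couple_hastie_tibshirani` follows the iterative algorithm of Hastie and Tibshirani [4]. It assumes equal weights for all pairs, whereas the paper weights each pair by its number of training examples.
- `couple_wu_lin_weng` minimizes the quadratic form p^T Q p over probability vectors, which is the second method of Wu, Lin and Weng [1]. It uses a coordinate-wise fixed-point update modeled on the one proposed in that paper.

Both function names are hypothetical, and neither function is taken from Weka or libsvm.

```python
def couple_hastie_tibshirani(r, max_iter=10000, tol=1e-12):
    """Hastie-Tibshirani coupling with equal pair weights (an assumption).

    r[i][j] estimates P(y = i | y in {i, j}); r[j][i] should be 1 - r[i][j]."""
    k = len(r)
    p = [1.0 / k] * k
    for _ in range(max_iter):
        old = p[:]
        for i in range(k):
            # mu_ij = p_i / (p_i + p_j) is the model's counterpart of r_ij.
            num = sum(r[i][j] for j in range(k) if j != i)
            den = sum(p[i] / (p[i] + p[j]) for j in range(k) if j != i)
            p[i] *= num / den
            s = sum(p)
            p = [x / s for x in p]
        if max(abs(x - y) for x, y in zip(p, old)) < tol:
            break
    return p


def _quad_terms(q, p):
    """Return (Q p, p^T Q p)."""
    qp = [sum(row[j] * p[j] for j in range(len(p))) for row in q]
    return qp, sum(pi * v for pi, v in zip(p, qp))


def couple_wu_lin_weng(r, max_iter=1000, eps=1e-10):
    """Minimize p^T Q p subject to sum(p) = 1 (second method of Wu et al.)."""
    k = len(r)
    # Q[t][t] = sum_{j != t} r[j][t]^2 and Q[t][j] = -r[j][t] * r[t][j].
    q = [[0.0] * k for _ in range(k)]
    for t in range(k):
        for j in range(k):
            if j != t:
                q[t][t] += r[j][t] ** 2
                q[t][j] = -r[j][t] * r[t][j]
    p = [1.0 / k] * k
    for _ in range(max_iter):
        qp, pqp = _quad_terms(q, p)
        # Optimality condition: every component of Q p equals p^T Q p.
        if max(abs(v - pqp) for v in qp) < eps:
            break
        for t in range(k):
            qp, pqp = _quad_terms(q, p)
            p[t] += (pqp - qp[t]) / q[t][t]  # satisfy the t-th equation
            s = sum(p)
            p = [x / s for x in p]  # project back onto sum(p) = 1
    return p


if __name__ == "__main__":
    true_p = [0.5, 0.3, 0.2]
    # Pairwise estimates that are exactly consistent with true_p.
    r = [[0.0 if i == j else true_p[i] / (true_p[i] + true_p[j]) for j in range(3)]
         for i in range(3)]
    print(couple_hastie_tibshirani(r))
    print(couple_wu_lin_weng(r))
```

With consistent inputs (r_ij = p_i / (p_i + p_j) for some p), both functions recover that p. They only diverge when the pairwise estimates contradict each other, which is the situation in which [1] argues its method is more stable.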