Hi Afik. Thanks for your mail. Why do you want to use probability estimates in liblinear? Do you know what they are doing? I would expect they just do a soft-max. This is not really a "probability output"; it is just a way to normalize the decision function. This could be implemented very easily in Python, or by changing the check. I think the reason we haven't done this yet is that this functionality doesn't really have a mathematical interpretation, in particular since liblinear only implements one-vs-rest multi-class classification. (Do you have two classes, or more?)
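To make the "normalize the decision function" idea concrete, here is a minimal sketch of a row-wise soft-max in plain Python. It only illustrates the kind of normalization meant above and is not a claim about what liblinear actually computes. The name `softmax_rows` and the sample values are hypothetical; in practice the scores would come from something like `LinearSVC.decision_function(X)`, assuming a multi-class model so that each sample has one score per class.

```python
import math


def softmax_rows(scores):
    """Normalize each row of decision values so it is positive and sums to 1."""
    normalized = []
    for row in scores:
        # Subtract the row maximum before exponentiating for numerical stability.
        shift = max(row)
        exps = [math.exp(value - shift) for value in row]
        total = sum(exps)
        normalized.append([e / total for e in exps])
    return normalized


# Hypothetical decision values for two samples and three classes, standing in
# for the output of a one-vs-rest linear classifier.
decision_values = [[2.0, -1.0, 0.5], [-0.3, 0.1, 1000.0]]
normalized = softmax_rows(decision_values)

for row in normalized:
    print(row, "sum =", sum(row))
```

The result keeps the ranking of the classes for each sample, but the numbers are only normalized scores, not calibrated probabilities.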
Do you have strong reasons not to use logistic regression?

Cheers,
Andy

On 10/30/2012 09:40 PM, Afik Cohen wrote:
> Hi all,
>
> We've been using scikit-learn 0.12 to train LIBLINEAR's implementation of
> LinearSVC. We require probability estimates for each prediction, and this
> isn't supported out of the box by LinearSVC, so I emailed LIBLINEAR's author,
> Dr. Chih-Jen Lin, for assistance. He showed us that LIBLINEAR does support
> prediction probability estimates for LinearSVC; all that is required is a
> small code change to short-circuit the check_probability_model function in
> linear.cpp as detailed in
> http://www.csie.ntu.edu.tw/~cjlin/liblinear/FAQ.html
> under "Q: Why you support probability outputs for logistic regression only?".
> This small code change worked great; we rebuilt scikit with the modified
> linear.cpp and added a bit of code to sklearn/svm/classes.py so that
> LinearSVC() could have the predict_proba() member function. (I've attached
> our small patch to this post; please see below for the changes.)
>
> Now, however, we've run into a problem when we tried to upgrade to
> scikit-learn 0.13. It appears there have been significant changes to the
> underlying LIBLINEAR library as well as changes to the svm/classes
> interfaces; a recent commit shows almost 4000 lines being removed from
> linear.cpp:
> https://github.com/larsmans/scikit-learn/commit/706319655a1380a154da92d5dd83128faf532881
>
> Unfortunately, it appears our patch to the LIBLINEAR library to support
> prediction probabilities for LinearSVC is now incompatible. Could someone
> shed some light on the reasoning behind this change to the core library and
> help us adapt our patch to the current state? We use LinearSVC because it
> trains the fastest and gives the most accurate results, and even gives us
> prediction probabilities after applying this patch. We'd like to continue
> doing so with current and future versions of scikit!
>
> Thanks,
> Afik Cohen
> Abhijeet Kolhe
>
>
> LinearSVC prediction probability patch follows:
>
>
> diff --git a/sklearn/svm/classes.py b/sklearn/svm/classes.py
> index 79cb76d..d432792 100644
> --- a/sklearn/svm/classes.py
> +++ b/sklearn/svm/classes.py
> @@ -1,7 +1,9 @@
> +import numpy as np
> +
>  from ..base import ClassifierMixin, RegressorMixin
>  from ..feature_selection.selector_mixin import SelectorMixin
>  from .base import BaseLibLinear, BaseSVC, BaseLibSVM
> -
> +from ..svm.liblinear import csr_predict_prob_wrap, predict_prob_wrap
>
>  class LinearSVC(BaseLibLinear, ClassifierMixin, SelectorMixin):
>      """Linear Support Vector Classification.
> @@ -128,7 +130,54 @@ class LinearSVC(BaseLibLinear, ClassifierMixin, SelectorMixin):
>      """
>
>      # all the implementation is provided by the mixins
> -    pass
> +
> +    def predict_proba(self, X):
> +        """Probability estimates.
> +
> +        The returned estimates for all classes are ordered by the
> +        label of classes.
> +
> +        Parameters
> +        ----------
> +        X : array-like, shape = [n_samples, n_features]
> +
> +        Returns
> +        -------
> +        T : array-like, shape = [n_samples, n_classes]
> +            Returns the probability of the sample for each class in
> +            the model, where classes are ordered by arithmetical
> +            order.
> +        """
> +        X = self._validate_for_predict(X)
> +
> +        #C = 0.0 # C is not useful here
> +
> +        prob_wrap = (csr_predict_prob_wrap if self._sparse else
> +                     predict_prob_wrap)
> +        probas = prob_wrap(X, self.raw_coef_, self._get_solver_type(),
> +                           self.tol, self.C, self.class_weight_label_,
> +                           self.class_weight_, self.label_, self._get_bias())
> +        return probas[:, np.argsort(self.label_)]
> +
> +    def predict_log_proba(self, X):
> +        """Log of Probability estimates.
> +
> +        The returned estimates for all classes are ordered by the
> +        label of classes.
> +
> +        Parameters
> +        ----------
> +        X : array-like, shape = [n_samples, n_features]
> +
> +        Returns
> +        -------
> +        T : array-like, shape = [n_samples, n_classes]
> +            Returns the log-probabilities of the sample for each class in
> +            the model, where classes are ordered by arithmetical
> +            order.
> +        """
> +        return np.log(self.predict_proba(X))
> +
>
>
>  class SVC(BaseSVC):
> diff --git a/sklearn/svm/src/liblinear/linear.cpp b/sklearn/svm/src/liblinear/linear.cpp
> index 2cbc773..ddbbced 100644
> --- a/sklearn/svm/src/liblinear/linear.cpp
> +++ b/sklearn/svm/src/liblinear/linear.cpp
> @@ -2846,9 +2846,10 @@ const char *check_parameter(const problem *prob, const parameter *param)
>
>  int check_probability_model(const struct model *model_)
>  {
> -    return (model_->param.solver_type==L2R_LR ||
> -            model_->param.solver_type==L2R_LR_DUAL ||
> -            model_->param.solver_type==L1R_LR);
> +//    return (model_->param.solver_type==L2R_LR ||
> +//            model_->param.solver_type==L2R_LR_DUAL ||
> +//            model_->param.solver_type==L1R_LR);
> +    return 1;
>  }
>
>  void set_print_string_function(void (*print_func)(const char*))
>
>
> ------------------------------------------------------------------------------
> Everyone hates slow websites. So do we.
> Make your web apps faster with AppDynamics
> Download AppDynamics Lite for free today:
> http://p.sf.net/sfu/appdyn_sfd2d_oct
> _______________________________________________
> Scikit-learn-general mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
