Re: [Scikit-learn-general] Multiclass Logistic Regression.
2013/10/19 Andreas Mueller amuel...@ais.uni-bonn.de:
> The multi-class documentation says "You don’t need to use these estimators
> unless you want to experiment with different multiclass strategies: all
> classifiers in scikit-learn support multiclass classification
> out-of-the-box. Below is a summary of the classifiers supported by
> scikit-learn grouped by strategy." Maybe that should be in bold or
> something?

Hm, I guess fixing the docs on this point is useless. All the info is there, but nobody reads it.

--
October Webinars: Code for Performance
Free Intel webinars can help you accelerate application performance. Explore
tips for MPI, OpenMP, advanced profiling, and more. Get the most from the
latest Intel processors and coprocessors. See abstracts and register:
http://pubads.g.doubleclick.net/gampad/clk?id=60135031iu=/4140/ostg.clktrk
___
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
Re: [Scikit-learn-general] Multiclass Logistic Regression.
On 09/25/2013 05:31 AM, Lars Buitinck wrote:
> 2013/9/25 Luca Cerone luca.cer...@gmail.com:
>> I am sorry, but I went into the user documentation for logistic
>> regression and multiclass classification and didn't find any
>> information about it.
>
> Hm, maybe we should put this in a more prominent place like the
> tutorial. I'll check the docs if I have time.

The multi-class documentation says "You don’t need to use these estimators unless you want to experiment with different multiclass strategies: all classifiers in scikit-learn support multiclass classification out-of-the-box. Below is a summary of the classifiers supported by scikit-learn grouped by strategy." Maybe that should be in bold or something?
Re: [Scikit-learn-general] Multiclass Logistic Regression.
Dear Olivier,

thanks for your reply.

On 25 September 2013 10:39, Olivier Grisel olivier.gri...@ensta.org wrote:
> LogisticRegression is already a multiclass classifier by default, using
> the One vs Rest / All strategy (as implemented internally by liblinear,
> which LogisticRegression is a wrapper of). So you don't need to use
> OneVsRest in this case. If you want more info on multiclass reductions,
> here is the doc:
> http://scikit-learn.org/stable/modules/multiclass.html

This morning I checked the source for LogisticRegression in sklearn/linear_model/logistic.py and realized that by default it performs multiclass classification (this is not explained in the user guide http://scikit-learn.org/stable/modules/linear_model.html#logistic-regression, though).

>> Is there a way to check how the normalization of the data is performed?
>
> What normalization? There is no normalization unless you do it yourself
> with one of those tools and a pipeline:
> http://scikit-learn.org/stable/modules/preprocessing.html

You are right, I got confused with LinearRegression, which displays something like *normalize=None* when performing the fit. I had scribbled a note on a piece of paper to check for it in the documentation, then got confused when I was looking at the documentation for LogisticRegression and wrote the email.

There are still a few things that are not clear to me from the documentation. Can you customize the classifier to use a different decision function? Or can I hook up a preprocessing step to be applied to the data? (I am thinking, for example, of polynomial logistic regression, where from the original dataset I want to build all the features of order 2. I am just asking for educational purposes; I guess there are more appropriate methods.) Other questions that I have:

1. can I use a norm different from l1 or l2?
2. similarly, can I define my own cost function?
3. can I try alternative optimization algorithms?

I am sure the answers are in the documentation, but I couldn't find them in the TOC (in the user guide) and I haven't encountered them yet.

Thanks again for the help!

Cheers,
Luca
Re: [Scikit-learn-general] Multiclass Logistic Regression.
2013/9/25 Luca Cerone luca.cer...@gmail.com:
> This morning I checked the source for LogisticRegression in
> sklearn/linear_model/logistic.py and realized that by default it performs
> multiclass classification (this is not explained in the user guide
> http://scikit-learn.org/stable/modules/linear_model.html#logistic-regression,
> though).

All our classifiers support multiclass classification, and this is documented in various places.

> There are still a few things that are not clear to me from the
> documentation. Can you customize the classifier to perform a different
> decision function?

You can subclass it and override the decision_function method.

> Or can I hook a preprocessing step to be applied to the data (I am
> thinking for example for polynomial logistic regression, where from the
> original dataset

You can implement a polynomial expansion as a transformer object, then tie it to logistic regression using a sklearn.pipeline.Pipeline. See the developer's docs, esp. the "Rolling your own estimator" guide [1], or our recent paper [2] for the conventions.

> 1. can I use a norm different from l1 or l2?

For what?

> 2. similarly, can I define my own cost function?

No, unless you hack the source code.

> 3. can I try alternative optimization algorithms?

You can try SGDClassifier(loss="log"), which also implements one-vs.-all logistic regression, but trained with stochastic gradient descent.

[1] http://scikit-learn.org/stable/developers/index.html#rolling-your-own-estimator
[2] http://staff.science.uva.nl/~buitinck/papers/scikit-learn-api.pdf
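The transformer-plus-Pipeline suggestion can be sketched as follows. This is a minimal sketch, not code from the thread: it assumes a recent scikit-learn that ships sklearn.preprocessing.PolynomialFeatures (at the time of this thread you would have written such a transformer yourself, as Lars suggests).

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures

X, y = load_iris(return_X_y=True)

# Expand the features with all degree-2 terms, then fit a (multiclass)
# logistic regression on the expanded matrix. The Pipeline ties the
# preprocessing step to the classifier, as described above.
model = Pipeline([
    ("poly", PolynomialFeatures(degree=2, include_bias=False)),
    ("logreg", LogisticRegression(max_iter=1000)),
])
model.fit(X, y)
print(model.score(X, y))
```

Anything a transformer produces in its transform step can be chained this way, so the same pattern covers Luca's "hook a preprocessing step" question in general.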
Re: [Scikit-learn-general] Multiclass Logistic Regression.
>> (this is not explained in the user guide
>> http://scikit-learn.org/stable/modules/linear_model.html#logistic-regression,
>> though).
>
> All our classifiers support multiclass classification and this is
> documented in various places.

I am sorry, but I went into the user documentation for logistic regression and multiclass classification and didn't find any information about it.

>> There are still a few things that are not clear to me from the
>> documentation. Can you customize the classifier to perform a different
>> decision function?
>
> You can subclass it and override the decision_function method.
>
>> Or can I hook a preprocessing step to be applied to the data (I am
>> thinking for example for polynomial logistic regression, where from the
>> original dataset
>
> You can implement a polynomial expansion as a transformer object, then
> tie it to logistic regression using a sklearn.pipeline.Pipeline. See the
> developer's docs, esp. the "Rolling your own estimator" guide [1], or
> our recent paper [2] for the conventions.

Thanks, I'll look into it.

>> 1. can I use a norm different from l1 or l2?
>
> For what?

For the penalty in LogisticRegression, but looking at the code it seems it is not possible.

>> 2. similarly, can I define my own cost function?
>
> No, unless you hack the source code.
>
>> 3. can I try alternative optimization algorithms?
>
> You can try SGDClassifier(loss="log"), which also implements one-vs.-all
> logistic regression, but trained with stochastic gradient descent.

Isn't there an interface to implement my own optimizer and see the performance?

> [1] http://scikit-learn.org/stable/developers/index.html#rolling-your-own-estimator
> [2] http://staff.science.uva.nl/~buitinck/papers/scikit-learn-api.pdf

Thanks for the links, I'll go through them!

Cheers,
Luca
Re: [Scikit-learn-general] Multiclass Logistic Regression.
2013/9/25 Luca Cerone luca.cer...@gmail.com:
> I am sorry, but I went into the user documentation for logistic
> regression and multiclass classification and didn't find any information
> about it.

Hm, maybe we should put this in a more prominent place like the tutorial. I'll check the docs if I have time.

> For the penalty in LogisticRegression, but looking at the code it seems
> it is not possible.

No, because there are no other options for that in Liblinear. SGDClassifier supports a linear combination of L1 and L2, though.

> Isn't there an interface to implement my own optimizer and see the
> performance?

Nope. We offer quite a few do-it-yourself hooks, but for the sake of efficiency and maintainability, we have to hardcode some things.
Re: [Scikit-learn-general] Multiclass Logistic Regression.
>> There are still a few things that are not clear to me from the
>> documentation. Can you customize the classifier to perform a different
>> decision function?
>
> You can subclass it and override the decision_function method.

While true, this can be misleading. You're only changing the final step used when making predictions; it will not change learning. Depending on the nature of the change you want to make, this could be wrong.

Vlad
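Vlad's caveat can be made concrete with a toy sketch (editor's illustration, not code from the thread; ShiftedLogisticRegression and class_shift are made-up names): the subclass below only shifts the fitted scores at prediction time, so the coefficients learned by fit() are exactly those of the parent class.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

class ShiftedLogisticRegression(LogisticRegression):
    """Toy subclass: add a per-class offset to the decision scores.

    Training is untouched -- which is exactly the caveat: overriding
    decision_function changes how scores are turned into predictions,
    not what the model learns.
    """

    def __init__(self, class_shift=0.0, max_iter=1000):
        super().__init__(max_iter=max_iter)
        self.class_shift = class_shift

    def decision_function(self, X):
        # predict() takes the argmax over these scores, so shifting
        # them can change predictions without retraining anything.
        return super().decision_function(X) + np.asarray(self.class_shift)

X, y = load_iris(return_X_y=True)
clf = ShiftedLogisticRegression(class_shift=[0.0, 0.0, 0.0]).fit(X, y)
print(clf.score(X, y))
```

With a zero shift the subclass behaves identically to the parent; a nonzero shift biases predictions toward some classes while clf.coef_ stays exactly what plain LogisticRegression would have learned.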
Re: [Scikit-learn-general] Multiclass Logistic Regression.
2013/9/25 Luca Cerone luca.cer...@gmail.com:
>>> (this is not explained in the user guide
>>> http://scikit-learn.org/stable/modules/linear_model.html#logistic-regression,
>>> though).
>>
>> All our classifiers support multiclass classification and this is
>> documented in various places.
>
> I am sorry, but I went into the user documentation for logistic
> regression and multiclass classification and didn't find any information
> about it.

Click on LogisticRegression in the section you mentioned and you will end up on the reference doc for this class, where it is mentioned:
http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html#sklearn.linear_model.LogisticRegression

The multiclass doc also states explicitly that all linear models (such as LogisticRegression) are one-vs-all by default:
http://scikit-learn.org/stable/modules/multiclass.html

--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
Re: [Scikit-learn-general] Multiclass Logistic Regression.
On 25 September 2013 13:55, Olivier Grisel olivier.gri...@ensta.org wrote:
> Click on LogisticRegression in the section you mentioned and you will
> end up on the reference doc for this class, where it is mentioned:
> http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html#sklearn.linear_model.LogisticRegression

I feel quite stupid, but I didn't realize it was a link; I thought it was only bolded.
[Scikit-learn-general] Multiclass Logistic Regression.
Dear all,

I am practising with scikit-learn to solve multiclass classification problems. As an exercise I am trying to build a model to predict the digits dataset available with scikit-learn. Ideally I would like to solve this using logistic regression, building a predictor for each digit (one-vs-all approach). When a new digit comes in, I predict the output of each of the trained classifiers and choose the prediction with the maximum value (as you can see I am not doing anything special; I think it is the naivest approach that you can follow).

So far I performed most of these steps manually, but I guess that there might be some faster/smarter approach. For example, here is my approach that classifies a digit as 0, 1 or "other":

import numpy as np
import pylab as pl
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression

digits = load_digits()
data = digits.data
target = digits.target

# split the dataset
idx = pl.permutation(data.shape[0])
n_train_sample = 1000
idx_train = idx[:n_train_sample]
idx_test = idx[n_train_sample:]
data_train = data[idx_train, :]
target_train = target[idx_train]
data_test = data[idx_test, :]
target_test = target[idx_test]

# build the classifier that recognizes 0:
tar_tr_0 = np.array([1 if x == 0 else 0 for x in target_train])
cfr_0 = LogisticRegression()
cfr_0.fit(data_train, tar_tr_0)

# build the classifier that recognizes 1:
tar_tr_1 = np.array([1 if x == 1 else 0 for x in target_train])
cfr_1 = LogisticRegression()
cfr_1.fit(data_train, tar_tr_1)

# build the classifier that recognizes "other":
tar_tr_other = np.array([1 if x > 1 else 0 for x in target_train])
cfr_other = LogisticRegression()
cfr_other.fit(data_train, tar_tr_other)

Next, of course, there is some code that takes as input the various trained classifiers, makes predictions on the test set, etc.

I did this partly for educational purposes (although I know in theory how multiclass classification can be performed, I had never done the steps above, which are useful to learn), and partly because I got a bit lost when reading the documentation (http://scikit-learn.org/stable/modules/multiclass.html).

For the one-versus-rest case I think I can use sklearn.multiclass.OneVsRestClassifier (and now I am trying to do this). What I couldn't understand, however, is how to get access to the internal classifiers, to check their scores etc. I also couldn't understand how to set up a criterion to choose the output. What if, for example, the classifier is very good at discriminating all the digits but 4 and 1?

Also, I wanted to build a classifier using some form of cross validation, but again I got a bit lost.

Sorry if my questions are quite silly! Thanks a lot in advance for the help!

Cheers,
Luca

P.s. what if I want to expand the list of features to perform logistic regression with quadratic terms? Is there an easy way to do this?
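For reference, the manual one-vs-rest loop above collapses to a few lines with the built-in estimators. A sketch (assuming a recent scikit-learn, where train_test_split lives in sklearn.model_selection; in releases of the thread's era it was in sklearn.cross_validation):

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=1000, random_state=0)

# Explicit one-vs-rest reduction: fits one binary LogisticRegression
# per digit and predicts via the maximum decision score, i.e. exactly
# the manual scheme above.
ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000))
ovr.fit(X_train, y_train)
print(ovr.score(X_test, y_test))
```

Passing the multiclass y directly to LogisticRegression().fit(X_train, y_train) gives the same strategy without the explicit wrapper, since the estimator is one-vs-rest by default.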
Re: [Scikit-learn-general] Multiclass Logistic Regression.
Ok, training a OneVsRest classifier was actually easy. To inspect the individual classifiers, can I use the .estimators_ attribute? Do the estimators in it correspond to .classes_, i.e. is estimators_[0] trained to recognize classes_[0] vs the others, and so on?

Is there a way to check how the normalization of the data is performed?

Thanks again!

Cheers,
Luca

On 24 September 2013 17:42, Luca Cerone luca.cer...@gmail.com wrote:
> [original message quoted in full]

--
*Luca Cerone*
Tel: +447585611951
Skype: luca.cerone
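Editor's note on the .estimators_ question: yes, the two attributes are aligned; estimators_[i] is the binary classifier trained to separate classes_[i] from the rest. A minimal sketch to check this:

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

X, y = load_digits(return_X_y=True)
ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, y)

# estimators_ and classes_ are in the same order: estimators_[i] is the
# binary one-vs-rest classifier for classes_[i]. Score each one on its
# own binarized problem (class i vs. the rest).
for cls, est in zip(ovr.classes_, ovr.estimators_):
    print(cls, est.score(X, (y == cls).astype(int)))
```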