Re: [Scikit-learn-general] Multiclass Logistic Regression.

2013-10-19 Thread Lars Buitinck
2013/10/19 Andreas Mueller amuel...@ais.uni-bonn.de:
 The multi-class documentation says:
 "You don’t need to use these estimators unless you want to experiment
 with different multiclass strategies: all classifiers in scikit-learn
 support multiclass classification out-of-the-box. Below is a summary of
 the classifiers supported by scikit-learn, grouped by strategy."
 Maybe that should be in bold or something?

Hm, I guess fixing the docs on this point is useless. All the info is
there but nobody reads it.

--
October Webinars: Code for Performance
Free Intel webinars can help you accelerate application performance.
Explore tips for MPI, OpenMP, advanced profiling, and more. Get the most from 
the latest Intel processors and coprocessors. See abstracts and register 
http://pubads.g.doubleclick.net/gampad/clk?id=60135031&iu=/4140/ostg.clktrk
___
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general


Re: [Scikit-learn-general] Multiclass Logistic Regression.

2013-10-18 Thread Andreas Mueller
On 09/25/2013 05:31 AM, Lars Buitinck wrote:
 2013/9/25 Luca Cerone luca.cer...@gmail.com:
 I am sorry, but I went into the user documentation for logistic regression
 and multiclass classification and didn't find any information about it
 Hm, maybe we should put this in a more prominent place like the
 tutorial. I'll check the docs if I have time.


The multi-class documentation says:
"You don’t need to use these estimators unless you want to experiment
with different multiclass strategies: all classifiers in scikit-learn
support multiclass classification out-of-the-box. Below is a summary of
the classifiers supported by scikit-learn, grouped by strategy."
Maybe that should be in bold or something?


--
October Webinars: Code for Performance
Free Intel webinars can help you accelerate application performance.
Explore tips for MPI, OpenMP, advanced profiling, and more. Get the most from 
the latest Intel processors and coprocessors. See abstracts and register 
http://pubads.g.doubleclick.net/gampad/clk?id=60135031iu=/4140/ostg.clktrk
___
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general


Re: [Scikit-learn-general] Multiclass Logistic Regression.

2013-09-25 Thread Luca Cerone
Dear Olivier,
thanks for your reply.

On 25 September 2013 10:39, Olivier Grisel olivier.gri...@ensta.org wrote:

 LogisticRegression is already a multiclass classifier, using the
 One-vs-Rest (a.k.a. One-vs-All) strategy by default (as implemented
 internally by liblinear, which LogisticRegression wraps). So you don't
 need to use OneVsRestClassifier in this case.

 If you want more info on multiclass reductions here is the doc:
 http://scikit-learn.org/stable/modules/multiclass.html


This morning I checked the source for LogisticRegression in
sklearn/linear_model/logistic.py and realized that by default it performs
multiclass classification
(this is not explained in the user guide
http://scikit-learn.org/stable/modules/linear_model.html#logistic-regression,
though).
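For reference, a minimal sketch of this (assuming a reasonably recent scikit-learn; the exact multiclass strategy used internally has varied between releases, but the point stands that no wrapper is needed):

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression

digits = load_digits()
X, y = digits.data, digits.target  # y contains the ten classes 0..9

# No One-vs-Rest wrapper needed: fit() accepts the multiclass target as-is.
clf = LogisticRegression(max_iter=1000)
clf.fit(X, y)

# One row of coefficients per class.
print(clf.coef_.shape)  # (10, 64)
print(clf.predict(X[:1]))
```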

 Is there a way to check how the normalization of the data is performed?

 What normalization? There is no normalization unless you do it
 yourself with one of those tools and a pipeline:

 http://scikit-learn.org/stable/modules/preprocessing.html


You are right, I got confused with LinearRegression, which displays
something like *normalize=None* when performing the fit.
I had scribbled a note about checking for it in the documentation, and got
confused when I looked at the documentation for LogisticRegression and
wrote the email.

There are still a few things that are not clear to me from the
documentation. Can you customize the classifier to perform a different
decision function?
Or can I hook in a preprocessing step to be applied to the data? (I am
thinking, for example, of polynomial logistic regression, where I want to
build all the order-2 features from the original dataset. I am just asking
for educational purposes; I guess there are more appropriate methods.)

Other questions that I have:
1. can I use a norm different from l1 or l2?
2. similarly, can I define my own cost function?
3. can I try alternative optimization algorithms?

I am sure the answers are in the documentation, but I couldn't find them
in the TOC of the user guide, and I haven't come across them yet.

Thanks again for the help!

Cheers,
Luca
--
October Webinars: Code for Performance
Free Intel webinars can help you accelerate application performance.
Explore tips for MPI, OpenMP, advanced profiling, and more. Get the most from 
the latest Intel processors and coprocessors. See abstracts and register 
http://pubads.g.doubleclick.net/gampad/clk?id=60133471iu=/4140/ostg.clktrk___
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general


Re: [Scikit-learn-general] Multiclass Logistic Regression.

2013-09-25 Thread Lars Buitinck
2013/9/25 Luca Cerone luca.cer...@gmail.com:
 This morning I checked the source for LogisticRegression in
 sklearn/linear_model/logistic.py and realized that by default it performs
 multiclass classification
 (this is not explained in the user guide
 http://scikit-learn.org/stable/modules/linear_model.html#logistic-regression,
 though).

All our classifiers support multiclass classification and this is
documented in various places.

 There are still a few things that are not clear to me from the
 documentation. Can you customize the classifier to perform a different
 decision function?

You can subclass it and override the decision_function method.

 Or can I hook a preprocessing step to be applied to the data (I am
 thinking for example for polynomial logistic regression, where from the
 original dataset

You can implement a polynomial expansion as a transformer object, then
tie it to logistic regression using a sklearn.pipeline.Pipeline. See
the developer's docs, esp. the "Rolling your own estimator" guide [1],
or our recent paper [2] for the conventions.
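To make this concrete, here is a hedged sketch of such a transformer tied to logistic regression with a Pipeline (the class name QuadraticFeatures is made up for illustration; newer scikit-learn releases also ship sklearn.preprocessing.PolynomialFeatures, which does the same job properly):

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline


class QuadraticFeatures(BaseEstimator, TransformerMixin):
    """Append every degree-2 product x_i * x_j (i <= j) to the features."""

    def fit(self, X, y=None):
        return self  # stateless: nothing to learn from the data

    def transform(self, X):
        X = np.asarray(X)
        n_features = X.shape[1]
        quad = [(X[:, i] * X[:, j])[:, np.newaxis]
                for i in range(n_features)
                for j in range(i, n_features)]
        return np.hstack([X] + quad)


# Tie the expansion to logistic regression, as suggested above.
model = Pipeline([
    ("quadratic", QuadraticFeatures()),
    ("logreg", LogisticRegression()),
])

X = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [0.0, 0.0]])
y = np.array([0, 1, 1, 0])
model.fit(X, y)
print(model.predict(X))
```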

 1. can I use a norm different from l1 or l2?

For what?

 2. similarly, can I define my own cost function?

No, unless you hack the source code.

 3. can I try alternative optimization algorithms?

You can try SGDClassifier(loss="log"), which also implements
one-vs-all logistic regression, but trained with stochastic gradient
descent.


[1] 
http://scikit-learn.org/stable/developers/index.html#rolling-your-own-estimator
[2] http://staff.science.uva.nl/~buitinck/papers/scikit-learn-api.pdf

--
October Webinars: Code for Performance
Free Intel webinars can help you accelerate application performance.
Explore tips for MPI, OpenMP, advanced profiling, and more. Get the most from 
the latest Intel processors and coprocessors. See abstracts and register 
http://pubads.g.doubleclick.net/gampad/clk?id=60133471iu=/4140/ostg.clktrk
___
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general


Re: [Scikit-learn-general] Multiclass Logistic Regression.

2013-09-25 Thread Luca Cerone

  (this is not explained in the user guide
 
 http://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
 ,
  though).

 All our classifiers support multiclass classification and this is
 documented in various places.


I am sorry, but I went into the user documentation for logistic regression
and multiclass classification and didn't find any information about it



  There are still a few things that are not clear to me from the
  documentation. Can you customize the classifier to perform a different
  decision function?

 You can subclass it and override the decision_function method.

  Or can I hook a preprocessing step to be applied to the data (I am
  thinking for example for polynomial logistic regression, where from the
  original dataset

 You can implement a polynomial expansion as a transformer object, then
 tie it to logistic regression using a sklearn.pipeline.Pipeline. See
 the developer's docs, esp. the "Rolling your own estimator" guide [1],
 or our recent paper [2] for the conventions.


Thanks, I'll look into it



  1. can I use a norm different from l1 or l2?

 For what?


for the penalty in LogisticRegression, but looking at the code it seems it
is not possible.



  2. similarly, can I define my own cost function?

 No, unless you hack the source code.


  3. can I try alternative optimization algorithms?

 You can try SGDClassifier(loss="log"), which also implements
 one-vs-all logistic regression, but trained with stochastic gradient
 descent.


Isn't there an interface for implementing my own optimizer and seeing how
it performs?




 [1]
 http://scikit-learn.org/stable/developers/index.html#rolling-your-own-estimator
 [2] http://staff.science.uva.nl/~buitinck/papers/scikit-learn-api.pdf


Thanks for the links, I'll go through them!

Cheers,
Luca
--
October Webinars: Code for Performance
Free Intel webinars can help you accelerate application performance.
Explore tips for MPI, OpenMP, advanced profiling, and more. Get the most from 
the latest Intel processors and coprocessors. See abstracts and register 
http://pubads.g.doubleclick.net/gampad/clk?id=60133471iu=/4140/ostg.clktrk___
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general


Re: [Scikit-learn-general] Multiclass Logistic Regression.

2013-09-25 Thread Lars Buitinck
2013/9/25 Luca Cerone luca.cer...@gmail.com:
 I am sorry, but I went into the user documentation for logistic regression
 and multiclass classification and didn't find any information about it

Hm, maybe we should put this in a more prominent place like the
tutorial. I'll check the docs if I have time.

 for the penalty in LogisticRegression, but looking at the code it seems it
 is not possible.

No, because there are no other options for that in Liblinear.
SGDClassifier supports a linear combination of L1 and L2, though.

 Isn't there an interface to implement my own optimizer and see the
 performances?

Nope. We offer quite a few do-it-yourself hooks, but for the sake of
efficiency and maintainability, we have to hardcode some things.

--
October Webinars: Code for Performance
Free Intel webinars can help you accelerate application performance.
Explore tips for MPI, OpenMP, advanced profiling, and more. Get the most from 
the latest Intel processors and coprocessors. See abstracts and register 
http://pubads.g.doubleclick.net/gampad/clk?id=60133471iu=/4140/ostg.clktrk
___
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general


Re: [Scikit-learn-general] Multiclass Logistic Regression.

2013-09-25 Thread Vlad Niculae
 There are still a few things that are not clear to me from the
 documentation. Can you customize the classifier to perform a different
 decision function?

 You can subclass it and override the decision_function method.

While true, this can be misleading: you would only be changing the final
step used when making predictions; it would not change the learning itself.
Depending on the nature of the change you want to make, this could be
wrong.

Vlad

--
October Webinars: Code for Performance
Free Intel webinars can help you accelerate application performance.
Explore tips for MPI, OpenMP, advanced profiling, and more. Get the most from 
the latest Intel processors and coprocessors. See abstracts and register 
http://pubads.g.doubleclick.net/gampad/clk?id=60133471iu=/4140/ostg.clktrk
___
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general


Re: [Scikit-learn-general] Multiclass Logistic Regression.

2013-09-25 Thread Olivier Grisel
2013/9/25 Luca Cerone luca.cer...@gmail.com:
  (this is not explained in the user guide
 
  http://scikit-learn.org/stable/modules/linear_model.html#logistic-regression,
  though).

 All our classifiers support multiclass classification and this is
 documented in various places.


 I am sorry, but I went into the user documentation for logistic regression
 and multiclass classification and didn't find any information about it

Click on LogisticRegression in the section you mentioned and you will
end up on the reference doc for this class, where it is mentioned:

http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html#sklearn.linear_model.LogisticRegression

The multiclass doc also states explicitly that all linear models (such
as LogisticRegression) are one-vs-all by default:

http://scikit-learn.org/stable/modules/multiclass.html

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel

--
October Webinars: Code for Performance
Free Intel webinars can help you accelerate application performance.
Explore tips for MPI, OpenMP, advanced profiling, and more. Get the most from 
the latest Intel processors and coprocessors. See abstracts and register 
http://pubads.g.doubleclick.net/gampad/clk?id=60133471iu=/4140/ostg.clktrk
___
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general


Re: [Scikit-learn-general] Multiclass Logistic Regression.

2013-09-25 Thread Luca Cerone
On 25 September 2013 13:55, Olivier Grisel olivier.gri...@ensta.org wrote:

 2013/9/25 Luca Cerone luca.cer...@gmail.com:
   (this is not explained in the user guide
  
  
 http://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
 ,
   though).
 
  All our classifiers support multiclass classification and this is
  documented in various places.
 
 
  I am sorry, but I went into the user documentation for logistic
 regression
  and multiclass classification and didn't find any information about it

 Click on the LogisticRegression in the section you mentioned and you
 will end up on the reference doc for this class where it is mentioned:


 http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html#sklearn.linear_model.LogisticRegression


I feel quite stupid, but I didn't realize it was a link; I thought it was
just bold text.
--
October Webinars: Code for Performance
Free Intel webinars can help you accelerate application performance.
Explore tips for MPI, OpenMP, advanced profiling, and more. Get the most from 
the latest Intel processors and coprocessors. See abstracts and register 
http://pubads.g.doubleclick.net/gampad/clk?id=60133471iu=/4140/ostg.clktrk___
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general


[Scikit-learn-general] Multiclass Logistic Regression.

2013-09-24 Thread Luca Cerone
Dear all,

I am practising with scikit-learn to solve multiclass classification
problems.

As an exercise I am trying to build a model to predict the digits dataset
available with scikit-learn.

Ideally I would like to solve this using logistic regression, building a
predictor for each digit (one vs all approach).

When a new digit comes in, I predict the output of each of the trained
classifiers and choose the prediction with the maximum value
(as you can see, I am not doing anything special; I think it is the most
naive approach you can follow).

So far I have performed most of these steps manually, but I guess there
might be some faster/smarter approach.

For example, here is my approach, which classifies a digit as 0, 1, or "other".


from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
import numpy as np

digits = load_digits()
data = digits.data
target = digits.target

# shuffle the sample indices
idx = np.random.permutation(data.shape[0])

# split the dataset
n_train_sample = 1000
idx_train = idx[:n_train_sample]
idx_test = idx[n_train_sample:]
data_train = data[idx_train, :]
target_train = target[idx_train]
data_test = data[idx_test, :]
target_test = target[idx_test]

# build the classifier that recognizes 0:
tar_tr_0 = np.array([1 if x == 0 else 0 for x in target_train])
cfr_0 = LogisticRegression()
cfr_0.fit(data_train, tar_tr_0)

# build the classifier that recognizes 1:
tar_tr_1 = np.array([1 if x == 1 else 0 for x in target_train])
cfr_1 = LogisticRegression()
cfr_1.fit(data_train, tar_tr_1)

# build the classifier that recognizes "other":
tar_tr_other = np.array([1 if x > 1 else 0 for x in target_train])
cfr_other = LogisticRegression()
cfr_other.fit(data_train, tar_tr_other)


Next, of course, there is some code that takes the various trained
classifiers, makes predictions on the test set, etc.
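That combining step (choose the classifier with the maximum value) amounts to a column-wise argmax over the per-class decision values; a small sketch with made-up scores:

```python
import numpy as np

# Rows: test samples; columns: decision values from the per-class
# classifiers (cfr_0, cfr_1, cfr_other above), stacked side by side.
scores = np.array([
    [ 2.1, -0.3, -1.0],   # classifier 0 wins
    [-0.5,  1.7,  0.2],   # classifier 1 wins
    [-1.2, -0.8,  0.9],   # "other" wins
])

class_labels = np.array([0, 1, -1])   # -1 stands for "other"
predictions = class_labels[np.argmax(scores, axis=1)]
print(predictions)  # [ 0  1 -1]
```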

I did this partly for educational purposes (although I know in theory how
multiclass classification can be performed, I had never carried out the
steps above myself, and doing so is useful for learning), and partly
because I got a bit lost reading the documentation
(http://scikit-learn.org/stable/modules/multiclass.html).

For the One-vs-Rest approach I think I can use
sklearn.multiclass.OneVsRestClassifier (and I am now trying to do this).
What I couldn't understand, however, is how to get access to the internal
classifiers, to check their scores etc.
I also couldn't understand how to set up a criterion to choose the output.
What if, for example, the classifier is very good at discriminating all
the digits except 4 and 1?

Also I wanted to build a classifier using some form of cross validation,
but again I got a bit lost.

Sorry if my questions are quite silly!

Thanks a lot in advance for the help!

Cheers,
Luca

P.s. what if I want to expand the list of features to perform logistic
regression with quadratic terms? Is there an easy way to do this?
--
October Webinars: Code for Performance
Free Intel webinars can help you accelerate application performance.
Explore tips for MPI, OpenMP, advanced profiling, and more. Get the most from 
the latest Intel processors and coprocessors. See abstracts and register 
http://pubads.g.doubleclick.net/gampad/clk?id=60133471iu=/4140/ostg.clktrk___
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general


Re: [Scikit-learn-general] Multiclass Logistic Regression.

2013-09-24 Thread Luca Cerone
OK, training a one-vs-rest classifier was actually easy.
To inspect the individual classifiers, can I use the .estimators_ attribute?
Do the estimators in it correspond to .classes_, i.e. is estimators_[0]
trained to recognize classes_[0] vs. the rest, and so on?
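For what it's worth, the correspondence can be checked directly (a sketch assuming a recent scikit-learn; the sizes below come from the digits dataset):

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

digits = load_digits()
ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000))
ovr.fit(digits.data, digits.target)

# One binary estimator per class, in the same order as classes_,
# so estimators_[3] is the "digit 3 vs. the rest" classifier.
print(len(ovr.estimators_))   # 10
print(ovr.classes_)           # [0 1 2 3 4 5 6 7 8 9]
```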

Is there a way to check how the normalization of the data is performed?

Thanks again!
Cheers,
Luca



-- 
*Luca Cerone*

Tel: +447585611951
Skype: luca.cerone
--
October Webinars: Code for Performance
Free Intel webinars can help you accelerate application performance.
Explore tips for MPI, OpenMP, advanced profiling, and more. Get the most from 
the latest Intel processors and coprocessors. See abstracts and register 
http://pubads.g.doubleclick.net/gampad/clk?id=60133471iu=/4140/ostg.clktrk___
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general