Dear all,

I am practising with scikit-learn to solve multiclass classification
problems.

As an exercise I am trying to build a model that classifies the digits dataset
available with scikit-learn.

Ideally I would like to solve this using logistic regression, building a
predictor for each digit (one vs all approach).

When a new digit comes in, I compute the output of each trained classifier
and choose the class whose classifier gives the maximum value
(as you can see I am not doing anything special; I think it is the most
naive approach you can follow).

So far I have performed most of these steps manually, but I guess there
might be a faster/smarter approach.

For example, here is my approach, which classifies a digit as 0, 1 or Other.

>>>
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
import numpy as np

digits = load_digits()
data = digits.data
target = digits.target

# shuffle the sample indices
idx = np.random.permutation(data.shape[0])

# split the dataset
n_train_sample = 1000
idx_train = idx[:n_train_sample]
idx_test = idx[n_train_sample:]
data_train = data[idx_train, :]
target_train = target[idx_train]
data_test = data[idx_test, :]
target_test = target[idx_test]

# build the classifier that recognizes 0:
tar_tr_0 = (target_train == 0).astype(int)
cfr_0 = LogisticRegression()
cfr_0.fit(data_train, tar_tr_0)

# build the classifier that recognizes 1:
tar_tr_1 = (target_train == 1).astype(int)
cfr_1 = LogisticRegression()
cfr_1.fit(data_train, tar_tr_1)

# build the classifier that recognizes "other":
tar_tr_other = (target_train > 1).astype(int)
cfr_other = LogisticRegression()
cfr_other.fit(data_train, tar_tr_other)
<<<

Next, of course, there is some code that takes the various trained
classifiers as input, makes predictions on the test set, and so on.
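
Here is a minimal sketch of what I mean for the prediction step, using the
classifiers trained above and picking the class whose classifier gives the
largest decision_function value:

>>>
# stack the per-classifier scores column-wise and, for each test
# sample, pick the class whose classifier gives the largest score
scores = np.column_stack([
    cfr_0.decision_function(data_test),
    cfr_1.decision_function(data_test),
    cfr_other.decision_function(data_test),
])
predicted = scores.argmax(axis=1)  # 0 -> "0", 1 -> "1", 2 -> "other"

# map the true labels onto the same three classes and compute accuracy
true_3class = np.where(target_test == 0, 0, np.where(target_test == 1, 1, 2))
print((predicted == true_3class).mean())
<<<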

I did this partly for educational purposes (although I know in theory how
multiclass classification can be performed, I had never carried out the
steps above myself, which I find useful to learn), and partly because I got
a bit lost when reading the documentation
(http://scikit-learn.org/stable/modules/multiclass.html).

For the one-versus-rest case I think I can use
sklearn.multiclass.OneVsRestClassifier (and I am trying to do this now).
What I couldn't understand, however, is how to access the internal
classifiers, for example to check their individual scores.
I also couldn't understand how to set up a criterion to choose the output.
What if, for example, the classifier is very good at discriminating all the
digits except 4 and 1?
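
From the documentation I gather that something like the sketch below might
work: the fitted binary classifiers seem to be exposed through the
estimators_ attribute (in the order given by classes_), and
decision_function returns one column of scores per class, so in principle I
could apply my own criterion on top of that. Please correct me if I am
relying on the wrong attributes.

>>>
from sklearn.multiclass import OneVsRestClassifier

# one-vs-rest on the full 10-class problem
ovr = OneVsRestClassifier(LogisticRegression())
ovr.fit(data_train, target_train)

# inspect each internal binary classifier, one per class
for label, clf in zip(ovr.classes_, ovr.estimators_):
    binary_target = (target_test == label).astype(int)
    print(label, clf.score(data_test, binary_target))

# one column of scores per class; a custom criterion could be
# applied here instead of the default argmax
scores = ovr.decision_function(data_test)
predicted = ovr.classes_[scores.argmax(axis=1)]
<<<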

I also wanted to build the classifier using some form of cross-validation,
but again I got a bit lost.
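
What I was picturing is something like the sketch below (I believe the exact
module providing cross_val_score depends on the scikit-learn version, so
please correct me if this is not the recommended way):

>>>
from sklearn.model_selection import cross_val_score  # older versions: sklearn.cross_validation
from sklearn.multiclass import OneVsRestClassifier

# 5-fold cross-validated accuracy of the one-vs-rest model
ovr = OneVsRestClassifier(LogisticRegression())
cv_scores = cross_val_score(ovr, data, target, cv=5)
print(cv_scores.mean(), cv_scores.std())
<<<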

Sorry if my questions are quite silly!

Thanks a lot in advance for the help!

Cheers,
Luca

P.S. What if I want to "expand" the list of features to perform logistic
regression with quadratic terms? Is there an easy way to do this?
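
I was thinking of something along the lines of the sketch below, using
sklearn.preprocessing.PolynomialFeatures in a pipeline to add the quadratic
terms, but I am not sure whether that transformer is available in my version
or whether it is the intended tool:

>>>
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline

# expand the features with degree-2 (quadratic and interaction) terms,
# then fit logistic regression on the expanded feature set
model = make_pipeline(PolynomialFeatures(degree=2), LogisticRegression())
model.fit(data_train, target_train)
print(model.score(data_test, target_test))
<<<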