Thanks for the info.

I did not explain myself clearly. I just meant to say that once PCA is done, 
you could choose a smaller number of features, starting from the most relevant. 
To do that, I would still need to implement a custom transformer.

Thank you,
________________________________
From: Eraldo Pomponi [eraldo.pomp...@gmail.com]
Sent: Tuesday, April 28, 2015 5:14 PM
To: scikit-learn-general@lists.sourceforge.net
Subject: Re: [Scikit-learn-general] SVM for feature selection

Dear Roberto,

In case you want to better understand what Sebastian suggested, here are two
short videos on shrinkage methods from Hastie and Tibshirani's ML course:

https://www.youtube.com/watch?v=cSKzqb0EKS0

https://www.youtube.com/watch?v=A5I1G1MfUmA

They helped me a lot to understand how the L1 norm works for feature
selection (e.g., compared to L2).

HTH,
Eraldo



On Tue, Apr 28, 2015 at 10:53 PM, Sebastian Raschka <se.rasc...@gmail.com> wrote:
Yes, PCA would work too, but then you'll get feature extraction instead of 
feature selection :)
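
To illustrate the PCA variant discussed here, the following is a minimal sketch rather than code from the thread. It assumes a recent scikit-learn where GridSearchCV and StratifiedKFold live in sklearn.model_selection; the step names 'scl', 'pca' and 'clf' and the choice of a linear SVC as the final classifier are assumptions made for the example. The grid searches over the number of principal components, and each retained component is a linear combination of all original features, which is why this counts as feature extraction and not feature selection.

```python
from sklearn import datasets
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = datasets.load_iris(return_X_y=True)

# Scale, project onto the leading principal components, then classify.
pipeline = Pipeline([("scl", StandardScaler()),
                     ("pca", PCA()),
                     ("clf", SVC(kernel="linear", random_state=1))])

# The number of components is tuned like any other pipeline parameter.
param_grid = {"pca__n_components": [1, 2, 3, 4]}

grid_search = GridSearchCV(pipeline,
                           param_grid=param_grid,
                           cv=StratifiedKFold(n_splits=10),
                           scoring="accuracy")
grid_search.fit(X, y)

best_k = grid_search.best_params_["pca__n_components"]
# Each row of components_ mixes all four original features.
components = grid_search.best_estimator_.named_steps["pca"].components_
print(best_k, components.shape)
```

PCA orders its components by explained variance, so setting n_components=k keeps the k leading components.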


On Apr 28, 2015, at 4:45 PM, Pagliari, Roberto <rpagli...@appcomsci.com> wrote:

Hi Sebastian,
thanks for the hint. I think another way of doing it could be using PCA in the 
pipeline, and setting the number of components in 'parameters'?

Thanks,

________________________________
From: Sebastian Raschka [se.rasc...@gmail.com]
Sent: Tuesday, April 28, 2015 3:20 PM
To: scikit-learn-general@lists.sourceforge.net
Subject: Re: [Scikit-learn-general] SVM for feature selection

With L1 regularization, you can't "control" the exact number of features
that will be selected; it depends on the data (which features are irrelevant)
and on the regularization strength. What it basically does is zero out
coefficients.
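
As a rough illustration of that dependence on regularization strength, here is a small sketch that is not part of the original message. It assumes a recent scikit-learn; the helper name count_selected, the C values, and the 1e-5 cutoff for treating a coefficient as zero are all choices made for this example. It fits an L1-penalized LinearSVC on the standardized iris data for several values of C and counts how many features keep a nonzero weight for at least one class.

```python
import numpy as np
from sklearn import datasets
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

X, y = datasets.load_iris(return_X_y=True)
X_std = StandardScaler().fit_transform(X)


def count_selected(C):
    """Count features with a nonzero coefficient for at least one class."""
    # penalty="l1" requires dual=False in LinearSVC.
    clf = LinearSVC(penalty="l1", dual=False, C=C,
                    random_state=1, max_iter=10000)
    clf.fit(X_std, y)
    # coef_ has shape (n_classes, n_features) for this multiclass problem.
    kept = np.any(np.abs(clf.coef_) > 1e-5, axis=0)
    return int(kept.sum())


# Smaller C means stronger regularization, so more coefficients are zeroed.
counts = {C: count_selected(C) for C in (0.001, 0.01, 0.1, 1.0, 10.0)}
print(counts)
```

The counts are an outcome of the fit, not a setting: changing C moves the number of surviving features, but there is no parameter here that requests a specific number.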

If you want to experiment with the number of features, you could, e.g., do
something like

from sklearn.svm import SVC
from sklearn.pipeline import Pipeline
# sklearn.grid_search and sklearn.cross_validation were merged into
# sklearn.model_selection (scikit-learn 0.18 and later).
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import RFE
from sklearn import datasets

iris = datasets.load_iris()
X = iris.data
y = iris.target

# RFE ranks features by the estimator's coef_, so the SVC needs a linear kernel.
svc = SVC(kernel='linear', random_state=1)

# RFE is the last step: it refits the SVC on the selected features and
# exposes predict/score, so the pipeline can be scored directly.
pipeline = Pipeline([('scl', StandardScaler()),
                     ('sel', RFE(estimator=svc, step=1))])

# Search over the number of features to keep and the SVC penalty C.
param_grid = [{'sel__n_features_to_select': [1, 2, 3, 4],
               'sel__estimator__C': [0.1, 1.0, 10.0, 100.0],
               'sel__estimator__kernel': ['linear']}]

grid_search = GridSearchCV(pipeline,
                           param_grid=param_grid,
                           verbose=1,
                           cv=StratifiedKFold(n_splits=10),
                           scoring='accuracy',
                           n_jobs=1)
grid_search.fit(X, y)
print(grid_search.best_estimator_)
print(grid_search.best_score_)

On Apr 28, 2015, at 12:55 PM, Pagliari, Roberto <rpagli...@appcomsci.com> wrote:

From the documentation:

"Feature selection is usually used as a pre-processing step before doing the 
actual learning. The recommended way to do this in scikit-learn is to use a 
sklearn.pipeline.Pipeline<http://scikit-learn.org/stable/modules/generated/sklearn.pipeline.Pipeline.html#sklearn.pipeline.Pipeline>:

clf = Pipeline([
  ('feature_selection', LinearSVC(penalty="l1")),
  ('classification', RandomForestClassifier())
])
clf.fit(X, y)


In this snippet we make use of a sklearn.svm.LinearSVC
<http://scikit-learn.org/stable/modules/generated/sklearn.svm.LinearSVC.html#sklearn.svm.LinearSVC>
to evaluate feature importances and select the most relevant features."

How many features get selected? Is that configurable?
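
One possible way to make the number configurable is sketched below. This is not the snippet from the quoted documentation: it assumes a recent scikit-learn, where the estimator is wrapped in sklearn.feature_selection.SelectFromModel, whose max_features parameter caps the number of selected features. The cap of 2, the C value, and the random_state settings are arbitrary choices for the example.

```python
import numpy as np
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

X, y = datasets.load_iris(return_X_y=True)

# Wrap the L1-penalized LinearSVC so it acts as a selection step.
selector = SelectFromModel(
    LinearSVC(penalty="l1", dual=False, C=1.0,
              random_state=1, max_iter=10000),
    threshold=-np.inf,  # disable the threshold so only max_features applies
    max_features=2)     # keep the 2 features with the largest weights

clf = Pipeline([("feature_selection", selector),
                ("classification", RandomForestClassifier(random_state=1))])
clf.fit(X, y)

# Boolean mask over the original features; True marks a selected feature.
mask = clf.named_steps["feature_selection"].get_support()
print(mask)
```

With the default threshold instead of -np.inf, the number of features kept would again depend on how many coefficients the L1 penalty drives to zero, with max_features acting only as an upper bound.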
------------------------------------------------------------------------------
One dashboard for servers and applications across Physical-Virtual-Cloud
Widest out-of-the-box monitoring support with 50+ applications
Performance metrics, stats and reports that give you Actionable Insights
Deep dive visibility with transaction tracing using APM Insight.
http://ad.doubleclick.net/ddm/clk/290420510;117567292;y
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
