Hi Sebastian,
Correct. However, if you set the number of components, you should get feature
selection as well.
Thank you,
________________________________
From: Sebastian Raschka [se.rasc...@gmail.com]
Sent: Tuesday, April 28, 2015 4:53 PM
To: scikit-learn-general@lists.sourceforge.net
Subject: Re: [Scikit-learn-general] SVM for feature selection
Yes, PCA would work too, but then you'll get feature extraction instead of
feature selection :)
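Here is a minimal sketch of the difference. It assumes a recent scikit-learn (0.20 or later) and the iris data used elsewhere in this thread. PCA's components_ holds one weight per original feature, so every projected column mixes all of the inputs. A selector such as RFE (the approach suggested further down in the thread) instead returns a boolean mask and passes the chosen original columns through unchanged. Keeping 2 components / 2 features is an arbitrary choice.

```python
import numpy as np
from sklearn import datasets
from sklearn.decomposition import PCA
from sklearn.feature_selection import RFE
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

iris = datasets.load_iris()
X = StandardScaler().fit_transform(iris.data)
y = iris.target

# Feature extraction: each of the 2 components is a weighted
# combination of all 4 original features.
pca = PCA(n_components=2).fit(X)
print(pca.components_.shape)  # one row per component, one column per feature
X_pca = pca.transform(X)

# Feature selection: keep 2 of the 4 original columns as they are.
selector = RFE(estimator=SVC(kernel='linear'), n_features_to_select=2).fit(X, y)
mask = selector.get_support()
X_sel = selector.transform(X)
print(mask)
# The selected columns are exactly the original (scaled) columns.
print(np.allclose(X_sel, X[:, mask]))
```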
On Apr 28, 2015, at 4:45 PM, Pagliari, Roberto <rpagli...@appcomsci.com> wrote:
Hi Sebastian,
Thanks for the hint. I think another way of doing it could be to use PCA in the
pipeline and set the number of components in 'parameters'?
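Here is roughly what that could look like, as a sketch assuming a recent scikit-learn (0.20 or later). The number of components goes into the parameter grid under the pipeline step's name, using the same '<step>__<parameter>' convention as the RFE grid further down. The step names 'pca' and 'clf', the final linear SVC, and the C values are illustrative choices that do not come from the original messages.

```python
from sklearn import datasets
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

iris = datasets.load_iris()
X, y = iris.data, iris.target

# Scale, project onto the leading principal components, then classify.
pipeline = Pipeline([('scl', StandardScaler()),
                     ('pca', PCA()),
                     ('clf', SVC(kernel='linear'))])

# '<step name>__<parameter>' addresses a parameter of one pipeline step.
param_grid = {'pca__n_components': [1, 2, 3, 4],
              'clf__C': [0.1, 1.0, 10.0]}

grid_search = GridSearchCV(pipeline, param_grid=param_grid,
                           cv=StratifiedKFold(n_splits=5),
                           scoring='accuracy')
grid_search.fit(X, y)
print(grid_search.best_params_)
print(grid_search.best_score_)
```

Keep in mind that the tuned quantity here is the number of PCA components, not a subset of the original features.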
Thanks,
________________________________
From: Sebastian Raschka [se.rasc...@gmail.com]
Sent: Tuesday, April 28, 2015 3:20 PM
To: scikit-learn-general@lists.sourceforge.net
Subject: Re: [Scikit-learn-general] SVM for feature selection
With L1 regularization, you can't "control" the exact number of features that
will be selected; it depends on the data (which features are irrelevant) and on
the regularization strength. What it basically does is zero out
coefficients.
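To see the "it depends on the regularization strength" part concretely, here is a small sketch (assuming scikit-learn 0.20 or later). It fits an L1-penalized LinearSVC on the standardized iris data for a few arbitrary C values and counts how many features keep a non-zero coefficient for at least one class. Smaller C means stronger regularization and typically fewer surviving features. The exact counts depend on the data, so none are claimed here.

```python
import numpy as np
from sklearn import datasets
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

iris = datasets.load_iris()
X = StandardScaler().fit_transform(iris.data)
y = iris.target

n_used = {}
for C in [0.01, 0.1, 1.0]:
    # LinearSVC only supports the L1 penalty with dual=False.
    svm = LinearSVC(penalty='l1', dual=False, C=C, max_iter=10000)
    svm.fit(X, y)
    # coef_ has one row per class (one-vs-rest) and one column per feature;
    # a feature is unused if its whole column was zeroed out.
    used = np.any(svm.coef_ != 0, axis=0)
    n_used[C] = int(used.sum())
    print(f'C={C}: {n_used[C]} of {X.shape[1]} features used')
```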
If you want to experiment with the number of features, you could, e.g., do
something like:
from sklearn.svm import SVC
from sklearn.pipeline import Pipeline
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import RFE
from sklearn import datasets

iris = datasets.load_iris()
X = iris.data
y = iris.target

# RFE ranks features by the wrapped estimator's coef_, which SVC only
# provides for a linear kernel; the kernel is fixed to 'linear' in the grid.
svc = SVC(random_state=1)
pipeline = Pipeline([('scl', StandardScaler()),
                     ('sel', RFE(estimator=svc, step=1))])

# Nested parameters: 'sel__estimator__C' is the C of the SVC inside RFE.
param_grid = [{'sel__n_features_to_select': [1, 2, 3, 4],
               'sel__estimator__C': [0.1, 1.0, 10.0, 100.0],
               'sel__estimator__kernel': ['linear']}]

grid_search = GridSearchCV(pipeline,
                           param_grid=param_grid,
                           verbose=1,
                           cv=StratifiedKFold(n_splits=10),
                           scoring='accuracy',
                           n_jobs=1)
grid_search.fit(X, y)
print(grid_search.best_estimator_)
print(grid_search.best_score_)
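RFE can be the last step of the pipeline because it delegates predict to the wrapped estimator. This also means you can read back which features it kept. The sketch below uses fixed, illustrative hyperparameters instead of the grid search, and assumes scikit-learn 0.20 or later. After grid_search.fit, the same attributes should be available on grid_search.best_estimator_.named_steps['sel'], since best_estimator_ is refit on the full data by default.

```python
from sklearn import datasets
from sklearn.feature_selection import RFE
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

iris = datasets.load_iris()
X, y = iris.data, iris.target

# Same structure as the grid-search pipeline, but with fixed (untuned) values.
pipeline = Pipeline([('scl', StandardScaler()),
                     ('sel', RFE(estimator=SVC(kernel='linear', C=1.0),
                                 n_features_to_select=2, step=1))])
pipeline.fit(X, y)

rfe = pipeline.named_steps['sel']
# support_ is a boolean mask over the original features; ranking_ is 1 for
# kept features and larger for features eliminated earlier.
for name, kept, rank in zip(iris.feature_names, rfe.support_, rfe.ranking_):
    print(f'{name}: kept={kept}, rank={rank}')
```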
On Apr 28, 2015, at 12:55 PM, Pagliari, Roberto <rpagli...@appcomsci.com> wrote:
From the documentation:
"Feature selection is usually used as a pre-processing step before doing the
actual learning. The recommended way to do this in scikit-learn is to use a
sklearn.pipeline.Pipeline
(http://scikit-learn.org/stable/modules/generated/sklearn.pipeline.Pipeline.html#sklearn.pipeline.Pipeline):
clf = Pipeline([
('feature_selection', LinearSVC(penalty="l1")),
('classification', RandomForestClassifier())
])
clf.fit(X, y)
In this snippet we make use of a sklearn.svm.LinearSVC
(http://scikit-learn.org/stable/modules/generated/sklearn.svm.LinearSVC.html#sklearn.svm.LinearSVC)
to evaluate feature importances and select the most relevant features."
How many features get selected? Is that configurable?
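The documentation snippet quoted above dates from 2015. In recent scikit-learn versions (assumed 0.20 or later here), a LinearSVC is no longer used directly as a pipeline transformer; the selection step is wrapped in SelectFromModel instead. This sketch shows both answers to the question. By default the count is data-driven: for an L1-penalized estimator, SelectFromModel uses a threshold of 1e-5, so roughly the features with non-zero coefficients survive. Setting max_features together with threshold=-np.inf fixes the count explicitly. C=1.0 and max_features=2 are illustrative values.

```python
import numpy as np
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

iris = datasets.load_iris()
X, y = iris.data, iris.target

# The L1 penalty requires dual=False; C=1.0 is an illustrative value.
l1_svc = LinearSVC(penalty='l1', dual=False, C=1.0, max_iter=10000)

# 1) Data-driven count: with an L1-penalized estimator the default
#    threshold is 1e-5, so features whose coefficients were zeroed out drop.
clf = Pipeline([
    ('feature_selection', SelectFromModel(l1_svc)),
    ('classification', RandomForestClassifier(random_state=0)),
])
clf.fit(X, y)
kept = clf.named_steps['feature_selection'].get_support()
print('kept by threshold:', kept)

# 2) Fixed count: threshold=-np.inf disables the threshold, so exactly
#    max_features features (largest coefficient norms) are kept.
capped = SelectFromModel(l1_svc, threshold=-np.inf, max_features=2).fit(X, y)
print('kept with max_features=2:', capped.get_support())
```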
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general