Yes, PCA would work too, but then you'll get feature extraction instead of 
feature selection :)
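
For completeness, here's a minimal sketch of how that PCA variant could look (a sketch only; the step names and the n_components / C values are just illustrative, reusing the same 2015-era imports as the RFE example further down):

from sklearn.svm import SVC
from sklearn.pipeline import Pipeline
from sklearn.grid_search import GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn import datasets
from sklearn.cross_validation import StratifiedKFold

iris = datasets.load_iris()
X = iris.data
y = iris.target

# PCA extracts new components (linear combinations of the original
# features) rather than selecting a subset of the original features
pipeline = Pipeline([('scl', StandardScaler()),
                     ('pca', PCA()),
                     ('clf', SVC(kernel='linear', random_state=1))])

# tune the number of retained components together with the SVM's C
param_grid = [{'pca__n_components': [1, 2, 3, 4],
               'clf__C': [0.1, 1.0, 10.0, 100.0]}]

grid_search = GridSearchCV(pipeline,
                           param_grid=param_grid,
                           cv=StratifiedKFold(y, n_folds=10),
                           scoring='accuracy',
                           n_jobs=1)
grid_search.fit(X, y)
print(grid_search.best_params_)
print(grid_search.best_score_)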


> On Apr 28, 2015, at 4:45 PM, Pagliari, Roberto <rpagli...@appcomsci.com> 
> wrote:
> 
> Hi Sebastian,
> Thanks for the hint. I think another way of doing it could be to use PCA in 
> the pipeline and set the number of components via the parameter grid?
> 
> Thanks, 
> 
> From: Sebastian Raschka [se.rasc...@gmail.com]
> Sent: Tuesday, April 28, 2015 3:20 PM
> To: scikit-learn-general@lists.sourceforge.net
> Subject: Re: [Scikit-learn-general] SVM for feature selection
> 
> With L1 regularization you can't "control" the exact number of features that 
> will be selected; it depends on the data (which features are irrelevant) and 
> on the regularization strength. What it basically does is zero out the 
> coefficients of uninformative features.
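> 
> As a rough illustration (just a sketch; the C values are arbitrary), you can 
> count how many features keep a non-zero coefficient for different C with an 
> L1-penalized linear SVM on iris:
> 
> import numpy as np
> from sklearn.svm import LinearSVC
> from sklearn.preprocessing import StandardScaler
> from sklearn import datasets
> 
> iris = datasets.load_iris()
> X = StandardScaler().fit_transform(iris.data)
> y = iris.target
> 
> for C in [0.01, 0.1, 1.0, 10.0]:
>     svm = LinearSVC(penalty='l1', dual=False, C=C, random_state=1)
>     svm.fit(X, y)
>     # a feature counts as "selected" if any class keeps a non-zero coefficient
>     print(C, np.sum(np.any(svm.coef_ != 0, axis=0)))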
> 
> If you want to experiment with the number of selected features, you could, 
> e.g., do something like
> 
> from sklearn.svm import SVC
> from sklearn.pipeline import Pipeline
> from sklearn.grid_search import GridSearchCV
> from sklearn.preprocessing import StandardScaler
> from sklearn.feature_selection import RFE
> from sklearn import datasets
> from sklearn.cross_validation import StratifiedKFold
> 
> iris = datasets.load_iris()
> X = iris.data
> y = iris.target
> 
> svc = SVC(random_state=1)
> 
> pipeline = Pipeline([('scl', StandardScaler()),
>                      ('sel', RFE(estimator=svc, step=1))])
> 
> param_grid = [{'sel__n_features_to_select': [1, 2, 3, 4], 
>                'sel__estimator__C': [0.1, 1.0, 10.0, 100.0], 
>                'sel__estimator__kernel': ['linear']}]
> 
> grid_search = GridSearchCV(pipeline, 
>                            param_grid=param_grid, 
>                            verbose=1, 
>                            cv=StratifiedKFold(y, n_folds=10), 
>                            scoring='accuracy', 
>                            n_jobs=1)
> grid_search.fit(X, y)
> print(grid_search.best_estimator_)
> print(grid_search.best_score_)
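> 
> If you want to see which features the best pipeline ends up keeping, the 
> fitted RFE step exposes a boolean support mask (using the 'sel' step name 
> from above):
> 
> print(grid_search.best_estimator_.named_steps['sel'].support_)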
> 
>> On Apr 28, 2015, at 12:55 PM, Pagliari, Roberto <rpagli...@appcomsci.com> wrote:
>> 
>> From the documentation:
>> 
>> "Feature selection is usually used as a pre-processing step before doing the 
>> actual learning. The recommended way to do this in scikit-learn is to use a 
>> sklearn.pipeline.Pipeline 
>> <http://scikit-learn.org/stable/modules/generated/sklearn.pipeline.Pipeline.html#sklearn.pipeline.Pipeline>:
>> clf = Pipeline([
>>   ('feature_selection', LinearSVC(penalty="l1")),
>>   ('classification', RandomForestClassifier())
>> ])
>> clf.fit(X, y)
>> In this snippet we make use of a sklearn.svm.LinearSVC 
>> <http://scikit-learn.org/stable/modules/generated/sklearn.svm.LinearSVC.html#sklearn.svm.LinearSVC>
>>  to evaluate feature importances and select the most relevant features."
>> 
>> How many features get selected? Is that configurable?