Hi folks,
When it comes to performing feature selection, I often suggest ElasticNet, 
which combines L1 and L2 penalties. When using penalty-based feature 
selection, one must make sure the features are standardized; otherwise the 
selection can end up being misleading.
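
For instance, a minimal sketch of the idea (I'm using SGDClassifier's 
elasticnet penalty here as one way to do it; the alpha and l1_ratio values 
are arbitrary):

import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import SGDClassifier
from sklearn.preprocessing import StandardScaler

iris = load_iris()
# standardize first so the penalty treats all features on the same scale
X = StandardScaler().fit_transform(iris.data)
y = iris.target

# hinge loss = a linear SVM; l1_ratio mixes the L1 and L2 penalty terms
clf = SGDClassifier(penalty='elasticnet', l1_ratio=0.5, alpha=0.01,
                    random_state=1)
clf.fit(X, y)

# features whose coefficients are zero for every class were deselected
print(np.any(clf.coef_ != 0, axis=0))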


Cheers,
Ardo 

> On 28 Apr 2015, at 23:05, Pagliari, Roberto <rpagli...@appcomsci.com> wrote:
> 
> Hi Sebastian,
> Correct. However, if you set the number of components, you should get feature 
> selection as well.
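> 
> For example, a quick sketch of what I had in mind (tuning n_components 
> through the pipeline's parameter grid; the step names are just for 
> illustration):
> 
> from sklearn.decomposition import PCA
> from sklearn.grid_search import GridSearchCV
> from sklearn.pipeline import Pipeline
> from sklearn.svm import SVC
> 
> pipeline = Pipeline([('pca', PCA()), ('clf', SVC(kernel='linear'))])
> param_grid = {'pca__n_components': [1, 2, 3, 4]}
> grid = GridSearchCV(pipeline, param_grid=param_grid, scoring='accuracy')
> # grid.fit(X, y) then picks the best number of retained components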
> 
> Thank you, 
> 
> From: Sebastian Raschka [se.rasc...@gmail.com]
> Sent: Tuesday, April 28, 2015 4:53 PM
> To: scikit-learn-general@lists.sourceforge.net
> Subject: Re: [Scikit-learn-general] SVM for feature selection
> 
> Yes, PCA would work too, but then you'll get feature extraction instead of 
> feature selection :)
> 
> 
>> On Apr 28, 2015, at 4:45 PM, Pagliari, Roberto <rpagli...@appcomsci.com> 
>> wrote:
>> 
>> Hi Sebastian,
>> Thanks for the hint. I think another way of doing it could be to use PCA in 
>> the pipeline and set the number of components in 'parameters'?
>> 
>> Thanks, 
>> 
>> From: Sebastian Raschka [se.rasc...@gmail.com]
>> Sent: Tuesday, April 28, 2015 3:20 PM
>> To: scikit-learn-general@lists.sourceforge.net
>> Subject: Re: [Scikit-learn-general] SVM for feature selection
>> 
>> With L1 regularization, you can't "control" the exact number of features 
>> that will be selected; it depends on the data (which features are 
>> irrelevant) and on the regularization strength. What it basically does is 
>> zero out coefficients.
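>> 
>> For example, a small sketch of how the number of surviving features moves 
>> with the regularization strength C (the C values here are arbitrary):
>> 
>> import numpy as np
>> from sklearn.datasets import load_iris
>> from sklearn.preprocessing import StandardScaler
>> from sklearn.svm import LinearSVC
>> 
>> iris = load_iris()
>> X = StandardScaler().fit_transform(iris.data)
>> 
>> for C in [0.01, 0.1, 1.0]:
>>     svc = LinearSVC(C=C, penalty='l1', dual=False).fit(X, iris.target)
>>     # a feature survives if any class keeps a nonzero coefficient for it
>>     print(C, np.sum(np.any(svc.coef_ != 0, axis=0)))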
>> 
>> If you want to experiment with the number of features, you could, e.g., do 
>> something like
>> 
>> from sklearn.svm import SVC
>> from sklearn.pipeline import Pipeline
>> from sklearn.grid_search import GridSearchCV
>> from sklearn.preprocessing import StandardScaler
>> from sklearn.feature_selection import RFE
>> from sklearn import datasets
>> from sklearn.cross_validation import StratifiedKFold
>> 
>> iris = datasets.load_iris()
>> X = iris.data
>> y = iris.target
>> 
>> svc = SVC(random_state=1)
>> 
>> # scale first, then let RFE prune features using the linear SVM's weights
>> pipeline = Pipeline([('scl', StandardScaler()),
>>                      ('sel', RFE(estimator=svc, step=1))])
>> 
>> # the number of selected features becomes just another hyperparameter;
>> # the kernel must be linear so that RFE can read the coef_ weights
>> param_grid = [{'sel__n_features_to_select': [1, 2, 3, 4], 
>>                'sel__estimator__C': [0.1, 1.0, 10.0, 100.0], 
>>                'sel__estimator__kernel': ['linear']}]
>> 
>> grid_search = GridSearchCV(pipeline, 
>>                            param_grid=param_grid, 
>>                            verbose=1, 
>>                            cv=StratifiedKFold(y, n_folds=10), 
>>                            scoring='accuracy', 
>>                            n_jobs=1)
>> grid_search.fit(X, y)
>> print(grid_search.best_estimator_)
>> print(grid_search.best_score_)
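>> 
>> If you then want to see which features the winning model kept, something 
>> like this (reading the fitted RFE step's attributes):
>> 
>> best_rfe = grid_search.best_estimator_.named_steps['sel']
>> print(best_rfe.support_)   # boolean mask over the original features
>> print(best_rfe.ranking_)   # rank 1 = selected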
>> 
>>> On Apr 28, 2015, at 12:55 PM, Pagliari, Roberto <rpagli...@appcomsci.com> 
>>> wrote:
>>> 
>>> From the documentation:
>>> 
>>> "Feature selection is usually used as a pre-processing step before doing 
>>> the actual learning. The recommended way to do this in scikit-learn is to 
>>> use a sklearn.pipeline.Pipeline:
>>> clf = Pipeline([
>>>   ('feature_selection', LinearSVC(penalty="l1", dual=False)),
>>>   ('classification', RandomForestClassifier())
>>> ])
>>> clf.fit(X, y)
>>> In this snippet we make use of a sklearn.svm.LinearSVC to evaluate feature 
>>> importances and select the most relevant features."
>>> 
>>> How many features get selected? Is that configurable?