Hi folks,
When it comes to performing feature selection, I often suggest using ElasticNet,
which combines an L1 and an L2 penalty. When using penalty-based feature
selection, one must make sure that the features are standardized; otherwise
the selection can end up being misleading.
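
For example, a minimal sketch (the alpha and l1_ratio values below are just
placeholders to tune, e.g. with ElasticNetCV, and I use iris purely for
illustration, treating the labels as a numeric target):

import numpy as np
from sklearn import datasets
from sklearn.linear_model import ElasticNet
from sklearn.preprocessing import StandardScaler

iris = datasets.load_iris()
X = iris.data
y = iris.target

# Standardize first: the penalty shrinks all coefficients by the same rule,
# so features on larger scales would otherwise be penalized unevenly.
X_std = StandardScaler().fit_transform(X)

enet = ElasticNet(alpha=0.1, l1_ratio=0.5)  # placeholder hyperparameters
enet.fit(X_std, y)

# The L1 part drives some coefficients exactly to zero;
# keep the features whose coefficients survive.
selected = np.nonzero(enet.coef_)[0]
print(selected)
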
Cheers,
Ardo
> On 28 Apr 2015, at 23:05, Pagliari, Roberto <rpagli...@appcomsci.com> wrote:
>
> Hi Sebastian,
> Correct. However, if you set the number of components, you should get feature
> selection as well.
>
> Thank you,
>
> From: Sebastian Raschka [se.rasc...@gmail.com]
> Sent: Tuesday, April 28, 2015 4:53 PM
> To: scikit-learn-general@lists.sourceforge.net
> Subject: Re: [Scikit-learn-general] SVM for feature selection
>
> Yes, PCA would work too, but then you'll get feature extraction instead of
> feature selection :) The principal components are linear combinations of all
> the original features, not a subset of them.
>
>
>> On Apr 28, 2015, at 4:45 PM, Pagliari, Roberto <rpagli...@appcomsci.com>
>> wrote:
>>
>> Hi Sebastian,
>> Thanks for the hint. I think another way of doing it could be to use PCA in
>> the pipeline and set the number of components in 'parameters'?
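>>
>> Something along these lines, I suppose (the component counts below are just
>> placeholders):
>>
>> from sklearn.decomposition import PCA
>> from sklearn.pipeline import Pipeline
>> from sklearn.preprocessing import StandardScaler
>> from sklearn.svm import SVC
>>
>> pipeline = Pipeline([('scl', StandardScaler()),
>>                      ('pca', PCA()),
>>                      ('clf', SVC(kernel='linear'))])
>>
>> # The number of retained components becomes an ordinary tunable parameter.
>> parameters = [{'pca__n_components': [1, 2, 3, 4]}]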
>>
>> Thanks,
>>
>> From: Sebastian Raschka [se.rasc...@gmail.com]
>> Sent: Tuesday, April 28, 2015 3:20 PM
>> To: scikit-learn-general@lists.sourceforge.net
>> Subject: Re: [Scikit-learn-general] SVM for feature selection
>>
>> With L1 regularization, you can't "control" the exact number of features
>> that will be selected; it depends on the data (which features are
>> irrelevant) and on the regularization strength. What it basically does is
>> zero out coefficients.
>>
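>> For example, roughly (the C values are placeholders; note that LinearSVC
>> needs dual=False when penalty='l1'):
>>
>> import numpy as np
>> from sklearn import datasets
>> from sklearn.preprocessing import StandardScaler
>> from sklearn.svm import LinearSVC
>>
>> iris = datasets.load_iris()
>> X = StandardScaler().fit_transform(iris.data)
>> y = iris.target
>>
>> # Smaller C means stronger regularization and more zeroed coefficients;
>> # the exact count falls out of the data rather than being set directly.
>> for C in [0.01, 0.1, 1.0]:
>>     svm = LinearSVC(C=C, penalty='l1', dual=False).fit(X, y)
>>     print(C, np.sum(np.abs(svm.coef_) > 1e-9, axis=1))
>>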
>> If you want to experiment with the number of features, you could, e.g., do
>> something like:
>>
>> from sklearn.svm import SVC
>> from sklearn.pipeline import Pipeline
>> from sklearn.grid_search import GridSearchCV
>> from sklearn.preprocessing import StandardScaler
>> from sklearn.feature_selection import RFE
>> from sklearn import datasets
>> from sklearn.cross_validation import StratifiedKFold
>>
>> iris = datasets.load_iris()
>> X = iris.data
>> y = iris.target
>>
>> svc = SVC(random_state=1)
>>
>> # Scale the features, then recursively eliminate them one at a time (RFE).
>> pipeline = Pipeline([('scl', StandardScaler()),
>>                      ('sel', RFE(estimator=svc, step=1))])
>>
>> # Grid-search the number of features to keep along with the SVM parameters.
>> param_grid = [{'sel__n_features_to_select': [1, 2, 3, 4],
>>                'sel__estimator__C': [0.1, 1.0, 10.0, 100.0],
>>                'sel__estimator__kernel': ['linear']}]
>>
>> grid_search = GridSearchCV(pipeline,
>>                            param_grid=param_grid,
>>                            verbose=1,
>>                            cv=StratifiedKFold(y, n_folds=10),
>>                            scoring='accuracy',
>>                            n_jobs=1)
>> grid_search.fit(X, y)
>> print(grid_search.best_estimator_)
>> print(grid_search.best_score_)
>>
>>> On Apr 28, 2015, at 12:55 PM, Pagliari, Roberto <rpagli...@appcomsci.com>
>>> wrote:
>>>
>>> From the documentation:
>>>
>>> "Feature selection is usually used as a pre-processing step before doing
>>> the actual learning. The recommended way to do this in scikit-learn is to
>>> use a sklearn.pipeline.Pipeline:
>>> clf = Pipeline([
>>>   ('feature_selection', LinearSVC(penalty="l1")),
>>>   ('classification', RandomForestClassifier())
>>> ])
>>> clf.fit(X, y)
>>> In this snippet we make use of a sklearn.svm.LinearSVC to evaluate feature
>>> importances and select the most relevant features."
>>>
>>> How many features get selected? Is that configurable?
>>
>