Yes, PCA would work too, but then you'll get feature extraction instead of
feature selection :)
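
For what it's worth, here is a rough sketch of the PCA variant, assuming a recent scikit-learn where GridSearchCV and StratifiedKFold live in sklearn.model_selection. The step names ('scl', 'pca', 'clf') and the grid values are just placeholders I picked for illustration:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Scale, project onto principal components, then classify
pipeline = Pipeline([('scl', StandardScaler()),
                     ('pca', PCA()),
                     ('clf', SVC(kernel='linear'))])

# The number of components is tuned like any other pipeline parameter
param_grid = {'pca__n_components': [1, 2, 3, 4],
              'clf__C': [0.1, 1.0, 10.0]}

grid_search = GridSearchCV(pipeline,
                           param_grid=param_grid,
                           cv=StratifiedKFold(n_splits=10),
                           scoring='accuracy')
grid_search.fit(X, y)
print(grid_search.best_params_)

# Each row of components_ is a weighted mix of all four original features,
# which is why this is extraction rather than selection
pca = grid_search.best_estimator_.named_steps['pca']
print(pca.components_.shape)
```

Every component still depends on all of the input columns, so you reduce the dimensionality but you don't get to drop any of the original measurements.
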
> On Apr 28, 2015, at 4:45 PM, Pagliari, Roberto <rpagli...@appcomsci.com>
> wrote:
>
> Hi Sebastian,
> thanks for the hint. I think another way of doing it could be using PCA in
> the pipeline, and setting the number of components in 'parameters'?
>
> Thanks,
>
> From: Sebastian Raschka [se.rasc...@gmail.com]
> Sent: Tuesday, April 28, 2015 3:20 PM
> To: scikit-learn-general@lists.sourceforge.net
> Subject: Re: [Scikit-learn-general] SVM for feature selection
>
> With L1 regularization, you can't directly control the exact number of
> features that will be selected; it depends on the data (which features are
> irrelevant) and on the regularization strength. What it basically does is
> zero out coefficients.
>
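To make the dependence on the regularization strength concrete, here is a small sketch that counts the features with at least one nonzero weight for a few values of C. The C values are arbitrary choices of mine, and the exact counts will depend on your data and scikit-learn version:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

X, y = load_iris(return_X_y=True)
X_std = StandardScaler().fit_transform(X)

n_kept = {}
for C in [0.001, 0.01, 0.1, 1.0]:
    # penalty='l1' is only supported with dual=False in LinearSVC
    clf = LinearSVC(penalty='l1', dual=False, C=C,
                    max_iter=10000, random_state=1)
    clf.fit(X_std, y)
    # A feature counts as kept if any class assigns it a nonzero weight
    n_kept[C] = int(np.sum(np.any(clf.coef_ != 0, axis=0)))
    print(C, n_kept[C])
```

A smaller C means a stronger penalty, so more coefficients tend to be pushed to exactly zero. You tune C, and the number of surviving features follows from it.
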
> If you want to experiment with the number of features, you could, e.g., do
> something like
>
> from sklearn import datasets
> from sklearn.feature_selection import RFE
> from sklearn.model_selection import GridSearchCV, StratifiedKFold
> from sklearn.pipeline import Pipeline
> from sklearn.preprocessing import StandardScaler
> from sklearn.svm import SVC
>
> iris = datasets.load_iris()
> X = iris.data
> y = iris.target
>
> svc = SVC(random_state=1)
>
> # Standardize the features, then eliminate them recursively one at a time
> pipeline = Pipeline([('scl', StandardScaler()),
>                      ('sel', RFE(estimator=svc, step=1))])
>
> # RFE ranks features by coef_, so the kernel is fixed to 'linear'
> param_grid = [{'sel__n_features_to_select': [1, 2, 3, 4],
>                'sel__estimator__C': [0.1, 1.0, 10.0, 100.0],
>                'sel__estimator__kernel': ['linear']}]
>
> grid_search = GridSearchCV(pipeline,
>                            param_grid=param_grid,
>                            verbose=1,
>                            cv=StratifiedKFold(n_splits=10),
>                            scoring='accuracy',
>                            n_jobs=1)
> grid_search.fit(X, y)
> print(grid_search.best_estimator_)
> print(grid_search.best_score_)
>
>> On Apr 28, 2015, at 12:55 PM, Pagliari, Roberto <rpagli...@appcomsci.com>
>> wrote:
>>
>> From the documentation:
>>
>> "Feature selection is usually used as a pre-processing step before doing the
>> actual learning. The recommended way to do this in scikit-learn is to use a
>> sklearn.pipeline.Pipeline
>> <http://scikit-learn.org/stable/modules/generated/sklearn.pipeline.Pipeline.html#sklearn.pipeline.Pipeline>:
>>
>>     clf = Pipeline([
>>         ('feature_selection', LinearSVC(penalty="l1")),
>>         ('classification', RandomForestClassifier())
>>     ])
>>     clf.fit(X, y)
>>
>> In this snippet we make use of a sklearn.svm.LinearSVC
>> <http://scikit-learn.org/stable/modules/generated/sklearn.svm.LinearSVC.html#sklearn.svm.LinearSVC>
>> to evaluate feature importances and select the most relevant features."
>>
>> How many features get selected? Is that configurable?
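On the "is that configurable?" part: in newer scikit-learn releases the selecting estimator is wrapped in SelectFromModel, which exposes threshold and max_features. A minimal sketch under that assumption (C=0.1 and max_features=2 are values I made up for illustration):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

X, y = load_iris(return_X_y=True)

selector = SelectFromModel(
    LinearSVC(penalty='l1', dual=False, C=0.1,
              max_iter=10000, random_state=1),
    threshold=-np.inf,  # turn off the importance cutoff ...
    max_features=2)     # ... and keep only the 2 highest-ranked features

clf = Pipeline([('feature_selection', selector),
                ('classification', RandomForestClassifier(random_state=1))])
clf.fit(X, y)

# Boolean mask over the original columns: True means the feature was kept
mask = clf.named_steps['feature_selection'].get_support()
print(mask)
```

If you leave threshold and max_features at their defaults, the count is again decided by which coefficients the L1 penalty shrinks to (near) zero, so it is controlled only indirectly through C.
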
>>
>> Scikit-learn-general mailing list
>> Scikit-learn-general@lists.sourceforge.net
>> <mailto:Scikit-learn-general@lists.sourceforge.net>
>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>> <https://lists.sourceforge.net/lists/listinfo/scikit-learn-general>