Dear Roberto,
Just in case you want to better understand what Sebastian suggested, let me
point you to two short videos on shrinkage methods, taken from the ML course
by Hastie and Tibshirani:
https://www.youtube.com/watch?v=cSKzqb0EKS0
https://www.youtube.com/watch?v=A5I1G1MfUmA
They helped me a lot in understanding how the L1 norm works in feature
selection (e.g., compared to the L2 norm).
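In code, the difference looks roughly like this (a tiny sketch; the alpha
value is arbitrary): the L1 penalty (Lasso) zeroes coefficients out, while
the L2 penalty (Ridge) only shrinks them.

import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

# 100 samples, 20 features, only 5 of which are informative
X, y = make_regression(n_samples=100, n_features=20,
                       n_informative=5, random_state=0)

lasso = Lasso(alpha=1.0).fit(X, y)  # L1 penalty
ridge = Ridge(alpha=1.0).fit(X, y)  # L2 penalty

# L1 drives most of the irrelevant coefficients exactly to zero;
# L2 keeps them small but non-zero
print('non-zero coefs (L1):', np.sum(lasso.coef_ != 0))
print('non-zero coefs (L2):', np.sum(ridge.coef_ != 0))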
HTH,
Eraldo
On Tue, Apr 28, 2015 at 10:53 PM, Sebastian Raschka <se.rasc...@gmail.com>
wrote:
> Yes, PCA would work too, but then you'll get feature extraction instead of
> feature selection :)
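>
> Something along these lines would do it (just a quick sketch; the logistic
> regression classifier and the n_components values are arbitrary
> placeholders):
>
> from sklearn.decomposition import PCA
> from sklearn.grid_search import GridSearchCV
> from sklearn.linear_model import LogisticRegression
> from sklearn.pipeline import Pipeline
> from sklearn import datasets
>
> iris = datasets.load_iris()
>
> pipe = Pipeline([('pca', PCA()),
>                  ('clf', LogisticRegression())])
>
> # the number of extracted components is tuned like any other parameter
> param_grid = {'pca__n_components': [1, 2, 3, 4]}
>
> grid = GridSearchCV(pipe, param_grid=param_grid,
>                     cv=10, scoring='accuracy')
> grid.fit(iris.data, iris.target)
> print(grid.best_params_)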
>
>
> On Apr 28, 2015, at 4:45 PM, Pagliari, Roberto <rpagli...@appcomsci.com>
> wrote:
>
> Hi Sebastian,
> thanks for the hint. I think another way of doing it could be to use PCA in
> the pipeline and set the number of components in 'parameters'?
>
> Thanks,
>
> ------------------------------
> *From:* Sebastian Raschka [se.rasc...@gmail.com]
> *Sent:* Tuesday, April 28, 2015 3:20 PM
> *To:* scikit-learn-general@lists.sourceforge.net
> *Subject:* Re: [Scikit-learn-general] SVM for feature selection
>
> With L1 regularization, you can't "control" the exact number of features
> that will be selected; it depends on the data (i.e., which features are
> irrelevant) and on the regularization strength. What it basically does is
> zero out coefficients.
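>
> To see this in action, here's a minimal sketch (the C values are arbitrary)
> that counts the surviving coefficients for a few regularization strengths:
>
> import numpy as np
> from sklearn.svm import LinearSVC
> from sklearn import datasets
>
> iris = datasets.load_iris()
>
> # smaller C = stronger regularization = more coefficients zeroed out
> for C in [0.01, 0.1, 1.0]:
>     svm = LinearSVC(C=C, penalty='l1', dual=False)
>     svm.fit(iris.data, iris.target)
>     print(C, np.sum(svm.coef_ != 0))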
>
> If you want to experiment with the number of features, you could, e.g., do
> something like the following with recursive feature elimination (RFE):
>
> from sklearn.svm import SVC
> from sklearn.pipeline import Pipeline
> from sklearn.grid_search import GridSearchCV
> from sklearn.preprocessing import StandardScaler
> from sklearn.feature_selection import RFE
> from sklearn import datasets
> from sklearn.cross_validation import StratifiedKFold
>
> iris = datasets.load_iris()
> X = iris.data
> y = iris.target
>
> # RFE needs feature weights (coef_) to rank features, which is why the
> # grid below restricts the SVC to a linear kernel
> svc = SVC(random_state=1)
>
> # standardize the features, then recursively eliminate one per step
> pipeline = Pipeline([('scl', StandardScaler()),
>                      ('sel', RFE(estimator=svc, step=1))])
>
> # treat the number of selected features as a hyperparameter to tune
> param_grid = [{'sel__n_features_to_select': [1, 2, 3, 4],
>                'sel__estimator__C': [0.1, 1.0, 10.0, 100.0],
>                'sel__estimator__kernel': ['linear']}]
>
> grid_search = GridSearchCV(pipeline,
>                            param_grid=param_grid,
>                            verbose=1,
>                            cv=StratifiedKFold(y, n_folds=10),
>                            scoring='accuracy',
>                            n_jobs=1)
> grid_search.fit(X, y)
> print(grid_search.best_estimator_)
> print(grid_search.best_score_)
>
> On Apr 28, 2015, at 12:55 PM, Pagliari, Roberto <rpagli...@appcomsci.com>
> wrote:
>
> From the documentation:
>
> "Feature selection is usually used as a pre-processing step before doing
> the actual learning. The recommended way to do this in scikit-learn is to
> use a sklearn.pipeline.Pipeline
> <http://scikit-learn.org/stable/modules/generated/sklearn.pipeline.Pipeline.html#sklearn.pipeline.Pipeline>:
>
> clf = Pipeline([
>   ('feature_selection', LinearSVC(penalty="l1")),
>   ('classification', RandomForestClassifier())])
> clf.fit(X, y)
>
> In this snippet we make use of a sklearn.svm.LinearSVC
> <http://scikit-learn.org/stable/modules/generated/sklearn.svm.LinearSVC.html#sklearn.svm.LinearSVC>
> to evaluate feature importances and select the most relevant features."
>
> How many features get selected? Is that configurable?