Thanks. Completely forgot to follow up on this. The tip by Michael worked perfectly:
> Forget this comment, it actually works, because RFE itself also does the '__'
> thing. You need to use 'sel__estimator__C' instead of 'sel__SVC__C'

Just ran a quick test on a simple toy dataset (Wine from UCI). Here, I grid-searched the SVM parameters and 1-4 features via SelectKBest vs. RFE. The two approaches yielded different feature subsets (though both selected 4 features):

SelectKBest    -> avg. accuracy: 0.94
RFE workaround -> avg. accuracy: 0.96

Maybe it would be worthwhile to add an example to the docs showing how to do recursive feature elimination inside GridSearchCV? I should mention that Wine is probably not the best dataset for this, since all features are somewhat informative; the best feature subset tends to be k = d, so a grid search over the number of features will probably always pick the largest value.

Btw., the code would be (imports added so it runs as-is):

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import RFE, SelectKBest
from sklearn.svm import SVC
from sklearn.grid_search import GridSearchCV
from sklearn.cross_validation import StratifiedKFold

### RFE approach
pipeline = Pipeline([('scl', StandardScaler()),
                     ('sel', RFE(estimator=SVC(kernel='linear', random_state=1),
                                 step=1))])

param_grid = [{'sel__n_features_to_select': [1, 2, 3, 4],
               'sel__estimator__C': [0.1, 1.0, 10.0, 100.0],
               'sel__estimator__kernel': ['linear']}]

grid_search = GridSearchCV(pipeline,
                           param_grid=param_grid,
                           verbose=1,
                           cv=StratifiedKFold(y, n_folds=10),
                           scoring='accuracy',
                           n_jobs=1)

grid_search.fit(X, y)
print(grid_search.best_estimator_)
print(grid_search.best_score_)

### SelectKBest approach
pipeline = Pipeline([('scl', StandardScaler()),
                     ('sel', SelectKBest()),
                     ('clf', SVC(kernel='linear', random_state=1))])

param_grid = [{'sel__k': [1, 2, 3, 4],
               'clf__C': [0.1, 1.0, 10.0, 100.0],
               'clf__kernel': ['linear']}]

grid_search = GridSearchCV(pipeline,
                           param_grid=param_grid,
                           verbose=1,
                           cv=StratifiedKFold(y, n_folds=10),
                           scoring='accuracy',
                           n_jobs=1)

grid_search.fit(X, y)
print(grid_search.best_estimator_)
print(grid_search.best_score_)

Best,
Sebastian

> On Feb 17, 2015, at 6:54 PM, Andy <t3k...@gmail.com> wrote:
>
> On 02/13/2015 01:04 AM, Sebastian Raschka wrote:
>> Both in terms of speed and performance, I think it depends on what
>> SelectKBest is doing :).
>
> It uses a simple anova test, which is independent for each feature. It
> does not build any kind of model at all, so it is very cheap.

_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
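P.S. A quick way to double-check the '__' naming trick Michael mentioned: every tunable parameter of a Pipeline is addressable through get_params(), so you can just inspect its keys to see that RFE nests its wrapped model under the parameter name 'estimator' (not under the class name SVC). A minimal sketch, using the same pipeline layout as above but no dataset:

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import RFE
from sklearn.svm import SVC

pipeline = Pipeline([('scl', StandardScaler()),
                     ('sel', RFE(estimator=SVC(kernel='linear'), step=1))])

# Each '__' descends one level: step name -> parameter (-> sub-parameter).
# RFE exposes its wrapped model via its 'estimator' parameter, so the
# SVM's C is reachable as 'sel__estimator__C'.
keys = pipeline.get_params().keys()
print('sel__estimator__C' in keys)          # True
print('sel__n_features_to_select' in keys)  # True
print('sel__SVC__C' in keys)                # False -- class names are not used
```

Anything that passes this check is a valid key in a GridSearchCV param_grid.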