> It has its own scoring function.

Is this documented somewhere? I only found "Select features according to the k highest scores." (at http://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.SelectKBest.html#sklearn.feature_selection.SelectKBest), which could maybe be extended a little bit.
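For context, here is a minimal sketch of how I understand SelectKBest is used, passing the univariate score function explicitly (I believe f_classif is the default for classification); the scores are computed per feature, independently of any downstream classifier:

    from sklearn.datasets import load_iris
    from sklearn.feature_selection import SelectKBest, f_classif

    iris = load_iris()
    X, y = iris.data, iris.target

    # each feature gets its own univariate (ANOVA F) score w.r.t. y;
    # the k highest-scoring features are kept
    selector = SelectKBest(score_func=f_classif, k=2)
    X_selected = selector.fit_transform(X, y)
    print(selector.scores_)    # per-feature scores
    print(X_selected.shape)    # (150, 2)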
>> Alternatively, I would use recursive feature selection like so:
>
> You could. It would be much slower, and I am not convinced it would work
> better.

In terms of both speed and performance, I think it depends on what SelectKBest is doing :). I think a greedy backward selection would have the advantage that the "best" features are selected with respect to the classifier's performance. This could make a significant difference, e.g., for non-linear data, depending on how SelectKBest works. I was conceptually thinking of something like this:

    pipeline = Pipeline([
        ('scl', StandardScaler()),
        ('sel', RFE(estimator=SVC(kernel='linear', random_state=1), step=1))])

    param_grid = [{'sel__n_features_to_select': [1, 2, 3, 4],
                   'sel__SVC__C': [0.1, 1, 10, 100],
                   'sel__SVC__kernel': ['linear']}]

    grid_search = GridSearchCV(pipeline,
                               param_grid=param_grid,
                               verbose=1,
                               cv=StratifiedKFold(y, n_folds=10),
                               scoring='accuracy',
                               n_jobs=1)

    grid_search.fit(X, y)
    print(grid_search.best_estimator_)
    print(grid_search.best_score_)

Btw., is there a way to provide an argument to such a nested estimator in a pipeline like that (e.g., here the SVC) in general? (I put a rough sketch of what I assume the syntax would be at the end of this mail.)

Best,
Sebastian

> On Feb 13, 2015, at 3:34 AM, Gael Varoquaux <gael.varoqu...@normalesup.org> wrote:
>
> On Fri, Feb 13, 2015 at 03:31:54AM -0500, Sebastian Raschka wrote:
>> I am wondering how SelectKBest determines what the "best" set of
>> features is since it happens before they are fed to the classifier.
>> Does it have its own "scoring" function or does it use the classifier
>> from the last fit?
>
> It has its own scoring function.
>
>> Alternatively, I would use recursive feature selection like so:
>
> You could. It would be much slower, and I am not convinced it would work
> better. You could try both, and tell us your experience :).
>
> Cheers,
>
> Gaƫl
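P.S. Here is a rough sketch of what I assume the nested-parameter syntax would look like, based on the double-underscore convention: since RFE's constructor argument is called 'estimator', I would guess the wrapped SVC's C is reachable as 'sel__estimator__C' (please correct me if that's wrong):

    from sklearn.datasets import load_iris
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.feature_selection import RFE
    from sklearn.svm import SVC
    from sklearn.grid_search import GridSearchCV  # sklearn.model_selection in later releases

    iris = load_iris()
    X, y = iris.data, iris.target

    pipeline = Pipeline([
        ('scl', StandardScaler()),
        ('sel', RFE(estimator=SVC(kernel='linear', random_state=1), step=1))])

    # double-underscore chain: pipeline step 'sel' -> RFE's 'estimator' -> SVC's 'C'
    param_grid = [{'sel__n_features_to_select': [1, 2, 3, 4],
                   'sel__estimator__C': [0.1, 1, 10, 100]}]

    grid_search = GridSearchCV(pipeline, param_grid=param_grid, cv=10,
                               scoring='accuracy', n_jobs=1)
    grid_search.fit(X, y)
    print(grid_search.best_params_)
    print(grid_search.best_score_)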