> It has its own scoring function.

Is this documented somewhere? I only found "Select features according to the k 
highest scores." (at 
http://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.SelectKBest.html#sklearn.feature_selection.SelectKBest)
 which could maybe be extended a little bit.

>> Alternative, I would use recursive feature selection like so:
> 
> You could. It would be much slower, and I am not convinced it would work
> better.

In terms of both speed and performance, I think it depends on what SelectKBest 
is doing :). I think a greedy backward selection would have the advantage that 
the "best" features are selected with respect to the classifier's performance. 
This could make a significant difference, e.g., for non-linear data, depending 
on how SelectKBest works.
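For reference (in case it helps the comparison), here is roughly what I mean by SelectKBest being classifier-independent -- as far as I can tell from the docs, it defaults to f_classif (an ANOVA F-test), so each feature gets a univariate score before any classifier is involved. Just a sketch on the iris data:

```python
# Sketch: SelectKBest scores each feature univariately (default score_func
# is f_classif, an ANOVA F-test) -- no classifier is consulted.
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

iris = load_iris()
X, y = iris.data, iris.target

selector = SelectKBest(score_func=f_classif, k=2)
X_new = selector.fit_transform(X, y)

print(selector.scores_)  # one univariate F-score per original feature
print(X_new.shape)       # (150, 2): the 2 highest-scoring features kept
```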

I was conceptually thinking of something like this:


from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import RFE
from sklearn.svm import SVC
from sklearn.grid_search import GridSearchCV
from sklearn.cross_validation import StratifiedKFold

pipeline = Pipeline([('scl', StandardScaler()),
                     ('sel', RFE(estimator=SVC(kernel='linear',
                                               random_state=1), step=1))])

param_grid = [{'sel__n_features_to_select': [1, 2, 3, 4],
               'sel__estimator__C': [0.1, 1, 10, 100],
               'sel__estimator__kernel': ['linear']}]

grid_search = GridSearchCV(pipeline,
                           param_grid=param_grid,
                           verbose=1,
                           cv=StratifiedKFold(y, n_folds=10),
                           scoring='accuracy',
                           n_jobs=1)

grid_search.fit(X, y)
print(grid_search.best_estimator_)
print(grid_search.best_score_)

Btw., is there a general way to pass an argument to an estimator nested inside 
a pipeline like that (e.g., here, the SVC)?
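In case it's useful, this is how I currently understand the parameter naming -- each nesting level seems to add a double underscore, and get_params() lists all valid names. Please correct me if this sketch is wrong:

```python
# Sketch: a parameter of a pipeline step is addressed as <step>__<param>;
# for an estimator wrapped inside a step (as with RFE's `estimator`),
# another level is added: <step>__estimator__<param>.
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import RFE
from sklearn.svm import SVC

pipe = Pipeline([('scl', StandardScaler()),
                 ('sel', RFE(estimator=SVC(kernel='linear'), step=1))])

# get_params() exposes every settable (nested) parameter name:
assert 'sel__estimator__C' in pipe.get_params()

# Setting a nested parameter of the wrapped SVC:
pipe.set_params(sel__estimator__C=10)
print(pipe.get_params()['sel__estimator__C'])  # 10
```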


Best,
Sebastian


> On Feb 13, 2015, at 3:34 AM, Gael Varoquaux <gael.varoqu...@normalesup.org> 
> wrote:
> 
> On Fri, Feb 13, 2015 at 03:31:54AM -0500, Sebastian Raschka wrote:
>> I am wondering how SelectKBest determines what the "best" set of
>> features is since it happens before they are fed to the classifier.
>> Does it have it's own "scoring" function or does it use the classifier
>> from the last fit?
> 
> It has its own scoring function.
> 
>> Alternative, I would use recursive feature selection like so:
> 
> You could. It would be much slower, and I am not convinced it would work
> better. You could try both, and tell us your experience :).
> 
> Cheers,
> 
> Gaƫl
> 
> ------------------------------------------------------------------------------
> Dive into the World of Parallel Programming. The Go Parallel Website,
> sponsored by Intel and developed in partnership with Slashdot Media, is your
> hub for all things parallel software development, from weekly thought
> leadership blogs to news, videos, case studies, tutorials and more. Take a
> look and join the conversation now. http://goparallel.sourceforge.net/
> _______________________________________________
> Scikit-learn-general mailing list
> Scikit-learn-general@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general


