That is like one-step look-ahead feature selection? I guess that could be done, but it has a much higher complexity than RFE. RFE works for anything that returns "importances", not just linear models. It doesn't really work for KNN, as you say. [I wouldn't say non-parametric models; trees are pretty non-parametric.]
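For example, something along these lines should work (just a sketch: the random forest and toy data are my own choice, and it assumes an RFE version that falls back to feature_importances_ when the estimator has no coef_):

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.feature_selection import RFE

    X, y = make_classification(n_samples=200, n_features=10, n_informative=3,
                               random_state=0)

    # RFE only needs an estimator that exposes coef_ or feature_importances_
    # after fitting, so a random forest works here just as well as a linear model.
    rfe = RFE(estimator=RandomForestClassifier(n_estimators=50, random_state=0),
              n_features_to_select=3)
    rfe.fit(X, y)
    print(rfe.support_)   # boolean mask of the selected features
    print(rfe.ranking_)   # rank 1 = selected; larger ranks were eliminated earlier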
It seems interesting. Is that really used in practice, and is there any literature evaluating it? There is some discussion in Section 4.2 of http://www.jmlr.org/papers/volume3/guyon03a/guyon03a.pdf, but no empirical comparison or theoretical analysis. To be added to sklearn, you'd need to show that it is widely used and/or widely useful.

On 04/27/2015 02:47 PM, Sebastian Raschka wrote:
> Hi, I was wondering if sequential feature selection algorithms are currently
> implemented in scikit-learn. The closest that I could find was recursive
> feature elimination (RFE):
> http://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.RFE.html.
> However, unless the application requires a fixed number of features, I am
> not sure it is necessarily worthwhile using it over regularized models.
> If I understand correctly, it works like this:
>
> {x1, x2, x3} --> eliminate xi with smallest corresponding weight
>
> {x1, x3} --> eliminate xi with smallest corresponding weight
>
> {x1}
>
> However, this would only work with linear, discriminative models, right?
>
> Wouldn't a classic "sequential feature selection" algorithm be useful for
> non-regularized, nonparametric models, e.g., K-nearest neighbors, as an
> alternative to dimensionality reduction for applications where the original
> features may need to be maintained? RFE, for example, wouldn't work with
> KNN, and maybe the data is non-linearly separable, so that RFE with a linear
> model doesn't make sense.
>
> In a nutshell, SFS algorithms simply add or remove one feature at a time
> based on the classifier performance.
>
> E.g., sequential backward selection:
>
> {x1, x2, x3} ---> estimate performance on {x1, x2}, {x2, x3} and {x1, x3},
> and pick the subset with the best performance
> {x1, x3} ---> estimate performance on {x1}, {x3} and pick the subset with the
> best performance
> {x1}
>
> where performance could be, e.g., cross-validation accuracy.
>
> What do you think?
>
> Best,
> Sebastian
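For reference, a minimal sketch of the sequential backward selection described in the quoted mail, scoring each candidate subset by cross-validated accuracy. The function name and parameters are my own illustration, not an existing scikit-learn API, and the cross_val_score import path is the one from the scikit-learn release current at the time (it lives in sklearn.model_selection in later versions):

    import numpy as np
    from sklearn.cross_validation import cross_val_score
    from sklearn.neighbors import KNeighborsClassifier

    def sequential_backward_selection(X, y, estimator, k_features, cv=5):
        # Greedily drop one feature at a time until only k_features remain,
        # keeping at each step the subset with the best cross-val score.
        remaining = list(range(X.shape[1]))
        while len(remaining) > k_features:
            best_score, worst_feature = -np.inf, None
            for f in remaining:
                candidate = [i for i in remaining if i != f]
                score = cross_val_score(estimator, X[:, candidate], y, cv=cv).mean()
                if score > best_score:
                    best_score, worst_feature = score, f
            # remove the feature whose removal hurt performance the least
            remaining.remove(worst_feature)
        return remaining

    # e.g., keep two features for a KNN classifier on the iris data:
    # from sklearn.datasets import load_iris
    # iris = load_iris()
    # keep = sequential_backward_selection(iris.data, iris.target,
    #                                      KNeighborsClassifier(), k_features=2)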