Maybe we would want mRMR first? http://penglab.janelia.org/proj/mRMR/
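(For illustration, the core mRMR idea -- greedily picking the feature that maximizes relevance to the target minus average redundancy with the already-selected features -- can be sketched with scikit-learn's mutual-information estimators. The `mrmr` function below is a hypothetical minimal version of the MID variant, not the linked package:)

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.feature_selection import mutual_info_classif, mutual_info_regression

def mrmr(X, y, k, random_state=0):
    """Greedy mRMR: maximize MI(feature; target) minus the mean
    MI(feature; already-selected features)."""
    n_features = X.shape[1]
    # relevance of each feature to the class label
    relevance = mutual_info_classif(X, y, random_state=random_state)
    selected = [int(np.argmax(relevance))]
    while len(selected) < k:
        best_f, best_score = None, -np.inf
        for f in range(n_features):
            if f in selected:
                continue
            # redundancy: mean MI between the candidate and selected features
            redundancy = np.mean([
                mutual_info_regression(X[:, [s]], X[:, f],
                                       random_state=random_state)[0]
                for s in selected
            ])
            score = relevance[f] - redundancy
            if score > best_score:
                best_f, best_score = f, score
        selected.append(best_f)
    return selected

X, y = load_wine(return_X_y=True)
selected = mrmr(X, y, k=5)
print(selected)
```

Unlike wrapper methods such as SFS/SBS, this is a filter: it never fits a classifier, so it is much cheaper but ignores interactions with the final model.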
On 04/27/2015 06:46 PM, Sebastian Raschka wrote:
>> I guess that could be done, but has a much higher complexity than RFE.
>
> Oh yes, I agree, the sequential feature algorithms are definitely
> computationally more costly.
>
>> It seems interesting. Is that really used in practice and is there any
>> literature evaluating it?
>
> I am not sure how often it is used in practice nowadays, but I think it is
> one of the classic approaches to feature selection -- I learned about it a
> couple of years ago in a pattern classification class, and there is a
> relatively detailed article in
>
> Ferri, F., et al. "Comparative study of techniques for large-scale feature
> selection." Pattern Recognition in Practice IV (1994): 403-413.
>
> The optimal solution to feature selection would be to evaluate the
> performance of all possible feature combinations, which is a little bit too
> costly in practice. The sequential forward and backward selection (SFS and
> SBS) algorithms are just a suboptimal solution, and there are some minor
> improvements, e.g., Sequential Floating Forward Selection (SFFS), which
> allows for the removal of previously added features in later stages.
>
> I have an implementation of SBS that uses k-fold cross_val_score, and it is
> actually not a bad idea to use it for KNN to reduce overfitting as an
> alternative to dimensionality reduction; for example, KNN cross-val mean
> accuracy on the wine dataset where the features are selected by SBS:
> http://i.imgur.com/ywDTHom.png?1
>
> But for scikit-learn, it may be better to implement SFBS or SFFS, which are
> slightly more sophisticated.
>
>> On Apr 27, 2015, at 6:00 PM, Andreas Mueller <t3k...@gmail.com> wrote:
>>
>> That is like a one-step look-ahead feature selection?
>> I guess that could be done, but has a much higher complexity than RFE.
>> RFE works for anything that returns "importances", not just linear models.
>> It doesn't really work for KNN, as you say. [I wouldn't say
>> non-parametric models.
>> Trees are pretty non-parametric].
>>
>> It seems interesting. Is that really used in practice and is there any
>> literature evaluating it?
>> There is some discussion here
>> http://www.jmlr.org/papers/volume3/guyon03a/guyon03a.pdf in 4.2,
>> but there is no empirical comparison or theoretical analysis.
>>
>> To be added to sklearn, you'd need to show that it is widely used and/or
>> widely useful.
>>
>> On 04/27/2015 02:47 PM, Sebastian Raschka wrote:
>>> Hi, I was wondering if sequential feature selection algorithms are
>>> currently implemented in scikit-learn. The closest that I could find was
>>> recursive feature elimination (RFE):
>>> http://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.RFE.html.
>>> However, unless the application requires a fixed number of features, I am
>>> not sure it is necessarily worthwhile to use it over regularized models.
>>> If I understand correctly, it works like this:
>>>
>>> {x1, x2, x3} --> eliminate xi with the smallest corresponding weight
>>> {x1, x3} --> eliminate xi with the smallest corresponding weight
>>> {x1}
>>>
>>> However, this would only work with linear, discriminative models, right?
>>>
>>> Wouldn't a classic "sequential feature selection" algorithm be useful for
>>> non-regularized, nonparametric models, e.g., k-nearest neighbors, as an
>>> alternative to dimensionality reduction for applications where the
>>> original features may need to be maintained? RFE, for example, wouldn't
>>> work with KNN, and maybe the data is non-linearly separable, so that RFE
>>> with a linear model doesn't make sense.
>>>
>>> In a nutshell, SFS algorithms simply add or remove one feature at a time
>>> based on the classifier performance.
>>>
>>> e.g., Sequential backward selection:
>>>
>>> {x1, x2, x3} ---> estimate performance on {x1, x2}, {x2, x3}, and
>>> {x1, x3}, and pick the subset with the best performance
>>> {x1, x3} ---> estimate performance on {x1} and {x3}, and pick the subset
>>> with the best performance
>>> {x1}
>>>
>>> where performance could be, e.g., cross-val accuracy.
>>>
>>> What do you think?
>>>
>>> Best,
>>> Sebastian
>>>
>>> ------------------------------------------------------------------------------
>>> One dashboard for servers and applications across Physical-Virtual-Cloud
>>> Widest out-of-the-box monitoring support with 50+ applications
>>> Performance metrics, stats and reports that give you Actionable Insights
>>> Deep dive visibility with transaction tracing using APM Insight.
>>> http://ad.doubleclick.net/ddm/clk/290420510;117567292;y
>>> _______________________________________________
>>> Scikit-learn-general mailing list
>>> Scikit-learn-general@lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
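(The sequential backward selection procedure described in the thread can be sketched in a few lines of scikit-learn code. This is a minimal illustration, not the implementation Sebastian mentions; the `sequential_backward_selection` function, its parameters, and the choice of KNN on the wine dataset are just for demonstration:)

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

def sequential_backward_selection(estimator, X, y, k_features, cv=5):
    """Greedily drop one feature at a time: at each step, remove the
    feature whose removal gives the best cross-validated accuracy."""
    features = list(range(X.shape[1]))
    while len(features) > k_features:
        scores = []
        for f in features:
            subset = [g for g in features if g != f]
            score = cross_val_score(estimator, X[:, subset], y, cv=cv).mean()
            scores.append((score, f))
        # keep the subset that scored best, i.e. remove that feature
        _, to_remove = max(scores)
        features.remove(to_remove)
    return features

X, y = load_wine(return_X_y=True)
X = StandardScaler().fit_transform(X)  # KNN is distance-based, so scale first
knn = KNeighborsClassifier(n_neighbors=5)
selected = sequential_backward_selection(knn, X, y, k_features=5)
print(selected)
```

Note the cost mentioned in the thread: each of the O(d) elimination steps evaluates O(d) candidate subsets with full cross-validation, whereas RFE fits the estimator only once per step.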