I should note, however, that the "informative" features already have covariance among themselves, so distinguishing them from the redundant features is likely hard. One difference is that the covariance of the informative features is per-class, whereas the redundant features vary identically across classes as a function of the informative features (disregarding the added noise in flip_y).
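Concretely, something like the following sketch should recover each feature group by column slicing when shuffle=False (untested; the specific sizes are arbitrary):

from sklearn.datasets import make_classification

n_informative, n_redundant, n_repeated = 5, 3, 2

# With shuffle=False, columns are stacked in the documented order:
# informative, redundant, repeated, then noise.
X, y = make_classification(n_samples=200, n_features=15,
                           n_informative=n_informative,
                           n_redundant=n_redundant,
                           n_repeated=n_repeated,
                           shuffle=False, random_state=0)

informative = X[:, :n_informative]
redundant = X[:, n_informative:n_informative + n_redundant]
repeated = X[:, n_informative + n_redundant:
              n_informative + n_redundant + n_repeated]
noise = X[:, n_informative + n_redundant + n_repeated:]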
On 28 May 2015 at 19:57, Joel Nothman <joel.noth...@gmail.com> wrote:

> As at
> http://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_classification.html
>
> Prior to shuffling, `X` stacks a number of these primary "informative"
> features, "redundant" linear combinations of these, "repeated" duplicates
> of sampled features, and arbitrary noise for any remaining features.
>
> If you set shuffle=False, then you can extract the first n_informative
> columns as the primary informative features, etc.
>
> HTH
>
> On 28 May 2015 at 19:18, Daniel Homola <daniel.homol...@imperial.ac.uk> wrote:
>
>> Hi everyone,
>>
>> I'm benchmarking various feature selection methods, and for that I use
>> the make_classification helper function, which is really great. However,
>> is there a way to retrieve a list of the informative and redundant
>> features after generating the fake data? It would be really interesting
>> to see if the algorithm I'm working on is able to tell the difference
>> between informative and redundant ones.
>>
>> Cheers,
>> Daniel