I should note, however, that the "informative" features are already
generated with covariance among themselves, so distinguishing them from
the redundant features is likely hard. One difference is that the
covariance is per-class in the underlying informative features, whereas
the redundant features vary identically across classes (disregarding the
label noise added by flip_y), being fixed linear combinations of the
informative features.
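
One concrete way to see this second point: with shuffle=False, each
redundant column is an exact linear combination of the informative
columns, so a least-squares fit reconstructs it with essentially zero
residual. A minimal sketch (the parameter values are arbitrary):

    import numpy as np
    from sklearn.datasets import make_classification

    # With shuffle=False the columns come out in a known order:
    # informative first, then redundant, then repeated, then noise.
    X, y = make_classification(n_samples=500, n_features=8,
                               n_informative=4, n_redundant=2,
                               shuffle=False, random_state=0)
    informative, redundant = X[:, :4], X[:, 4:6]

    # Least-squares fit of the redundant block on the informative block;
    # the reconstruction should match (near-)exactly.
    coef, *_ = np.linalg.lstsq(informative, redundant, rcond=None)
    print(np.allclose(informative @ coef, redundant))  # expect True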

On 28 May 2015 at 19:57, Joel Nothman <joel.noth...@gmail.com> wrote:

> As at
> http://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_classification.html
>
>     Prior to shuffling, `X` stacks a number of these primary "informative"
>     features, "redundant" linear combinations of these, "repeated"
>     duplicates of sampled features, and arbitrary noise for any remaining
>     features.
>
> If you set shuffle=False, then you can extract the first n_informative
> columns as the primary informative features, etc.
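>
> For example (a minimal sketch; the parameter values are arbitrary):
>
>     from sklearn.datasets import make_classification
>
>     n_inf, n_red, n_rep = 3, 2, 1
>     X, y = make_classification(n_samples=200, n_features=10,
>                                n_informative=n_inf, n_redundant=n_red,
>                                n_repeated=n_rep, shuffle=False,
>                                random_state=0)
>     X_informative = X[:, :n_inf]                # primary features
>     X_redundant = X[:, n_inf:n_inf + n_red]     # linear combinations
>     X_repeated = X[:, n_inf + n_red:n_inf + n_red + n_rep]
>     # any remaining columns are random noise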
>
> HTH
>
> On 28 May 2015 at 19:18, Daniel Homola <daniel.homol...@imperial.ac.uk>
> wrote:
>
>> Hi everyone,
>>
>> I'm benchmarking various feature selection methods, and for that I use
>> the make_classification helper function, which is really great. However,
>> is there a way to retrieve a list of the informative and redundant
>> features after generating the fake data? It would be really interesting
>> to see whether the algorithm I'm working on can tell the difference
>> between the informative and redundant ones.
>>
>> Cheers,
>> Daniel
>>
>>