How large is your noise and what are the other arguments to the function?
Use the source, Luke: https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/datasets/samples_generator.py
The data is generated the way Joel said.
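
Roughly, the relevant bit of samples_generator.py does something like
this (a simplified paraphrase, not the exact code; in the real
implementation the linear map is drawn per cluster, and centroids,
label noise, shift and scale are applied as well):

import numpy as np

rng = np.random.RandomState(0)
n_samples, n_informative, n_redundant = 100, 5, 5

# Informative features: standard normal draws pushed through a random
# linear map, which is what gives them covariance.
X_inf = np.dot(rng.randn(n_samples, n_informative),
               2 * rng.rand(n_informative, n_informative) - 1)

# Redundant features: random linear combinations of the informative ones.
B = 2 * rng.rand(n_informative, n_redundant) - 1
X_red = np.dot(X_inf, B)

# Before shuffling, the informative columns come first.
X = np.hstack([X_inf, X_red])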



On 05/28/2015 12:13 PM, Daniel Homola wrote:
Hi Joel,

I might be wrong, but this doesn't seem to work. At least when I check the first n_informative features of X without shuffling, they don't have any higher corrcoef than the rest of the features. I know this isn't a definitive way of seeing which feature is relevant, but at least it should give an indication, shouldn't it? The features that have an extreme positive or negative corrcoef with y are scattered along the columns, as if they had already been shuffled.
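
For concreteness, the check I'm running looks roughly like this (the
argument values here are placeholders, not the ones I actually used):

import numpy as np
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=500, n_features=20,
                           n_informative=5, n_redundant=5,
                           shuffle=False, random_state=0)

# Absolute Pearson correlation of each column with y. With
# shuffle=False I'd expect the first n_informative columns to stand
# out, but they don't seem to.
corr = np.array([abs(np.corrcoef(X[:, j], y)[0, 1])
                 for j in range(X.shape[1])])
print(corr[:5].mean(), corr[5:].mean())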

What do you think?

Cheers,
d

On 28/05/15 11:00, Joel Nothman wrote:
I should note, however, that the "informative" features already have covariance, so differentiating them from the redundant features is likely hard. One difference is that the covariance is per-class in the underlying features, whereas the redundant features vary identically (disregarding the label noise added by flip_y) across classes with respect to the informative features.
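
Something like this sketch should show the contrast I mean (parameter
values are arbitrary; shift and scale are left at their defaults so
the redundant columns stay exact linear combinations):

import numpy as np
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=2000, n_features=10,
                           n_informative=4, n_redundant=4,
                           n_repeated=0, shuffle=False, random_state=0)
X_inf, X_red = X[:, :4], X[:, 4:8]

# The redundant columns are linear combinations of the informative
# ones across both classes at once, so a single least-squares fit
# leaves essentially zero residual.
_, residuals, _, _ = np.linalg.lstsq(X_inf, X_red)
print(residuals)

# Whereas the covariance of the informative features is drawn
# separately per cluster, so it differs between the classes.
print(np.cov(X_inf[y == 0], rowvar=False).round(1))
print(np.cov(X_inf[y == 1], rowvar=False).round(1))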

On 28 May 2015 at 19:57, Joel Nothman <joel.noth...@gmail.com> wrote:

    As at
    http://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_classification.html

        Prior to shuffling, `X` stacks a number of these primary
        "informative" features, "redundant" linear combinations of
        these, "repeated" duplicates of sampled features, and
        arbitrary noise for any remaining features.

    If you set shuffle=False, then you can extract the first
    n_informative columns as the primary informative features, etc.
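
    For instance (a quick sketch; the parameter values here are
    arbitrary, just for illustration):

        from sklearn.datasets import make_classification

        n_inf, n_red, n_rep = 5, 3, 2
        X, y = make_classification(n_samples=100, n_features=20,
                                   n_informative=n_inf,
                                   n_redundant=n_red,
                                   n_repeated=n_rep,
                                   shuffle=False, random_state=0)

        # Columns in order: informative, redundant, repeated, noise.
        informative = X[:, :n_inf]
        redundant = X[:, n_inf:n_inf + n_red]
        repeated = X[:, n_inf + n_red:n_inf + n_red + n_rep]
        noise = X[:, n_inf + n_red + n_rep:]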

    HTH

    On 28 May 2015 at 19:18, Daniel Homola
    <daniel.homol...@imperial.ac.uk> wrote:

        Hi everyone,

        I'm benchmarking various feature selection methods, and for that
        I use the make_classification helper function, which is really
        great. However, is there a way to retrieve a list of the
        informative and redundant features after generating the fake
        data? It would be really interesting to see if the algorithm I'm
        working on is able to tell the difference between informative
        and redundant ones.

        Cheers,
        Daniel

        