How large is your noise and what are the other arguments to the function?
Use the source, Luke:
https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/datasets/samples_generator.py
The data is generated the way Joel said.
On 05/28/2015 12:13 PM, Daniel Homola wrote:
Hi Joel,
I might be wrong, but this doesn't seem to work. At least when I
check the first n_informative features of X without shuffling, they
don't have a higher corrcoef with y than the rest of the features. I
know this isn't a definitive way of seeing which feature is relevant,
but at least it should give an indication, shouldn't it? The features
that have extreme positive or negative corrcoef with y are scattered
across the columns, as if they had already been shuffled.
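For what it's worth, my check is roughly this sketch (the sizes here
are arbitrary, and corrcoef with a binary y is only a crude indicator):

    import numpy as np
    from sklearn.datasets import make_classification

    X, y = make_classification(n_samples=1000, n_features=20,
                               n_informative=5, n_redundant=3,
                               shuffle=False, random_state=0)

    # correlation of each column with the class labels
    corrs = np.array([np.corrcoef(X[:, j], y)[0, 1]
                      for j in range(X.shape[1])])
    print(np.round(corrs, 2))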
What do you think?
Cheers,
d
On 28/05/15 11:00, Joel Nothman wrote:
I should note, however, that the "informative" features already have
covariance, so differentiating them from the redundant features is
likely hard. One difference is that the covariance is per-class in
the underlying features, whereas the redundant features will vary
identically (disregarding the label noise added by flip_y) across
classes with respect to the informative features.
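A rough way to see that second point (a sketch, assuming shuffle=False
and the default shift/scale, under which the redundant columns are
exact linear combinations of the informative ones):

    import numpy as np
    from sklearn.datasets import make_classification

    n_inf, n_red = 5, 3
    X, y = make_classification(n_samples=1000, n_features=10,
                               n_informative=n_inf, n_redundant=n_red,
                               shuffle=False, random_state=0)
    inf = X[:, :n_inf]
    red = X[:, n_inf:n_inf + n_red]

    # a least-squares fit of the redundant block on the informative block
    # should leave essentially zero residual, regardless of class
    coef, _, _, _ = np.linalg.lstsq(inf, red, rcond=None)
    print(np.abs(red - inf.dot(coef)).max())  # ~0 up to floating point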
On 28 May 2015 at 19:57, Joel Nothman <joel.noth...@gmail.com> wrote:
As at
http://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_classification.html

Prior to shuffling, `X` stacks a number of these primary "informative"
features, "redundant" linear combinations of these, "repeated"
duplicates of sampled features, and arbitrary noise for any remaining
features.

If you set shuffle=False, then you can extract the first
n_informative columns as the primary informative features, etc.
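For example (a minimal sketch; the sizes below are arbitrary):

    from sklearn.datasets import make_classification

    n_informative, n_redundant, n_repeated = 5, 3, 2
    X, y = make_classification(n_samples=1000, n_features=20,
                               n_informative=n_informative,
                               n_redundant=n_redundant,
                               n_repeated=n_repeated,
                               shuffle=False, random_state=0)

    # column order with shuffle=False: informative, redundant,
    # repeated, then noise
    informative = X[:, :n_informative]
    redundant = X[:, n_informative:n_informative + n_redundant]
    repeated = X[:, n_informative + n_redundant:
                    n_informative + n_redundant + n_repeated]
    noise = X[:, n_informative + n_redundant + n_repeated:]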
HTH
On 28 May 2015 at 19:18, Daniel Homola
<daniel.homol...@imperial.ac.uk> wrote:
Hi everyone,
I'm benchmarking various feature selection methods, and for that I
use the make_classification helper function, which is really great.
However, is there a way to retrieve a list of the informative and
redundant features after generating the fake data? It would be really
interesting to see if the algorithm I'm working on is able to tell
the difference between the informative and redundant ones.
Cheers,
Daniel