How large is your noise and what are the other arguments to the function?
Use the source, Luke:
https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/datasets/samples_generator.py
The data is generated the way Joel said.
On 05/28/2015 12:13 PM, Daniel Homola wrote:
Hi Joel,
I might be wrong, but this doesn't seem to work. At least when I
check the first n_informative features of X without shuffling, they
don't have a higher corrcoef with y than the rest of the features. I
know this isn't a definitive way of seeing which feature is relevant,
but at least it should give an indication, shouldn't it? The features
that have extreme positive or negative corrcoef with y are scattered
across the columns, as if they had already been shuffled.
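For what it's worth, my check is roughly this sketch (the sizes here
are arbitrary, and corrcoef with a binary y is only a crude indicator):

    import numpy as np
    from sklearn.datasets import make_classification

    X, y = make_classification(n_samples=1000, n_features=20,
                               n_informative=5, n_redundant=3,
                               shuffle=False, random_state=0)

    # correlation of each column with the class labels
    corrs = np.array([np.corrcoef(X[:, j], y)[0, 1]
                      for j in range(X.shape[1])])
    print(np.round(corrs, 2))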
What do you think?
Cheers,
d
On 28/05/15 11:00, Joel Nothman wrote:
I should note, however, that the "informative" features already have
covariance, so differentiating them from the redundant features is
likely hard. One difference is that the covariance is per-class in
the underlying features, whereas the redundant features will vary
identically (disregarding the label noise added by flip_y) across
classes with respect to the informative features.
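A rough way to see that second point (a sketch, assuming shuffle=False
and the default shift/scale, under which the redundant columns are
exact linear combinations of the informative ones):

    import numpy as np
    from sklearn.datasets import make_classification

    n_inf, n_red = 5, 3
    X, y = make_classification(n_samples=1000, n_features=10,
                               n_informative=n_inf, n_redundant=n_red,
                               shuffle=False, random_state=0)
    inf = X[:, :n_inf]
    red = X[:, n_inf:n_inf + n_red]

    # a least-squares fit of the redundant block on the informative block
    # should leave essentially zero residual, regardless of class
    coef, _, _, _ = np.linalg.lstsq(inf, red, rcond=None)
    print(np.abs(red - inf.dot(coef)).max())  # ~0 up to floating point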
On 28 May 2015 at 19:57, Joel Nothman <joel.noth...@gmail.com> wrote:
As at
http://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_classification.html

Prior to shuffling, `X` stacks a number of these primary "informative"
features, "redundant" linear combinations of these, "repeated"
duplicates of sampled features, and arbitrary noise for any remaining
features.

If you set shuffle=False, then you can extract the first
n_informative columns as the primary informative features, etc.
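For example (a minimal sketch; the sizes below are arbitrary):

    from sklearn.datasets import make_classification

    n_informative, n_redundant, n_repeated = 5, 3, 2
    X, y = make_classification(n_samples=1000, n_features=20,
                               n_informative=n_informative,
                               n_redundant=n_redundant,
                               n_repeated=n_repeated,
                               shuffle=False, random_state=0)

    # column order with shuffle=False: informative, redundant,
    # repeated, then noise
    informative = X[:, :n_informative]
    redundant = X[:, n_informative:n_informative + n_redundant]
    repeated = X[:, n_informative + n_redundant:
                    n_informative + n_redundant + n_repeated]
    noise = X[:, n_informative + n_redundant + n_repeated:]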
HTH
On 28 May 2015 at 19:18, Daniel Homola
<daniel.homol...@imperial.ac.uk> wrote:
Hi everyone,
I'm benchmarking various feature selection methods, and for that I
use the make_classification helper function, which is really great.
However, is there a way to retrieve a list of the informative and
redundant features after generating the fake data? It would be really
interesting to see if the algorithm I'm working on is able to tell
the difference between the informative and redundant ones.
Cheers,
Daniel