Hi,
I have a raw data matrix with the number of features much larger than
the number of examples. I then create a new data matrix, a Pearson
cross-correlation matrix, based on the rows (examples) in the raw data
matrix. Each cell (cross-correlation of a pair of examples from the raw
data) in the PCC matrix now has a corresponding label (a number, not a
class). What I'd like to do is find a subset of features from the raw
data matrix that gives the best correlation between the labels and the PCC
matrix values. Basically, I'd like to have a method that selects a subset
of columns from the raw data, then calculates PCC, and correlates that to
the labels. The method should then iterate until the maximum
cross-correlation is found. Is there a way to do so with scikit-learn?
Also, there might be a better approach than using LASSO or elastic net, but
I have heard these work well when the number of features exceeds
the number of examples.
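
To make the procedure above concrete, here is a minimal sketch using only
NumPy. It assumes the labels are stored as a symmetric examples-by-examples
array (called pair_labels here) and uses simple random hill climbing as the
iteration step. The names pcc_label_score and hill_climb_select are
hypothetical helpers, not part of scikit-learn, and the search is only a
local one, so it is not guaranteed to find the true maximum.

```python
import numpy as np

def pcc_label_score(X, mask, pair_labels):
    """Correlate the example-by-example PCC matrix, built from the columns
    selected by `mask`, with the per-pair labels (hypothetical helper)."""
    if mask.sum() < 3:
        # With fewer than 3 columns the PCC between two rows is degenerate.
        return -np.inf
    with np.errstate(invalid="ignore", divide="ignore"):
        pcc = np.corrcoef(X[:, mask])            # np.corrcoef treats rows as variables
        iu = np.triu_indices(X.shape[0], k=1)    # count each pair of examples once
        score = np.corrcoef(pcc[iu], pair_labels[iu])[0, 1]
    return score if np.isfinite(score) else -np.inf

def hill_climb_select(X, pair_labels, n_start=10, n_iter=500, seed=0):
    """Toggle one random column at a time and keep the change only if the
    correlation with the labels improves (assumed search strategy)."""
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    mask = np.zeros(n_features, dtype=bool)
    mask[rng.choice(n_features, size=n_start, replace=False)] = True
    best = pcc_label_score(X, mask, pair_labels)
    for _ in range(n_iter):
        j = rng.integers(n_features)
        mask[j] = ~mask[j]                       # add or drop column j
        score = pcc_label_score(X, mask, pair_labels)
        if score > best:
            best = score                         # keep the change
        else:
            mask[j] = ~mask[j]                   # revert the change
    return mask, best
```

Calling hill_climb_select(X, pair_labels) returns a boolean column mask and
the correlation it reaches. Because the subset is tuned directly against the
labels, the score should be checked on held-out pairs of examples.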
Best,
Will