Hi,
It seems to me that you are discussing topics that could be introduced into
sklearn through GSoC.

I use sklearn quite a lot, and there are a couple of things that I really
miss in this library:

1- NIPALS PCA.
The current PCA implementation is too slow for high-dimensional datasets.
Suppose you have p=10000 variables and are only interested in the first 10
principal components: in a situation like this NIPALS PCA is much more
efficient, because it extracts one component at a time instead of computing
a full decomposition. Other algorithms such as PLS could also improve their
computational performance by building on NIPALS.
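
To make the idea concrete, here is a minimal sketch of the NIPALS iteration
in plain NumPy (not an existing sklearn API; the function and variable names
are just mine for illustration). It assumes a centered data matrix X and
extracts components one at a time with deflation:

import numpy as np

def nipals_pca(X, n_components=10, tol=1e-8, max_iter=500):
    # X: centered data matrix of shape (n_samples, n_features)
    X = X.copy()
    scores, loadings = [], []
    for _ in range(n_components):
        # start the score vector from the column with the largest variance
        t = X[:, np.argmax(X.var(axis=0))].copy()
        for _ in range(max_iter):
            p = np.dot(X.T, t) / np.dot(t, t)   # loading vector
            p /= np.linalg.norm(p)              # normalize loadings
            t_new = np.dot(X, p)                # updated scores
            if np.linalg.norm(t_new - t) < tol * np.linalg.norm(t_new):
                t = t_new
                break
            t = t_new
        X -= np.outer(t, p)                     # deflate: remove fitted component
        scores.append(t)
        loadings.append(p)
    return np.column_stack(scores), np.column_stack(loadings)

The cost per component is only a few matrix-vector products over X, which is
why it pays off when you want 10 components out of 10000 variables.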

2- Something to rank the variables.
At the moment it seems to me that the only way to rank variables is the
Random Forest (impurity-based) importance. This measure is known to be
strongly biased, e.g. towards continuous variables and categorical variables
with many levels. I would suggest something like the conditional permutation
importance implemented in the R package party (a rough sketch of plain
permutation importance follows below).
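
As a starting point, here is a minimal sketch of plain (unconditional)
permutation importance, assuming any fitted sklearn estimator with a score
method; party's method goes further by permuting conditionally on correlated
predictors, which this sketch does not do:

import numpy as np

def permutation_importance(estimator, X, y, n_repeats=10, random_state=0):
    # Measure the drop in score after shuffling one column at a time;
    # a larger drop means the variable matters more for prediction.
    rng = np.random.RandomState(random_state)
    baseline = estimator.score(X, y)
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            perm = rng.permutation(X.shape[0])
            X_perm[:, j] = X_perm[perm, j]   # break the link between feature j and y
            drops.append(baseline - estimator.score(X_perm, y))
        importances[j] = np.mean(drops)
    return importances

For example, importances = permutation_importance(forest.fit(X, y), X, y)
already gives a less split-biased ranking than forest.feature_importances_.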


I hope that these comments can help.
I may decide to apply for GSoC as well :-)

Best,
Luca