Hi Luca,

On 6 March 2015 at 11:09, Luca Puggini <lucapug...@gmail.com> wrote:
> Hi,
> It seems to me that you are discussing topics that could be introduced in
> sklearn through GSoC.
>
> I use sklearn quite a lot and there are a couple of things that I really
> miss in this library:
>
> 1- NIPALS PCA.
> The current version of PCA is too slow for high-dimensional datasets. Suppose
> you have p=10000 variables and are interested in only the first 10 principal
> components. In a situation like this, NIPALS PCA is much more efficient.
> Other algorithms such as PLS could also improve their computational
> performance with NIPALS PCA.
>
> 2- Something to rank the variables
> At the moment it seems to me that the only way to rank the variables is the
> Random Forest importance. This method is known to be very biased. I
> suggest something like the method implemented in the R library party.

Just commenting on this: the bias depends only on how you construct the forest. If you build a forest of totally randomized trees and limit their depth (e.g., ExtraTreesClassifier(max_features=1, max_depth=5)), then you will correct for most of the biases in the resulting importances.

Gilles

> I hope that these comments can help.
> I may decide to apply for GSoC as well :-)
>
> Best,
> Luca

_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
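
For reference, a minimal sketch of the totally randomized trees suggestion in the reply above. The synthetic dataset from make_classification, the number of trees (n_estimators=250) and the random seeds are assumptions made for illustration; only max_features=1 and max_depth=5 come from the message itself.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier

# Synthetic data (an assumption for illustration): 20 features, of which
# the first 5 are informative because shuffle=False keeps them in front.
X, y = make_classification(
    n_samples=500,
    n_features=20,
    n_informative=5,
    n_redundant=0,
    shuffle=False,
    random_state=0,
)

# max_features=1 draws a single candidate feature at each split, which
# gives totally randomized trees; max_depth=5 limits how deep they grow.
forest = ExtraTreesClassifier(
    n_estimators=250,  # assumed value, not from the original message
    max_features=1,
    max_depth=5,
    random_state=0,
)
forest.fit(X, y)

# Impurity-based importances, averaged over the trees in the forest.
importances = forest.feature_importances_

# Feature indices sorted from most to least important.
ranking = np.argsort(importances)[::-1]
for idx in ranking[:5]:
    print("feature %d: importance %.4f" % (idx, importances[idx]))
```

The importances are normalized, so they are best read as a relative ranking rather than as absolute effect sizes. Whether this setting removes enough of the bias for a given dataset is something to check empirically.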