Hi Luca,

On 6 March 2015 at 11:09, Luca Puggini <lucapug...@gmail.com> wrote:
> Hi,
> It seems to me that you are discussing topics that can be introduced in
> sklearn with GSoC.
>
> I use sklearn quite a lot and there are a couple of things that I really
> miss in this library:
>
> 1- NIPALS PCA.
> The current version of PCA is too slow for high-dimensional datasets. Suppose
> you have p=10000 variables and are interested in only the first 10 principal
> components. In a situation like this NIPALS PCA is much more efficient.
> Other algorithms, such as PLS, could also improve their computational
> performance with NIPALS PCA.
>
> 2- Something to rank the variables.
> At the moment it seems to me that the only way to rank the variables is the
> Random Forest importance. This method is known to be heavily biased. I
> suggest something like the method implemented in the R library party.

Just commenting on this: the bias depends only on how you
construct the forest. If you build a forest of totally randomized
trees and limit their depth (e.g.,
ExtraTreesClassifier(max_features=1, max_depth=5)), then you will
correct for most of the biases in the resulting importances.
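
As a rough sketch of that setup: the synthetic dataset, the
n_estimators value and the rank_features helper below are assumptions
made up for illustration, not something discussed in this thread. Only
ExtraTreesClassifier(max_features=1, max_depth=5) comes from the
suggestion above.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier


def rank_features(X, y, n_estimators=200, random_state=0):
    """Rank features using shallow, totally randomized trees.

    rank_features is a hypothetical helper, not part of scikit-learn.
    """
    forest = ExtraTreesClassifier(
        n_estimators=n_estimators,  # assumed value, tune as needed
        max_features=1,             # one random candidate feature per split
        max_depth=5,                # keep the trees shallow
        random_state=random_state,
    )
    forest.fit(X, y)
    importances = forest.feature_importances_
    # Feature indices sorted from most to least important.
    order = np.argsort(importances)[::-1]
    return order, importances


# Synthetic data, assumed for illustration only. With shuffle=False the
# three informative features are columns 0, 1 and 2.
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=3, n_redundant=0,
    shuffle=False, random_state=0,
)
order, importances = rank_features(X, y)
for idx in order:
    print(f"feature {idx}: {importances[idx]:.3f}")
```

The importances are normalized to sum to one, so they are best read as
a relative ranking rather than as absolute effect sizes.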

Gilles

>
>
> I hope that these comments can help.
> I may decide to apply for GSoC as well :-)
>
> Best,
> Luca
>
>
> ------------------------------------------------------------------------------
> Dive into the World of Parallel Programming The Go Parallel Website,
> sponsored
> by Intel and developed in partnership with Slashdot Media, is your hub for
> all
> things parallel software development, from weekly thought leadership blogs
> to
> news, videos, case studies, tutorials and more. Take a look and join the
> conversation now. http://goparallel.sourceforge.net/
> _______________________________________________
> Scikit-learn-general mailing list
> Scikit-learn-general@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>

