Hello scikit-learn team,

I currently work as a developer for the ilastik project (http://ilastik.org/),
and I will be starting a PhD in bioinformatics at UCSD this fall. I would
like to participate in the Google Summer of Code this year.

I have hacked on scikit-learn for my own work in the past. Here are a
few branches I've worked on, mainly to solve a specific problem or test
an experimental feature for random forests:

   - empirical-forest <https://github.com/kemaleren/scikit-learn/tree/empirical-forest>:
   multiple response regression with random forests. I planned to clean this
   up and submit a pull request, but someone else beat me to it.
   - collapsed_rf <https://github.com/kemaleren/scikit-learn/tree/collapsed_rf>:
   post-process multiple response regression forests to sum their responses.
   This was for a project where we needed to train on vector responses but
   only predict their sum.
   - fast_rfr <https://github.com/kemaleren/scikit-learn/tree/fast_rfr>: a
   random forest optimized for a regression problem in which many leaves
   returned zero arrays.
   - sse <https://github.com/kemaleren/scikit-learn/tree/sse>:
   experimenting with a different split criterion for regression trees.
   - ultrarf <https://github.com/kemaleren/scikit-learn/tree/ultrarf>: an
   attempt to speed up training by using ultrametric distance, which can be
   precomputed in linear time and queried in constant time.

Since I will be free this summer, I would like to finally contribute back
to scikit-learn. I am open to project suggestions. For instance, since I
have worked with random forests before, it might make sense to work on
supporting sparse input (scipy.sparse matrices) in the tree-based estimators.

However, there is another project that I already started last year:
implementing stacked generalization. It would be great to be funded
to finish it this summer.

It's still pretty rough, but you can see what I have done so far in this
branch: stacking <https://github.com/kemaleren/scikit-learn/tree/stacking>.
As you can see, there is still a lot to be done, including adding other
stacking methods such as Feature-Weighted Linear Stacking, supporting
various voting schemes, etc.
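
For anyone unfamiliar with the term: stacked generalization trains a
second-level model on the cross-validated predictions of several base
models. Below is a minimal sketch of that idea using only NumPy. It is not
the code in the stacking branch; the ridge base learners, the least-squares
combiner, and all function names (fit_ridge, predict_ridge, stack_fit,
stack_predict) are assumptions chosen purely for illustration.

```python
import numpy as np


def fit_ridge(X, y, alpha):
    """Closed-form ridge regression (illustrative base learner).

    A constant column is appended for the intercept; for brevity the
    intercept is penalized along with the other weights.
    """
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    A = Xb.T @ Xb + alpha * np.eye(Xb.shape[1])
    return np.linalg.solve(A, Xb.T @ y)


def predict_ridge(w, X):
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return Xb @ w


def stack_fit(X, y, alphas, n_folds=5, seed=0):
    """Fit base models and a linear combiner on out-of-fold predictions."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)

    # Level-1 features: each column holds one base model's predictions,
    # made only on samples that model did not see during training.
    Z = np.zeros((len(y), len(alphas)))
    for held_out in folds:
        train = np.ones(len(y), dtype=bool)
        train[held_out] = False
        for j, alpha in enumerate(alphas):
            w = fit_ridge(X[train], y[train], alpha)
            Z[held_out, j] = predict_ridge(w, X[held_out])

    # Combiner: ordinary least squares on the level-1 features.
    meta, *_ = np.linalg.lstsq(Z, y, rcond=None)

    # Refit every base model on all of the data for use at predict time.
    base = [fit_ridge(X, y, alpha) for alpha in alphas]
    return base, meta


def stack_predict(base, meta, X):
    Z = np.column_stack([predict_ridge(w, X) for w in base])
    return Z @ meta
```

Using out-of-fold predictions as the combiner's inputs keeps it from simply
rewarding whichever base model overfits the training set most. Variants such
as Feature-Weighted Linear Stacking or voting would replace the
least-squares combiner step.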

This could be a very useful addition to the scikit-learn toolbox. Is there
anyone interested in mentoring this project?

Best regards,
Kemal Eren
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general