Hi everyone,

At ECML/PKDD, Lars and I were discussing the idea of using machine learning
(and scikit-learn) to find out interesting things about our contributors
(GitHub indicates that we have over 180 of them so far).

The idea would be to represent each contributor as a vector with one entry
per file in the code base, each entry being the number of times he or she
modified that file (binary values could work well too). This could be used to
automatically find out which contributors share common interests by using
clustering, bi-clustering or graphical lasso.
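
To make the representation concrete, here is a rough sketch, not a
finished example. The contributor names, file paths and counts are made up,
and KMeans on L2-normalized rows is just one stand-in for the clustering
step; bi-clustering or graphical lasso could be swapped in.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import normalize

# Hypothetical toy data: one (contributor, file) pair per modification.
modifications = [
    ("alice", "sklearn/svm/base.py"),
    ("alice", "sklearn/svm/classes.py"),
    ("alice", "sklearn/svm/base.py"),
    ("bob", "sklearn/svm/base.py"),
    ("bob", "sklearn/svm/classes.py"),
    ("carol", "sklearn/cluster/k_means_.py"),
    ("carol", "sklearn/cluster/spectral.py"),
    ("dave", "sklearn/cluster/k_means_.py"),
    ("dave", "sklearn/cluster/spectral.py"),
    ("dave", "sklearn/cluster/spectral.py"),
]

contributors = sorted({c for c, _ in modifications})
files = sorted({f for _, f in modifications})
row = {c: i for i, c in enumerate(contributors)}
col = {f: j for j, f in enumerate(files)}

# X[i, j] = number of times contributor i modified file j.
X = np.zeros((len(contributors), len(files)))
for c, f in modifications:
    X[row[c], col[f]] += 1

# Normalize rows so that very active contributors do not dominate.
X_norm = normalize(X)

# Group contributors with similar modification profiles.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X_norm)

clusters = {}
for name, label in zip(contributors, labels):
    clusters.setdefault(int(label), []).append(name)
print(clusters)
```

For binary values, `(X > 0).astype(float)` can be used in place of `X`.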

Another idea that comes to mind is to recommend files to a contributor
(files which he or she is expected to have interest or expertise in but has
never touched).
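
One possible way to do this, as a sketch only: score each untouched file by
the activity of similar contributors (cosine similarity between rows of the
contributor-by-file matrix). The matrix below is invented toy data and
`recommend` is a hypothetical helper, not an existing scikit-learn function.

```python
import numpy as np

# Hypothetical toy matrix: rows are contributors, columns are files.
contributors = ["alice", "bob", "carol"]
files = ["svm/base.py", "svm/classes.py",
         "cluster/k_means_.py", "cluster/spectral.py"]
X = np.array([
    [3, 0, 0, 0],  # alice
    [2, 4, 0, 0],  # bob
    [0, 0, 5, 1],  # carol
], dtype=float)


def recommend(X, user, top_k=1):
    """Return indices of untouched files, best first (hypothetical helper)."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid dividing by zero for empty rows
    Xn = X / norms
    # Cosine similarity between this contributor and all the others.
    sim = Xn @ Xn[user]
    sim[user] = 0.0
    # Score each file by the similarity-weighted activity of the others.
    scores = sim @ X
    # Never recommend a file the contributor has already modified.
    scores[X[user] > 0] = -np.inf
    order = np.argsort(scores)[::-1][:top_k]
    return [int(j) for j in order if scores[j] > 0]


for i, name in enumerate(contributors):
    print(name, [files[j] for j in recommend(X, i)])
```

A contributor with no overlap with anyone else (carol here) gets no
recommendation, which seems like the right behaviour for such a simple
scheme.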

I think that would make a nice example in the examples/applications/
folder. Ideally, it would generate the data on the fly every time it is
executed.
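
As for generating the data on the fly, one option (an assumption on my
side, other sources would work too) is to parse the output of
`git log --name-only` from a local clone. The `@@@` marker and the two
helper functions below are made up for illustration.

```python
import subprocess
from collections import Counter

# Arbitrary prefix used to tell author lines apart from file paths.
MARKER = "@@@"


def read_git_log(repo_path="."):
    """Return the author name and the touched files of every commit."""
    cmd = ["git", "log", "--name-only", "--pretty=format:" + MARKER + "%an"]
    result = subprocess.run(cmd, cwd=repo_path, capture_output=True,
                            text=True, check=True)
    return result.stdout


def parse_git_log(text):
    """Count how many times each author modified each file."""
    counts = Counter()
    author = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(MARKER):
            author = line[len(MARKER):]
        elif author is not None:
            counts[(author, line)] += 1
    return counts


# Made-up sample in the same format, so the parser can be tried
# without a repository at hand.
sample = """@@@Alice
sklearn/svm/base.py
sklearn/svm/classes.py

@@@Bob
sklearn/svm/base.py

@@@Alice
sklearn/svm/base.py
"""
counts = parse_git_log(sample)
print(counts)
```

On a real clone, `parse_git_log(read_git_log("path/to/clone"))` would give
the (contributor, file) counts needed to fill the matrix.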

If someone wants to play with the idea, a PR is highly welcome.

Cheers,
Mathieu
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
