Indeed, I think normalization is important, depending on what you want
to show. Feel free to play with this if you have good ideas. This is
merely a quick proof of concept.
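
To make the normalization point concrete, here is a minimal sketch in plain
Python. It is not the code from the gist: the contributor names and commit
counts are made up, and `binarize` and `nearest` are hypothetical helpers
written only for this illustration. It shows how the nearest neighbor of a
contributor can change once nonzero counts are mapped to 1.

```python
import math

# Hypothetical commit counts on two files (not real data).
# The three contributors differ only on the first file: 0, 1 and 3 commits.
commits = {
    "alice": [0, 2],
    "bob": [1, 2],
    "carol": [3, 2],
}


def binarize(vector):
    """Map every nonzero commit count to 1."""
    return [1 if count > 0 else 0 for count in vector]


def nearest(name, vectors):
    """Return the other contributor closest to `name` in Euclidean distance."""
    others = (other for other in vectors if other != name)
    return min(others, key=lambda other: math.dist(vectors[name], vectors[other]))


binary = {name: binarize(vector) for name, vector in commits.items()}

raw_neighbor = nearest("bob", commits)      # uses raw counts
binary_neighbor = nearest("bob", binary)    # uses 0/1 "touched the file" flags
print(raw_neighbor, binary_neighbor)
```

With raw counts, bob (1 commit) ends up closest to alice (0 commits); with
binary values he ends up closest to carol, who also touched the file. Which
behaviour is preferable depends on what you want to show.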

Also, I would be curious to apply our new biclustering algorithms to this
data and visualize the result.
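
As a rough sketch of what that could look like, assuming a recent
scikit-learn where `SpectralCoclustering` is importable from
`sklearn.cluster`: the commit matrix below is invented for illustration
(4 contributors by 6 files) and is not the output of the gist.

```python
import numpy as np
from sklearn.cluster import SpectralCoclustering

# Hypothetical commit counts: rows are 4 contributors, columns are 6 files.
# Made up so that contributors 0-1 mostly touch files 0-2 and
# contributors 2-3 mostly touch files 3-5.
X = np.array([
    [5, 3, 4, 0, 0, 1],
    [4, 6, 2, 0, 0, 0],
    [0, 0, 1, 7, 3, 5],
    [0, 1, 0, 4, 6, 2],
])

# Every row and column here has a nonzero sum, which keeps the
# normalization step inside the algorithm well defined.
model = SpectralCoclustering(n_clusters=2, random_state=0)
model.fit(X)

row_labels = model.row_labels_        # one bicluster id per contributor
column_labels = model.column_labels_  # one bicluster id per file

# Reorder rows and columns by label so the blocks become visible.
row_order = np.argsort(row_labels)
col_order = np.argsort(column_labels)
print(X[row_order][:, col_order])
```

On the real contributor-by-file matrix, the reordered matrix could be shown
with matplotlib's `matshow` to see which groups of contributors go with which
groups of files.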

On 25 September 2013 16:15, Jacob Vanderplas <jake...@cs.washington.edu> wrote:
> Very cool!
> One quick comment: I'd probably normalize the nonzero values in the sparse
> matrix to 1 (i.e., use binary values). As it's written, a user with, say,
> 1 commit on a file will be considered a closer neighbor to a user with
> 0 commits on that file than to a user with 3 commits on that file.
>    Jake
>
>
> On Wed, Sep 25, 2013 at 5:19 AM, Gilles Louppe <g.lou...@gmail.com> wrote:
>>
>> Hi,
>>
>> I have just put together a quick and dirty script that does that. It
>> extracts the number of commits per developer and per file in a git
>> repository. It then computes the 3 nearest neighbors of each
>> contributor.
>>
>> See the gist below for code and output.
>> https://gist.github.com/glouppe/6698145
>>
>> My 3 nearest neighbors are Brian, Noel and Peter. I guess it makes
>> sense :-)
>>
>> Gilles
>>
>>
>>
>> On 24 September 2013 21:30, Mathieu Blondel <math...@mblondel.org> wrote:
>> > Hi everyone,
>> >
>> > At ECML/PKDD, Lars and I were discussing the idea of using machine
>> > learning (and scikit-learn) to find out interesting things about our
>> > contributors (GitHub indicates that we have over 180 of them so far).
>> >
>> > The idea would be to represent each contributor as a vector whose
>> > entries count the number of times he or she modified each file in the
>> > code base (binary values could work well too). This could be used to
>> > automatically find out which contributors share common interests by
>> > using clustering, bi-clustering or graphical lasso.
>> >
>> > Another idea that comes to mind is to make file recommendations to the
>> > user (files which the user is expected to have interest or expertise in
>> > but has never touched).
>> >
>> > I think that would make a nice example in the examples/applications/
>> > folder. Ideally, the example would generate the data on the fly every
>> > time the example is executed.
>> >
>> > If someone wants to play with the idea, a PR is highly welcome.
>> >
>> > Cheers,
>> > Mathieu
>> >
>> >
>> > ------------------------------------------------------------------------------
>> > October Webinars: Code for Performance
>> > Free Intel webinars can help you accelerate application performance.
>> > Explore tips for MPI, OpenMP, advanced profiling, and more. Get the most
>> > from
>> > the latest Intel processors and coprocessors. See abstracts and register
>> > >
>> >
>> > http://pubads.g.doubleclick.net/gampad/clk?id=60133471&iu=/4140/ostg.clktrk
>> > _______________________________________________
>> > Scikit-learn-general mailing list
>> > Scikit-learn-general@lists.sourceforge.net
>> > https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>> >
>>
>>
>
>
>
>
