Hi Manoj.
Unfortunately I cannot give you any advice at the moment. I am way too swamped to take care of GSoC :-/ I think in both clustering and linear models there is a lot of room for improvement. For clustering there was BIRCH (that's the name, right?) that I think Olivier wanted to implement. Maybe that would be an interesting GSoC project? You'd have to ask Olivier and possibly Gael, though. Gael is also working on some more agglomerative clustering algorithms, but I'm not sure what the status is there.

What kind of linear models did you work on? I think Alex wanted to improve the Bayesian linear models.

Sorry I can't be of more help.

Andy


On 01/30/2014 07:48 AM, Manoj Kumar wrote:
Hi Andy.

Thanks for the response :)

I'm looking into the project ideas but I am unable to zero in on a single idea for GSoC. My knowledge is limited to linear and clustering models; however, I am willing to learn and read the literature well before GSoC, and I am a pretty quick learner. It would be really nice if you or some of the other sklearn devs could suggest a couple more ideas (maybe 2 or 3 estimators together, or improving on existing estimators) that would help me write a successful GSoC proposal.

Thanks again,


On Thu, Jan 30, 2014 at 2:23 AM, Andy <t3k...@gmail.com> wrote:

    Hey Manoj.
    I agree that the description is vague.
    I think what Vlad was trying to say is that refurbishing only makes
    sense if it comes with long-term support by an active user.

    Basically, "refurbishing" means
    - have a simple and sklearn-consistent interface
    - be numerically stable, reliable and repeatable
    - serve all feasible major use cases
    - be easy to apply to the problems that people have in practice

    While you could certainly do the first, and probably the second
    given some familiarity, doing the last two is hard if you are not
    using the method actively in your day-to-day data mangling.
    And even if the implementation were refurbished, if you are not
    around afterwards, it is not clear who would be able to maintain it.

    I don't think implementing coresets is a good idea, because it is
    mostly helpful for cluster computing afaik.
    Also, it adds more abstractions on top of a suboptimal interface
    and implementation.
    Additionally, I would really like to limit the number of
    additional estimators before 1.0.

    If you feel up to the task of really making this a great
    implementation, and also taking care of it in the long run,
    please go ahead with the proposal. But I think that might be a bit
    much to ask for a GSoC.

    Cheers,
    Andy

    ps: only my opinion ;)



    On 01/18/2014 08:30 PM, Manoj Kumar wrote:
    Hello,

    I found the idea "Improving Gaussian Mixture Models" repeated
    in 2012 and 2013, so I assumed it to be of real interest to the
    scikit-learn community. I have a fundamental knowledge of
    Gaussian models and the EM algorithm. I would like to take this
    project forward as part of GSoC. I took a quick look at the
    issue tracker, and I found a number of issues.

    I mailed Vlad (since his name was mentioned there as a mentor)
    and this is what he had to say:

    "
    Hey Manoj,

    I just noticed I'm listed as a possible mentor for that.  I think
    when I put my name there I was thinking of HMM instead of GMM, oops!

    I'm guessing that the module is not really maintained and it
    would be good if somebody who is involved with GMMs actively
    would take it under their wing.

    I guess the point of the GSoC idea that was on there was that
    somebody proposed to do a GSoC project to implement coresets for
    GMM fitting (there are two links there). I have absolutely no
    experience with this method.
    Of course in order to add a major new feature to a suboptimally
    maintained model, some refactoring needed to be listed as well.

    Again, my feeling is that this idea came from a potential student
    and it isn't a burning need.  What do you think about it?

    Best,
    Vlad
    "
    Can someone clearly explain what the community expects out of
    such a project? The project description ("Refurbish the current
    GMM code to put it to the scikit's standards") on the wiki page
    seems a bit vague to me.

    Thanks.
    --
    Regards,
    Manoj Kumar,
    Mech Undergrad
    http://manojbits.wordpress.com


    


    _______________________________________________
    Scikit-learn-general mailing list
    Scikit-learn-general@lists.sourceforge.net
    https://lists.sourceforge.net/lists/listinfo/scikit-learn-general


    




--
Regards,
Manoj Kumar,
Mech Undergrad
http://manojbits.wordpress.com

