Hey Manoj.
I agree that the description is vague.
I think what Vlad was trying to say is that refurbishing only makes sense if it comes with long-term support by an active user.

Basically, "refurbishing" means
- have a simple and sklearn-consistent interface
- be numerically stable, reliable and repeatable
- serve all feasible major use cases
- be easy to apply to the problems that people have in practice

While you could certainly do the first, and probably the second given some familiarity, doing the last two is hard if you are not using the method actively in your day-to-day data mangling. And even if the implementation were refurbished, if you are not around afterwards, it is not clear who would be able to maintain it.

I don't think implementing coresets is a good idea, because they are mostly helpful for cluster computing, afaik. Also, it adds more abstractions on top of a suboptimal interface and implementation. Additionally, I would really like to limit the number of additional estimators before 1.0.

If you feel up to the task of really making this a great implementation, and also taking care of it in the long run, please go ahead with the proposal. But I think that might be a bit much to ask for a GSoC.

Cheers,
Andy

ps: only my opinion ;)


On 01/18/2014 08:30 PM, Manoj Kumar wrote:
Hello,

I found this idea, "Improving Gaussian Mixture Models", repeated in 2012 and 2013, so I assumed it to be of real interest to the scikit-learn community. I have a basic knowledge of Gaussian models and the EM algorithm. I would like to take this project forward as part of GSoC. I took a quick look at the issue tracker, and I found a number of issues.

I mailed Vlad (since his name was mentioned there as a mentor) and this is what he had to say:

"
Hey Manoj,

I just noticed I'm listed as a possible mentor for that.  I think when I
put my name there I was thinking of HMM instead of GMM, oops!

I'm guessing that the module is not really maintained and it would be good
if somebody who is involved with GMMs actively would take it under their wing.

I guess the point of the GSoC idea that was on there was that somebody proposed to do a GSoC project to implement coresets for GMM fitting (there are two links
there). I have absolutely no experience with this method.
Of course in order to add a major new feature to a suboptimally maintained model,
some refactoring needed to be listed as well.

Again, my feeling is that this idea came from a potential student and it isn't a
burning need.  What do you think about it?

Best,
Vlad
"
Can someone clearly explain what the community expects out of such a project? The project description ("Refurbish the current GMM code to put it to the scikit's standards") on the wiki page seems a bit vague to me.

Thanks.
--
Regards,
Manoj Kumar,
Mech Undergrad
http://manojbits.wordpress.com



_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
