Hi Manoj.
Unfortunately I cannot give you any advice at the moment. I am way too
swamped to take care of GSoC :-/
I think in both clustering and linear models there is a lot of room for
improvement.
For clustering there was BIRCH (that's the name, right?) that I think
Olivier wanted to implement. Maybe that would be an
interesting GSoC project? You'd have to ask Olivier and possibly Gael,
though.
Gael is also working on some more agglomerative clustering algorithms,
I'm not sure what the status is there.
What kind of linear models did you work on? I think Alex wanted to
improve the Bayesian linear models.
Sorry I can't be of more help.
Andy
On 01/30/2014 07:48 AM, Manoj Kumar wrote:
Hi Andy.
Thanks for the response :)
I'm looking into the project ideas, but I am unable to zero in on a
single idea for GSoC. My knowledge is limited to linear and
clustering models; however, I am willing to learn and read the
literature well before GSoC, and I am a pretty quick learner. It would
be really nice if you or some of the other sklearn devs could suggest a
couple more ideas (maybe 2 or 3 estimators together, or improvements to
existing estimators) that would help me write a successful GSoC proposal.
Thanks again,
On Thu, Jan 30, 2014 at 2:23 AM, Andy <t3k...@gmail.com
<mailto:t3k...@gmail.com>> wrote:
Hey Manoj.
I agree that the description is vague.
I think what Vlad was trying to say is that refurbishing only makes
sense if it comes with long-term support by an active user.
Basically, "refurbishing" means
- have a simple and sklearn-consistent interface
- be numerically stable, reliable and repeatable
- serve all feasible major use cases
- be easy to apply to the problems that people have in practice
While you could certainly do the first, and probably the second
given some familiarity,
doing the last two is hard if you are not using the method
actively in your day-to-day data mangling.
And even if the implementation were refurbished, if you are not
around afterwards, it is not clear who would be able
to maintain it.
I don't think implementing coresets is a good idea, because it is
mostly helpful for cluster computing afaik.
Also, it adds more abstractions on top of a suboptimal interface
and implementation.
Additionally, I would really like to limit the number of
additional estimators before 1.0.
If you feel up to the task of really making this a great
implementation, and also taking care of it in the long run,
please go ahead with the proposal. But I think that might be a bit
much to ask for a GSoC.
Cheers,
Andy
ps: only my opinion ;)
On 01/18/2014 08:30 PM, Manoj Kumar wrote:
Hello,
I found the idea "Improving Gaussian Mixture Models" listed in both
2012 and 2013, so I assumed it to be of real interest to the
scikit-learn community. I have a fundamental knowledge of
Gaussian models and the EM algorithm. I would like to take this
project forward as part of GSoC. I took a quick look at the
issue tracker, and I found a number of issues.
I mailed Vlad (since his name was mentioned there as a mentor),
and this is what he had to say:
"
Hey Manoj,
I just noticed I'm listed as a possible mentor for that. I think
when I put my name there I was thinking of HMM instead of GMM, oops!
I'm guessing that the module is not really maintained and it would be
good if somebody who is involved with GMMs actively would take it
under their wing.
I guess the point of the GSoC idea that was on there was that
somebody proposed to do a GSoC project to implement coresets for GMM
fitting (there are two links there). I have absolutely no experience
with this method.
Of course in order to add a major new feature to a suboptimally
maintained model, some refactoring needed to be listed as well.
Again, my feeling is that this idea came from a potential student and
it isn't a burning need. What do you think about it?
Best,
Vlad
"
Can someone clearly explain what the community expects out of
such a project? The project description ("Refurbish the current
GMM code to put it to the scikit's standards") on the wiki page
seems a bit vague to me.
Thanks.
--
Regards,
Manoj Kumar,
Mech Undergrad
http://manojbits.wordpress.com
------------------------------------------------------------------------------
CenturyLink Cloud: The Leader in Enterprise Cloud Services.
Learn Why More Businesses Are Choosing CenturyLink Cloud For
Critical Workloads, Development Environments & Everything In Between.
Get a Quote or Start a Free Trial Today.
http://pubads.g.doubleclick.net/gampad/clk?id=119420431&iu=/4140/ostg.clktrk
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
<mailto:Scikit-learn-general@lists.sourceforge.net>
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general