On Thu, Jan 30, 2014 at 12:18:20PM +0530, Manoj Kumar wrote:
> I'm looking into the project ideas but I am unable to zero in on a single
> idea for GSoC. My knowledge is limited to linear and clustering models,
> however I am willing to learn and read the literature well before GSoC and I
> am a pretty quick learner. It would be really nice if you or some of the other
> sklearn devs, suggest a couple of more ideas (maybe 2 or 3 estimators together
> or improving on existing estimators), that would help me write a successful
> GSoC proposal.

I think that linear models could use a lot of improvements in scikit-learn.
Amongst other things, the 'strong rules' could and should be included in
scikit-learn. Moreover, our coordinate descent code can probably be made
more efficient by choosing the coordinates it optimizes in better ways. I
would personally be excited by a GSoC that makes linear models faster, and I
know that there is a lot of room for improvement.

Specifically, for linear models, these are the improvements that I would
like to see (in increasing order of difficulty):

 - Better strategy to choose coordinates in the coordinate descent. Chances
   are that simply randomizing the choice would be better than the linear
   traversal that we are doing. My personal bias would be to benchmark the
   performance on "wide data": many features, not many samples, as in
   bioinformatics; and I would be particularly interested in neuroimaging
   data.

 - Strong rules

 - Integrating part of Mathieu Blondel's lightning. Mathieu would need to
   be heavily involved here.

However, we have already had a student on a similar project (specifically
the strong rules), and the strong rules never got merged because the maths
were too hard for the student, and he wasn't able to implement them cleanly.

Manoj, you are motivated, and you have been producing code. I like that
aspect a lot. I think that you should try and play with the parts of the
codebase that you might be interested in modifying in a GSoC, to see if you
master them well enough to propose an enhancement. The more you try to
understand and improve the current codebase, and the more you read the
corresponding literature, the more convincing you will sound. Part of
preparing the GSoC is making sure that you can specify a project where you
know where to go and you feel you can be successful.

If I feel that you are up to the task, I would be motivated to mentor you
in a GSoC on linear models.

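To make the first item in the list above concrete, here is a rough sketch of
Lasso coordinate descent with two ways of ordering the coordinates. It is
not the scikit-learn implementation: the function names and the `selection`
argument are made up for illustration, and "random" here means reshuffling
the coordinate order at each sweep, which is only one possible way of
randomizing.

```python
import numpy as np


def soft_threshold(z, t):
    # Proximal operator of t * |.| applied to a scalar z.
    return np.sign(z) * max(abs(z) - t, 0.0)


def lasso_objective(X, y, w, alpha):
    # (1 / (2 n)) * ||y - X w||^2 + alpha * ||w||_1
    n_samples = X.shape[0]
    return 0.5 / n_samples * np.sum((y - X @ w) ** 2) + alpha * np.abs(w).sum()


def lasso_cd(X, y, alpha, n_iter=100, selection="cyclic", seed=0):
    # Hypothetical helper: minimizes lasso_objective by coordinate descent.
    # selection="cyclic" sweeps features in index order (linear traversal);
    # selection="random" uses a fresh random permutation at every sweep.
    rng = np.random.RandomState(seed)
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    residual = y.astype(float).copy()  # equals y - X @ w since w is zero
    col_sq_norms = (X ** 2).sum(axis=0)
    for _ in range(n_iter):
        if selection == "random":
            order = rng.permutation(n_features)
        else:
            order = np.arange(n_features)
        for j in order:
            if col_sq_norms[j] == 0.0:
                continue
            w_old = w[j]
            # Correlation of feature j with the residual that excludes
            # feature j's own current contribution.
            rho = X[:, j] @ residual + col_sq_norms[j] * w_old
            w[j] = soft_threshold(rho, alpha * n_samples) / col_sq_norms[j]
            if w[j] != w_old:
                residual -= X[:, j] * (w[j] - w_old)
    return w
```

A benchmark along the lines suggested above would call `lasso_cd` with both
values of `selection` on a wide `X` (many more columns than rows) and compare
`lasso_objective` after the same number of sweeps.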
Cheers,
Gaël