Hi everyone,

I have updated my proposal thanks to your excellent suggestions.
I also pointed out the style of optimization that will be applied by linking to my blog post on optimizing the orthogonal matching pursuit code. Unfortunately, this will also flash the bug I introduced before everyone's eyes. I hope it doesn't look so bad… does it? :)

I have yet to point out the obvious low-hanging fruit. What do you suggest?

Do you think the proposal makes it clear enough that it's not just about making stuff run faster, but also about setting up a benchmarking system, making sure things stay fast, and making new code easy to benchmark?

I plan to submit tonight.

Regards,
Vlad

On Apr 4, 2012, at 21:32, Olivier Grisel wrote:

> On April 4, 2012 at 20:19, Alexandre Gramfort
> <alexandre.gramf...@inria.fr> wrote:
>> hello vlad,
>>
>> hope you're doing better.
>>
>> My gut feeling reading the proposal is that you clearly know what you're
>> talking about, as you know the code base well, but I think you should be
>> more specific about where the low-hanging fruits are and which modules
>> deserve some love in terms of speed.
>
> Maybe you could state explicitly that the work will include a
> scalability profile of all the available models:
>
> Pick a selection of ~5 different datasets with very different
> n_samples, n_features and sparsity profiles and compile a list of all
> the estimators that are able to converge to a usable model in less
> than 1s, 10s, 100s or 1000s, for instance, and in less than 1GB of
> memory, for instance.
>
> This kind of high-level information would be a really nice complement
> to the table in [1], for instance.
>
> [1] http://scikit-learn.org/dev/modules/clustering.html
>
> While doing so, you could use the cProfile / line_profiler modules
> to help identify low-hanging fruits.
>
> --
> Olivier
> http://twitter.com/ogrisel - http://github.com/ogrisel

_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
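
For reference, the scalability profile suggested in the quoted message could be sketched roughly as below. This is only an illustration under stated assumptions, not the benchmarking system from the proposal: `make_dataset`, `ColumnMeans`, `smallest_budget`, `scalability_profile` and `hotspots` are hypothetical names, the toy estimator stands in for real scikit-learn estimators so that the snippet runs with the standard library alone, and only fit time is measured (the 1GB memory limit is not checked).

```python
import cProfile
import io
import pstats
import random
import time

# Time budgets (seconds) mentioned in the thread: 1s, 10s, 100s, 1000s.
BUDGETS = (1, 10, 100, 1000)


def make_dataset(n_samples, n_features, density, seed=0):
    """Hypothetical stand-in for a real dataset: a list of rows in which
    roughly `density` of the entries are non-zero."""
    rng = random.Random(seed)
    return [
        [rng.random() if rng.random() < density else 0.0
         for _ in range(n_features)]
        for _ in range(n_samples)
    ]


class ColumnMeans:
    """Toy estimator with a scikit-learn-like fit(X) method, used only so
    the sketch runs without scikit-learn installed."""

    def fit(self, X):
        n_samples = len(X)
        self.means_ = [sum(col) / n_samples for col in zip(*X)]
        return self


def smallest_budget(seconds, budgets=BUDGETS):
    """Return the smallest budget that `seconds` fits in, or None."""
    for budget in budgets:
        if seconds < budget:
            return budget
    return None


def scalability_profile(estimators, datasets):
    """Map (estimator name, dataset name) -> (fit time, budget bucket)."""
    results = {}
    for est_name, make_estimator in estimators.items():
        for data_name, X in datasets.items():
            start = time.perf_counter()
            make_estimator().fit(X)
            elapsed = time.perf_counter() - start
            results[est_name, data_name] = (elapsed, smallest_budget(elapsed))
    return results


def hotspots(estimator, X, top=5):
    """Profile a single fit with cProfile and return the report text."""
    profiler = cProfile.Profile()
    profiler.enable()
    estimator.fit(X)
    profiler.disable()
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(top)
    return stream.getvalue()


# Two small datasets with very different shapes and sparsity.
datasets = {
    "tall_dense": make_dataset(2000, 10, density=1.0),
    "wide_sparse": make_dataset(50, 400, density=0.05),
}
estimators = {"ColumnMeans": ColumnMeans}

profile = scalability_profile(estimators, datasets)
for (est_name, data_name), (elapsed, budget) in sorted(profile.items()):
    print(f"{est_name:12s} {data_name:12s} {elapsed:.4f}s  (< {budget}s)")

# Cumulative-time report for one fit, to look for low-hanging fruit.
print(hotspots(ColumnMeans(), datasets["tall_dense"]))
```

With real estimators, the `estimators` dict would map names to estimator classes or factories, and the datasets would be swapped for ones with the n_samples, n_features and sparsity profiles of interest. line_profiler could then be pointed at whichever functions the cProfile report singles out.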