On 4 April 2012 at 20:19, Alexandre Gramfort <alexandre.gramf...@inria.fr> wrote:
> hello vlad,
>
> hope you're doing better.
>
> My gut feeling reading the proposal is that you clearly know what you're
> talking about, as you know the code base well, but I think you should be
> more specific about where the low-hanging fruits are and which modules
> deserve some love in terms of speed.

Maybe you could state explicitly that the work will include a scalability
profile of all the available models: pick a selection of ~5 different
datasets with very different n_samples, n_features and sparsity profiles,
and compile a list of all the estimators that are able to converge to a
usable model in less than 1s, 10s, 100s or 1000s, for instance, and in less
than 1GB of memory.

This kind of high-level information would be a really nice complement to
the table in [1], for instance.

[1] http://scikit-learn.org/dev/modules/clustering.html

While doing so, you could use the cProfile / line_profiler modules to help
identify low-hanging fruits.

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel

_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
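
To make the scalability-profile idea in this message a bit more concrete, here is a minimal sketch that uses only the standard library (cProfile, pstats, tracemalloc). The helper names `time_bucket`, `profile_call` and `toy_fit` are hypothetical, and `toy_fit` is only a stand-in for a real `estimator.fit(X, y)` call so that the snippet runs without scikit-learn installed. Two caveats: the profiler adds overhead, so the measured time is an upper bound, and tracemalloc only sees allocations that go through Python's memory allocators.

```python
import cProfile
import io
import pstats
import time
import tracemalloc

# Time budgets (in seconds) taken from the suggestion: 1s, 10s, 100s, 1000s.
BUDGETS = (1, 10, 100, 1000)


def time_bucket(seconds, budgets=BUDGETS):
    """Return the smallest budget that `seconds` is under, or None if none fits."""
    for budget in budgets:
        if seconds < budget:
            return budget
    return None


def profile_call(fit, top=5):
    """Run `fit()` once under cProfile and tracemalloc.

    Returns (elapsed_seconds, peak_bytes, profile_report).
    """
    profiler = cProfile.Profile()
    tracemalloc.start()
    start = time.perf_counter()
    profiler.enable()
    try:
        fit()
    finally:
        profiler.disable()
        elapsed = time.perf_counter() - start
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
    # Render the `top` most expensive entries, sorted by cumulative time.
    buf = io.StringIO()
    pstats.Stats(profiler, stream=buf).sort_stats("cumulative").print_stats(top)
    return elapsed, peak, buf.getvalue()


def toy_fit(n_samples=2000, n_features=20):
    """Hypothetical stand-in for estimator.fit(X, y); not a scikit-learn call."""
    X = [[(i * j) % 7 for j in range(n_features)] for i in range(n_samples)]
    # Column means, just to have some work to profile.
    return [sum(col) / n_samples for col in zip(*X)]


if __name__ == "__main__":
    elapsed, peak, report = profile_call(toy_fit)
    print(f"time: {elapsed:.3f}s (budget: <{time_bucket(elapsed)}s), "
          f"peak memory: {peak / 1e6:.1f} MB")
    print(report)
```

In a real run one would presumably replace `toy_fit` with something like `lambda: estimator.fit(X, y)` for each estimator and dataset, and collect the (budget, peak memory) pairs into a table. Once cProfile has pointed at a hot function, line_profiler can give the per-line breakdown.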