On 4 April 2012 at 20:19, Alexandre Gramfort
<alexandre.gramf...@inria.fr> wrote:
> hello vlad,
>
> hope you're doing better.
>
> My gut feeling reading the proposal is that you clearly know what you're
> talking about, since you know the code base well, but I think you should be
> more specific about where the low-hanging fruit is and which modules deserve
> some love in terms of speed.

Maybe you could state explicitly that the work will include a
scalability profile of all the available models:

Pick a selection of ~5 datasets with very different n_samples,
n_features and sparsity profiles, and compile a list of all the
estimators that are able to converge to a usable model in less than,
for instance, 1s, 10s, 100s or 1000s while using less than 1GB of
memory.

This kind of high-level information would be a really nice complement
to the table in [1], for instance.

[1] http://scikit-learn.org/dev/modules/clustering.html
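
A rough sketch of how such a scalability profile could be collected is
given here. It only uses the standard library; the helper names
(budget_bucket, profile_fit, scalability_table) and the toy
column_means "estimator" are made up for illustration and are not part
of scikit-learn. In practice the fit callables would wrap real
estimators, and a check that the model is actually usable would have to
be added. Note that tracemalloc only sees allocations made through
Python's allocators and slows execution down, so both numbers are
approximate.

```python
import time
import tracemalloc

# Time budgets (seconds) and memory cap taken from the message above.
TIME_BUDGETS = (1, 10, 100, 1000)
MEMORY_CAP_BYTES = 1024 ** 3  # 1GB


def budget_bucket(elapsed, budgets=TIME_BUDGETS):
    """Return the smallest budget that `elapsed` is strictly under, or None."""
    for budget in budgets:
        if elapsed < budget:
            return budget
    return None


def profile_fit(fit, X, y=None):
    """Run fit(X, y) once and return (elapsed seconds, peak traced bytes)."""
    tracemalloc.start()
    start = time.perf_counter()
    try:
        fit(X, y)
        elapsed = time.perf_counter() - start
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return elapsed, peak


def scalability_table(fitters, datasets):
    """Map (estimator name, dataset name) -> (time bucket, under memory cap)."""
    table = {}
    for data_name, (X, y) in datasets.items():
        for est_name, fit in fitters.items():
            elapsed, peak = profile_fit(fit, X, y)
            table[est_name, data_name] = (
                budget_bucket(elapsed),
                peak <= MEMORY_CAP_BYTES,
            )
    return table


def column_means(X, y=None):
    # Hypothetical stand-in for an estimator's fit method.
    n_rows = len(X)
    return [sum(col) / n_rows for col in zip(*X)]


# Two toy datasets with very different n_samples / n_features.
toy_datasets = {
    "tall": ([[float(i + j) for j in range(5)] for i in range(2000)], None),
    "wide": ([[float(i + j) for j in range(2000)] for i in range(5)], None),
}

table = scalability_table({"column_means": column_means}, toy_datasets)
for key, value in sorted(table.items()):
    print(key, value)
```

A bucket of None would mean the fit did not finish within the largest
budget; a real run would probably want to abort such fits early rather
than wait for them.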

While doing so, you could use the cProfile / line_profiler modules
to help identify the low-hanging fruit.
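
For instance, a minimal way to get a function-level report with
cProfile could look like the sketch below. The profile_call helper and
the slow_sum_of_squares function are hypothetical; the latter just
stands in for an estimator's fit method.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, **kwargs):
    """Run func under cProfile; return (result, text report).

    The report lists up to 10 entries sorted by cumulative time.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(10)
    return result, stream.getvalue()


def slow_sum_of_squares(n):
    # Hypothetical stand-in for an estimator's fit method.
    return sum(i * i for i in range(n))


result, report = profile_call(slow_sum_of_squares, 10000)
print(report)
```

Once cProfile has pointed at a hot function, line_profiler can give a
per-line breakdown: decorate that function with @profile and run the
script with "kernprof -l -v script.py".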

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel

_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
