2013/1/11 Lars Buitinck <l.j.buiti...@uva.nl>:
> 2013/1/10 Andreas Mueller <amuel...@ais.uni-bonn.de>:
>> I wanted to ask: should we try to make plans? We get a lot of PRs and
>> have more and more contributors and I think it might be nice
>> if we had some form of road map to give everything a bit more direction.
>>
>> I know that people mostly contribute algorithms they are using in research,
>> and that is great, because that makes for high-quality code.
>> I am not sure, though, for how long the "hey look, I coded this cool
>> estimator which I used in my latest paper" strategy is feasible.
>
> What we could do is determine a focus for the next release other than
> "add more features", like:
> * implement Python 3 support

+10

> * fix outstanding API issues, like sparse matrix support for some estimators.

+1, for instance a sparse version of `MinMaxScaler` :)

Also I would add:

- finish the current ongoing effort on ensemble methods like boosting, and I
  would add stacking / blending, as this kind of meta-estimator might help us
  identify missing API patterns / requirements that are structurally
  important for the project (for instance, the boosting implementation
  emphasized the importance of consistent handling of the `sample_weight`
  fit parameter).

- finish the ongoing effort on model evaluation / selection tools (mostly
  grid search related stuff and maybe a tool to help plot learning curves
  for bias/variance analysis).

- shall we standardize the warm start pattern officially and leverage it in
  model evaluation tools (e.g. to plot learning curves faster)?

- have some examples for online learning on realistic large scale data (Gael
  started with a new minibatch k-means example on patch data, and I think
  the new hashing feature extraction for categorical / text data will make
  it possible to showcase a realistic sentiment analysis task, for
  instance). Being able to address streaming problems might help us identify
  further API design issues (e.g. is the current API flexible enough to
  efficiently implement streaming cross validation / early stopping with a
  user-supplied validation set?).

- maybe also finish experimenting and include at least one learning to rank
  model, such as an SVMRank implementation, to ensure that we can address
  this case with a consistent API (e.g. the query_id pattern).

I think we can consider that graphical models in general are a bit off-scope
for the project. I think the core team of regular contributors / PR
reviewers lacks either the expertise or the motivation to embark on such a
large new field.
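
To make the warm start point above a bit more concrete, here is a rough sketch in plain Python. Everything in it is made up for illustration: `ToyLinearRegressor` and `learning_curve_iterations` are hypothetical names, not scikit-learn code, and the `warm_start` flag only mimics the pattern being discussed. The idea is that when fitting on growing training subsets for a learning curve, reusing the previous solution should need far fewer iterations than restarting from scratch each time.

```python
class ToyLinearRegressor:
    """Hypothetical 1-D least squares model fit by gradient descent.

    Not scikit-learn code: it only illustrates a warm_start flag.
    """

    def __init__(self, learning_rate=0.1, tol=1e-6, max_iter=10000,
                 warm_start=False):
        self.learning_rate = learning_rate
        self.tol = tol
        self.max_iter = max_iter
        self.warm_start = warm_start
        self.coef_ = None
        self.n_iter_ = 0

    def fit(self, X, y):
        # Cold start resets the coefficient; warm start keeps the value
        # left behind by the previous call to fit.
        if not self.warm_start or self.coef_ is None:
            self.coef_ = 0.0
        n_samples = len(X)
        for i in range(1, self.max_iter + 1):
            # Gradient of the halved mean squared error w.r.t. coef_.
            grad = sum((self.coef_ * x - t) * x
                       for x, t in zip(X, y)) / n_samples
            self.coef_ -= self.learning_rate * grad
            if abs(grad) < self.tol:
                break
        self.n_iter_ = i
        return self


def learning_curve_iterations(X, y, sizes, warm_start):
    """Fit on growing prefixes of the data, recording iterations per fit."""
    est = ToyLinearRegressor(warm_start=warm_start)
    iterations = []
    for size in sizes:
        est.fit(X[:size], y[:size])
        iterations.append(est.n_iter_)
    return iterations, est.coef_


# Noise-free toy data following y = 2 * x (an assumption for the demo).
X = [0.5, 1.0, 1.5, 2.0, 0.25, 0.75, 1.25, 1.75]
y = [2.0 * x for x in X]
sizes = [2, 4, 6, 8]

cold, cold_coef = learning_curve_iterations(X, y, sizes, warm_start=False)
warm, warm_coef = learning_curve_iterations(X, y, sizes, warm_start=True)
print("iterations per subset, cold start:", cold)
print("iterations per subset, warm start:", warm)
```

The first subset costs the same either way, since there is no previous solution to reuse; the savings only show up from the second subset on. A real tool would also have to decide how scoring on held-out data fits into this loop, which the sketch leaves out.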

Structured prediction problems and recsys tasks would be interesting, but I
am afraid we would need a team of dedicated volunteers to invest some time
implementing the baselines / state of the art while working out the best API
design. We should probably not wait for that to release 1.0.

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel

_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general