2013/1/11 Lars Buitinck <l.j.buiti...@uva.nl>:
> 2013/1/10 Andreas Mueller <amuel...@ais.uni-bonn.de>:
>> I wanted to ask: should we try to make plans? We get a lot of PRs and
>> have more and more contributors and I think it might be nice
>> if we had some form of road map to give everything a bit more direction.
>>
>> I know that people mostly contribute algorithms they are using in research,
>> and that is great, because that makes for high-quality code.
>> I am not sure, though, for how long the "hey look, I coded this cool
>> estimator which I used in my latest paper" strategy is feasible.
>
> What we could do is determine a focus for the next release other than
> "add more features", like:
> * implement Python 3 support

+10

> * fix outstanding API issues, like sparse matrix support for some estimators.

+1, a sparse-friendly version of `MinMaxScaler` for instance :)
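
For the record, the reason a naive `MinMaxScaler` cannot stay sparse is
that shifting each feature by its minimum turns the implicit zeros into
non-zeros. A rough, untested sketch of a sparsity-preserving
alternative (scaling each column by its maximum absolute value instead,
so zeros stay zeros):

    import numpy as np
    import scipy.sparse as sp

    def scale_columns_to_unit_max_abs(X):
        # Divide each column by its max absolute value: zeros stay zeros,
        # so the sparsity structure is preserved (unlike shifting by the min).
        X = sp.csc_matrix(X, dtype=np.float64, copy=True)
        for j in range(X.shape[1]):
            col = X.data[X.indptr[j]:X.indptr[j + 1]]  # non-zeros of column j
            if col.size:
                max_abs = np.abs(col).max()
                if max_abs != 0:
                    col /= max_abs  # in-place: the slice is a view into X.data
        return X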

Also I would add:

- finish the ongoing effort on ensemble methods like boosting, and I
would add stacking / blending: this kind of meta-estimator might help
us identify missing API patterns / requirements that are structurally
important for the project (for instance the boosting implementation
emphasized the importance of consistent handling of the sample_weight
fit parameter; see the sketch below).
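
To make the sample_weight point concrete, here is a toy, untested
sketch of a boosting-like meta-estimator (hypothetical name, not our
actual AdaBoost work) that only works if every base estimator honours
sample_weight in fit:

    import numpy as np
    from sklearn.base import BaseEstimator, ClassifierMixin, clone

    class ToyBooster(BaseEstimator, ClassifierMixin):
        # Minimal boosting-like loop; the base estimator *must* accept a
        # sample_weight argument in fit() for this to work at all.

        def __init__(self, base_estimator, n_rounds=5):
            self.base_estimator = base_estimator
            self.n_rounds = n_rounds

        def fit(self, X, y):
            n_samples = X.shape[0]
            w = np.full(n_samples, 1.0 / n_samples)
            self.estimators_ = []
            for _ in range(self.n_rounds):
                est = clone(self.base_estimator)
                est.fit(X, y, sample_weight=w)  # the contract in question
                missed = est.predict(X) != y
                w = w * np.where(missed, 2.0, 1.0)  # crudely up-weight mistakes
                w /= w.sum()
                self.estimators_.append(est)
            return self

        def predict(self, X):
            # Plain majority vote, assuming non-negative integer class labels.
            votes = np.array([e.predict(X) for e in self.estimators_]).astype(int)
            return np.apply_along_axis(lambda c: np.bincount(c).argmax(), 0, votes)

    # e.g. ToyBooster(DecisionTreeClassifier(max_depth=1)).fit(X, y)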

- finish the ongoing effort on model evaluation / selection tools
(mostly grid search related work, and maybe a tool to help plot
learning curves for bias/variance analysis; a rough sketch of the idea
follows below)
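
As an illustration of what such a tool could automate, a minimal
hand-rolled sketch (digits + GaussianNB are just placeholders) that
plots training vs. validation score as a function of the training set
size:

    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.datasets import load_digits
    from sklearn.model_selection import train_test_split
    from sklearn.naive_bayes import GaussianNB

    X, y = load_digits(return_X_y=True)
    X_train, X_val, y_train, y_val = train_test_split(
        X, y, test_size=0.3, random_state=0)

    fractions = np.linspace(0.1, 1.0, 8)
    train_scores, val_scores = [], []
    for frac in fractions:
        n = int(frac * X_train.shape[0])
        est = GaussianNB().fit(X_train[:n], y_train[:n])
        train_scores.append(est.score(X_train[:n], y_train[:n]))
        val_scores.append(est.score(X_val, y_val))

    plt.plot(fractions, train_scores, "o-", label="training score")
    plt.plot(fractions, val_scores, "o-", label="validation score")
    plt.xlabel("fraction of the training set used")
    plt.ylabel("accuracy")
    plt.legend(loc="best")
    plt.show()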

- shall we standardize the warm start pattern officially and leverage
it in the model evaluation tools (e.g. to plot learning curves faster,
as sketched below)?
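
What I have in mind is something like the following (untested sketch;
gradient boosting is only used here as an example of an estimator
exposing a warm_start flag): each fit() call adds estimators on top of
the previous ones instead of retraining from scratch, so the validation
curve over n_estimators becomes much cheaper to trace:

    from sklearn.datasets import make_classification
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=2000, random_state=0)
    X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

    # warm_start=True means each fit() continues from the previous model.
    est = GradientBoostingClassifier(warm_start=True)
    for n in range(25, 201, 25):
        est.set_params(n_estimators=n)
        est.fit(X_tr, y_tr)
        print("n_estimators=%3d  validation accuracy=%.3f"
              % (n, est.score(X_val, y_val)))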

- have some examples of online learning on realistic, large-scale data
(Gael started with a new minibatch k-means example on patch data, and I
think the new hashing feature extraction for categorical / text data
will make it possible to showcase a realistic sentiment analysis task,
for instance; see the sketch below). Being able to address streaming
problems might also help us identify further API design issues (e.g. is
the current API flexible enough to efficiently implement streaming
cross-validation / early stopping with a user-supplied validation
set?).
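
A minimal sketch of what such an example could look like, assuming a
stream of (texts, labels) minibatches (the stream_minibatches generator
below is only a placeholder): the hashing vectorizer is stateless, so
there is no vocabulary to fit, and SGDClassifier.partial_fit consumes
the minibatches one at a time:

    from sklearn.feature_extraction.text import HashingVectorizer
    from sklearn.linear_model import SGDClassifier

    def stream_minibatches():
        # Placeholder: replace with a real source, e.g. reading a large
        # sentiment corpus from disk one chunk at a time.
        yield ["good movie", "terrible plot"], [1, 0]
        yield ["loved it", "boring and badly acted"], [1, 0]

    vectorizer = HashingVectorizer(n_features=2 ** 18)
    clf = SGDClassifier()

    for texts, labels in stream_minibatches():
        X = vectorizer.transform(texts)
        clf.partial_fit(X, labels, classes=[0, 1])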

- maybe also finish experimenting and include at least one
learning-to-rank model, such as an SVMRank implementation, to make sure
we can address this use case with a consistent API (e.g. the query_id
pattern; a sketch of the pairwise reduction is below).
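
For reference, an untested toy sketch of the classic pairwise reduction
(RankSVM-style): within each query_id group, build difference vectors
labelled by the sign of the relevance difference, train a linear
classifier on those pairs, and use the learned weight vector to score
documents:

    import itertools
    import numpy as np
    from sklearn.svm import LinearSVC

    def pairwise_transform(X, y, query_id):
        # Only compare documents belonging to the same query.
        X_pairs, y_pairs = [], []
        for qid in np.unique(query_id):
            idx = np.where(query_id == qid)[0]
            for i, j in itertools.combinations(idx, 2):
                if y[i] == y[j]:
                    continue
                X_pairs.append(X[i] - X[j])
                y_pairs.append(np.sign(y[i] - y[j]))
        return np.asarray(X_pairs), np.asarray(y_pairs)

    # Toy data: two queries with graded relevance labels.
    rng = np.random.RandomState(0)
    X = rng.rand(8, 5)
    y = np.array([2, 1, 0, 1, 2, 0, 1, 0])
    qid = np.array([1, 1, 1, 1, 2, 2, 2, 2])

    X_p, y_p = pairwise_transform(X, y, qid)
    ranker = LinearSVC().fit(X_p, y_p)
    scores = X.dot(ranker.coef_.ravel())  # higher score = ranked higher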

I think we can consider graphical models in general to be a bit out of
scope for the project: the core team of regular contributors / PR
reviewers lacks either the expertise or the motivation to embark on
such a large new field.

Structured prediction problems and recsys tasks would be interesting,
but I am afraid they would require a team of dedicated volunteers to
invest some time in implementing the baseline / state-of-the-art
methods while working out the API design issues. We should probably not
wait for that to release 1.0.

--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
