Hi there,

I agree with you on having long-term goals. We should indeed define
where we want the library to go.

Before going into such introspection though, I think we should get
more insight about how our current user base is using Scikit-Learn. I
have been involved in the project for more than a year and a half
and I have to say that I unfortunately don't know our users well
(besides the unhappy ones who report bugs).

- How many users do we actually have? A few dozen? Hundreds? Thousands?
- What are they actually using the library for?
- How are they doing that?
- Is that what we want, and is it how we want them to do it?
- ... does this match with our vision for the project?

Those are some questions for which I would be curious to get answers.

I got some insight this year when I made my students use Scikit-Learn
for assignments in our local Machine Learning course. They started
from zero knowledge in Python and in Machine Learning, and they ended up
tackling a real world problem (I made them compete locally on the
Impermium dataset, trying to detect insults in social commentary).
Once the assignments were done, I asked them to give me some feedback
about Scikit-Learn to see what they liked and disliked. I was planning
on sending you an email with that feedback, but here is the
opportunity. So here it is:

+ They quickly got acquainted with the library. It was easy and
straightforward for most of them. (I actually received far fewer
questions than when we were using Matlab.)
+ They found the documentation very well structured and very helpful.
+ They were glad to find nearly all the algorithms we study in class
(though not all of them).
+ They liked the well-structured and common API between the
estimators. This indeed made the library a lot easier to use and
learn.
- Some had a hard time understanding the error messages. I agree
that some error messages may look cryptic to novices.
- Not all estimators are sparse-compatible.
- Some basic estimators are missing. They complained about the lack of
neural networks. Some generic ensemble methods are also missing
(Stacking and Bagging are two easy but very useful ensemble methods
that we should have in my opinion).
- Some tried to implement their own estimators, but failed to make
them compatible with our grid-search module. (I think what is missing
here is some documentation on the expected interface.)
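
As a rough illustration of that last point, here is a minimal sketch
of the kind of interface a parameter search relies on: hyperparameters
stored untouched in __init__, get_params/set_params to read and change
them, fit returning self, and a score method. Everything named below
(MajorityClassifier, tie_break, toy_grid_search) is hypothetical and
deliberately dependency-free; it is a toy stand-in, not scikit-learn's
actual grid-search code.

```python
from itertools import product


class MajorityClassifier:
    """Toy estimator: always predicts the most frequent training label.

    Hypothetical example; it only mimics the estimator conventions.
    """

    def __init__(self, tie_break="first"):
        # Only store hyperparameters here: no validation, no learning.
        self.tie_break = tie_break

    def get_params(self, deep=True):
        # Expose every constructor argument by name.
        return {"tie_break": self.tie_break}

    def set_params(self, **params):
        # Update hyperparameters in place and return self for chaining.
        valid = self.get_params()
        for name, value in params.items():
            if name not in valid:
                raise ValueError("Invalid parameter %r" % name)
            setattr(self, name, value)
        return self

    def fit(self, X, y):
        # Count labels; dicts keep first-seen order (Python 3.7+).
        counts = {}
        for label in y:
            counts[label] = counts.get(label, 0) + 1
        best = max(counts.values())
        tied = [label for label in counts if counts[label] == best]
        # Learned state gets a trailing underscore, by convention.
        self.majority_ = tied[0] if self.tie_break == "first" else tied[-1]
        return self

    def predict(self, X):
        return [self.majority_ for _ in X]

    def score(self, X, y):
        # Accuracy: fraction of correct predictions (higher is better).
        predictions = self.predict(X)
        return sum(p == t for p, t in zip(predictions, y)) / len(y)


def toy_grid_search(estimator, param_grid, X_train, y_train, X_valid, y_valid):
    """Try every parameter combination on one train/validation split."""
    names = sorted(param_grid)
    best_params, best_score = None, None
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        # Rebuild a fresh, unfitted copy from its parameters only.
        candidate = type(estimator)(**estimator.get_params())
        candidate.set_params(**params)
        score = candidate.fit(X_train, y_train).score(X_valid, y_valid)
        if best_score is None or score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


X_train, y_train = [[0], [1], [2], [3]], ["a", "b", "a", "b"]
X_valid, y_valid = [[4], [5], [6]], ["b", "b", "a"]
best_params, best_score = toy_grid_search(
    MajorityClassifier(), {"tie_break": ["first", "last"]},
    X_train, y_train, X_valid, y_valid)
```

In scikit-learn itself, inheriting from sklearn.base.BaseEstimator
provides get_params and set_params, as long as __init__ takes explicit
keyword arguments and stores them unchanged, so in practice only fit,
predict and (optionally) score need to be written by hand.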

Overall this feedback is very positive. If our goal is to build a
toolbox such that non-experts can quickly get their hands on machine
learning and have results quite easily, then I think we are not that
far from that! However, this also highlights some important features
that are missing from the library and that I think we should add
before going for 1.0.

What is your opinion on this?

Cheers,

Gilles

On 11 January 2013 01:19, Olivier Grisel <olivier.gri...@ensta.org> wrote:
> 2013/1/11 Vlad Niculae <zephy...@gmail.com>:
>> I completely agree with everyone regarding 1.0 and I really think we should
>> make a clear list of issues for this (just saying API is pretty vague).
>> However there is life after the 1.0, and I think Andy's message was more
>> about that kind of long-term decisions.
>
> Agreed.
>
>> We should avoid getting features that aren't used by users, and equally well
>> features that aren't of interest to active developers. I don't feel like
>> scikit-learn is at risk at the moment, but we must avoid ending up with
>> (more?) semi-orphaned modules that most developers are afraid to touch in
>> case an issue is reported.
>
> Very true.
>
> --
> Olivier
> http://twitter.com/ogrisel - http://github.com/ogrisel
>
> ------------------------------------------------------------------------------
> Master HTML5, CSS3, ASP.NET, MVC, AJAX, Knockout.js, Web API and
> much more. Get web development skills now with LearnDevNow -
> 350+ hours of step-by-step video tutorials by Microsoft MVPs and experts.
> SALE $99.99 this month only -- learn more at:
> http://p.sf.net/sfu/learnmore_122812
> _______________________________________________
> Scikit-learn-general mailing list
> Scikit-learn-general@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
