2011/12/3 Denis Kochedykov <[email protected]>:
> Hi all,
>
> I'm looking for an ML library for Python for our research team. I found
> a quite comprehensive one - Orange - and a relatively new one -
> scikits.learn.
> Orange definitely look good given the number of methods implemented in
> it, maturity and its GUI as a bonus.
> But I'm a bit confused - if you guys started a new library, maybe there
> is something wrong with Orange? Why do you need to re-implement what has
> been already done, instead of using that lib as a foundation and
> concentrate on adding a new cool stuff or improving existing?

Hi Denis,

In my opinion, here are the main reasons why scikit-learn cannot reuse Orange:

- scikit-learn is a scikit (scientific python toolkit): it is meant to
be used by the scipy community and to play by its tacit rules: the
primary data structure is the plain old numpy array (or a scipy.sparse
matrix): no machine learning specific classes for samples, features,
datasets...

- scikit-learn only depends on projects with non-viral open source
licenses (python, numpy, scipy and joblib are all BSD-like): hence
scikit-learn is BSD-like as well, to play fair in this permissive
ecosystem (being able to copy and paste any function or module of the
scikit-learn source code anywhere else is perfectly OK)

- scikit-learn focuses on implementing machine learning with as little
framework code as possible, and lets other framework-oriented projects
reuse some of the scikit-learn modules if they want to do so, e.g. to
build a data mining GUI.
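
To make the first point concrete, here is a minimal sketch of what
"plain numpy array or scipy.sparse matrix" means in practice. The toy
data and the choice of LogisticRegression are only illustrative
assumptions on my side, and the import uses the current `sklearn`
package name rather than `scikits.learn`:

```python
import numpy as np
from scipy import sparse
from sklearn.linear_model import LogisticRegression

# Toy data (made up for illustration): 6 samples, 2 features,
# stored as a plain 2D numpy array. No dataset or sample class.
X = np.array([[0.0, 0.1], [0.2, 0.0], [0.1, 0.3],
              [2.0, 2.1], [2.2, 1.9], [1.9, 2.3]])
# Labels are a plain 1D numpy array as well.
y = np.array([0, 0, 0, 1, 1, 1])

# Fit and predict directly on the arrays.
clf = LogisticRegression()
clf.fit(X, y)
dense_pred = clf.predict(X)

# The same data as a scipy.sparse CSR matrix goes through the same API.
X_sparse = sparse.csr_matrix(X)
clf_sparse = LogisticRegression()
clf_sparse.fit(X_sparse, y)
sparse_pred = clf_sparse.predict(X_sparse)
```

The predictions come back as numpy arrays too, so they can be passed
straight to any other numpy or scipy based code.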

Other scikit-learn contributors might have their own reasons to
contribute to scikit-learn rather than Orange.

Also, from a more trivial perspective, I like working on github using
pull-request based reviews as the main inter-developer communication
medium for code contributions. svn is such a pain once you have tasted a
decentralized tool like git or hg.

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
