Re: [Scikit-learn-general] motivation for the lib, why re-implement existing stuff

David Warde-Farley Sun, 04 Dec 2011 05:42:34 -0800

On Sun, Dec 04, 2011 at 09:16:56PM +0800, Denis Kochedykov wrote:
> 
> Hi David,
> 
> Thanks, very good points. That is
> 
> 1. C++ rather than Python (in fact, this looks like a plus to me:
> performance, universality, etc.)


I agree from the perspective of universality, but beware of the trap of
making speed generalizations about languages. A lot of the speed-critical
parts of sklearn are quite heavily optimized in Cython. I recall that their
coordinate descent (for generalized linear models) implementation compares
quite favourably against a widely used and cleverly written Fortran
implementation. Sounds like Brian has found the decision tree implementation
to be quite speedy as well.

Suffice it to say, it's possible to write quite fast Python code (and in my
experience, almost always possible to achieve C-like speeds with a dash of
Cython), and it's also possible to really drop the ball and write very slow
C/C++ code.

David

