I timed the Entropy criterion of classification tree construction a bit.
It turned out that the log (a transcendental function) was taking up a
good fraction of the time.

I coded a fast log approximation that is a bit brutal:
https://github.com/GaelVaroquaux/scikit-learn/commit/05e707f8dd67eb65948da877371ba62271ba94d1
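For context, the classic bit-trick behind this kind of fast log can be
sketched in pure Python as below. This is only an illustration of the
technique, not the committed Cython code; the constants are the textbook
ones and may differ from what is in the commit:

```python
import struct

def fast_log2(x):
    """Crude log2(x) for x > 0, read off the IEEE-754 float32 bit pattern.

    The exponent bits give floor(log2(x)) and the mantissa bits act as a
    linear interpolation in between; the constant re-centres the error,
    which stays within a few hundredths everywhere.
    """
    # Reinterpret the float32 bit pattern as an unsigned 32-bit integer.
    (i,) = struct.unpack("I", struct.pack("f", x))
    # Scale the integer down by 2**23 (the mantissa width) and subtract
    # the re-centred exponent bias (127 - 0.0573...).
    return i * (1.0 / (1 << 23)) - 126.94269504
```

Since the entropy criterion only compares sums of p * log(p) terms across
candidate splits, a small uniform error in the log barely moves the chosen
split point, which is presumably why the test suite is insensitive to it.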

It gives a factor of 4 gain in benchmarks (if you modify bench_tree to
use the entropy criterion). All the tests still pass, which suggests that
this level of approximation is OK for what we are looking at.

The question is: is it acceptable to have such an approximation? I think
so; I just wanted confirmation. If people agree with this, I'll document
it better (and maybe test it) and push to master.

Gaël

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general