Hi Alessandro,
You might want to look into this presentation by Olivier:
https://speakerdeck.com/ogrisel/growing-randomized-trees-in-the-cloud-1
It should be pretty much what you need. The code is here:
https://github.com/pydata/pyrallel
Best,
Peter
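Roughly, the idea is to grow independent sub-forests (one per worker, each with its own random seed) and then concatenate their fitted trees into a single model. Below is a minimal single-machine sketch of that merge step. It assumes every worker trains on the same data, so all sub-forests share the same `classes_`. `train_sub_forest` and `merge_forests` are hypothetical helper names, not scikit-learn or pyrallel API. Appending to `estimators_` also relies on the forest predicting by iterating over that fitted list, which is a scikit-learn implementation detail rather than a documented contract.

```python
import copy

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier


def train_sub_forest(X, y, n_estimators, seed):
    # Hypothetical worker task: grow an independent sub-forest with its own seed.
    clf = RandomForestClassifier(n_estimators=n_estimators, random_state=seed)
    clf.fit(X, y)
    return clf


def merge_forests(forests):
    # Hypothetical merge step: deep-copy the first forest so the inputs are not
    # mutated, then append the fitted trees of the others to its estimators_.
    merged = copy.deepcopy(forests[0])
    for forest in forests[1:]:
        merged.estimators_ += forest.estimators_
    # Keep n_estimators consistent with the number of fitted trees.
    merged.n_estimators = len(merged.estimators_)
    return merged


X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
# In a real cluster, each call would run on a different node.
forests = [train_sub_forest(X, y, n_estimators=10, seed=s) for s in range(4)]
model = merge_forests(forests)
print(model.n_estimators)  # 40
print(model.score(X, y))
```

Only the fitted trees need to travel back from the workers, so the training data can stay on each node.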
2014-02-07 23:28 GMT+01:00 Alessandro Gagliar
Hi All,
I want to run a large sklearn.ensemble.RandomForestClassifier (with dozens or
maybe hundreds of trees and 100,000 samples). My desktop won’t handle this, so
I want to try using StarCluster. RandomForestClassifier seems to
parallelize easily, but I don’t know how I would split it
Arnaud,
I added an issparse attribute to the Splitter base class.
Doing so, I think I managed to introduce sparse support without needing to
replicate the Splitters' business logic code.
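The branch itself is in Cython and isn't reproduced here. The following Python sketch only illustrates the general pattern described above, using a hypothetical `Splitter` class (not scikit-learn's actual splitter code): the base class records `issparse` once, only the method that reads feature values branches on it, and the split search is written a single time. The variance-based split criterion is an illustrative assumption.

```python
import numpy as np
from scipy import sparse


class Splitter:
    """Hypothetical splitter: shared split logic, storage-specific access."""

    def __init__(self, X):
        # Record the storage format once; CSC makes column access cheap.
        self.issparse = sparse.issparse(X)
        self.X = X.tocsc() if self.issparse else np.asarray(X)

    def feature_values(self, feature):
        # The only place that branches on issparse.
        if self.issparse:
            return self.X[:, feature].toarray().ravel()
        return self.X[:, feature]

    def best_threshold(self, feature, y):
        # Shared "business logic": choose the midpoint threshold that
        # minimises the summed within-node variance of y.
        values = self.feature_values(feature)
        order = np.argsort(values)
        values = values[order]
        y_sorted = np.asarray(y, dtype=float)[order]
        best_cost, best_thr = np.inf, None
        for i in range(1, len(values)):
            if values[i] == values[i - 1]:
                continue  # cannot split between equal values
            left, right = y_sorted[:i], y_sorted[i:]
            cost = left.var() * len(left) + right.var() * len(right)
            if cost < best_cost:
                best_cost, best_thr = cost, (values[i - 1] + values[i]) / 2.0
        return best_thr


X_dense = np.array([[0.0], [0.0], [3.0], [4.0]])
y = [0, 0, 1, 1]
print(Splitter(X_dense).best_threshold(0, y))  # 1.5
print(Splitter(sparse.csr_matrix(X_dense)).best_threshold(0, y))  # 1.5
```

Both inputs go through the same `best_threshold`, which is the point of keeping the dense/sparse difference behind a single accessor.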
I'm working on this branch [1].
I have two questions:
1- I removed a "with nogil" statement [2]. Is there a way to ke