Hi Youssef.
I would strongly advise you to use an image-specific random forest
implementation.
There is a very good implementation by some other MSRC people:
http://research.microsoft.com/en-us/downloads/03e0ca05-8aa9-49f6-801f-bb23846dc147/
It implements a much more complicated model, decision
Thank you Peter, I found that the feature extraction was taking a lot of
extra memory and that was not related to wiseRF, so you were right.
Actually, from top it seems the training part was using only about 20%
more memory than the size of the dataset itself, which is pretty
impressive. So at
Hi Youssef,
Regarding memory usage, you should know that it'll basically blow up if you
increase the number of jobs. With the current implementation, you'll need
O(n_jobs * |X| * 2) in memory space (where |X| is the size of X, in bytes).
That issue stems from the use of joblib which basically
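Gilles's point is that joblib-based parallelism duplicates the input array per worker. One common workaround (my suggestion, not something stated in the thread) is to memory-map `X` to disk so that workers share read-only pages instead of each holding a private copy. A minimal sketch, assuming a recent scikit-learn and joblib:

```python
import numpy as np
from joblib import dump, load
from sklearn.ensemble import RandomForestClassifier

# Toy data standing in for the real dataset (sizes are illustrative).
rng = np.random.RandomState(0)
X = rng.rand(1000, 20)
y = rng.randint(0, 2, size=1000)

# Persist X and reload it memory-mapped: parallel workers can then
# share the same read-only pages instead of each copying X.
dump(X, "X.joblib")
X_mm = load("X.joblib", mmap_mode="r")

clf = RandomForestClassifier(n_estimators=10, n_jobs=2, random_state=0)
clf.fit(X_mm, y)
preds = clf.predict(X_mm)
```

Whether this helps depends on the joblib version doing the dispatching; with older releases the serialization step itself could still trigger copies.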
Hi Youssef,
please make sure that you use the latest version of sklearn (= 0.13) - we
did some enhancements to the sub-sampling procedure lately.
Looking at the RandomForest code - it seems that the jobs=-1 should not be
the issue for the parallel training of the trees since ``n_jobs =
thank you very much Peter,
you are right about the n_jobs; something was going wrong with that. When
n_jobs = -1, for a larger dataset (1E6 for this case), no CPU was being used
and the process was hanging for a while. Setting n_jobs = 1 made everything
work.
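For the record, the single-process fallback Youssef describes is just the explicit default. A minimal sketch (synthetic data via `make_classification`, my choice for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the real dataset.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

# n_jobs=1 trains in the parent process, sidestepping the
# multiprocessing overhead/hangs seen with n_jobs=-1 on large
# arrays in older scikit-learn versions.
clf = RandomForestClassifier(n_estimators=50, n_jobs=1, random_state=0)
clf.fit(X, y)
train_acc = clf.score(X, y)
```

The trade-off is obvious: no parallel speedup, but memory stays at roughly one copy of `X`.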
Yes, I will look into the IPython
ohh makes total sense now!! thank you Gilles!!
Y
On Thu, Apr 25, 2013 at 2:38 AM, Gilles Louppe g.lou...@gmail.com wrote:
Hi Brian,
thanks for your feedback. Were you able to reproduce their results? How big
was the dataset you have processed so far with an RF?
The MS people have used a distributed RF, so yes, I am guessing the
features were being computed in parallel on all those cores. Though, I am
still new
I've tried larger datasets. It wasn't pretty; far fewer features, though.
On Apr 25, 2013 4:03 AM, Peter Prettenhofer peter.prettenho...@gmail.com
wrote:
Hi Youssef,
You're trying to do exactly what I did. First thing to note is that the
Microsoft guys don't precompute the features, rather they compute them on
the fly. That means that they only need enough memory to store the depth
images, and since they have a 1000 core cluster, computing the
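To make the "compute on the fly" idea concrete: the Kinect body-part work of Shotton et al. evaluates depth-difference features per pixel on demand, so only the depth images live in memory, never a precomputed feature matrix. A hedged sketch of that style of feature (the function name, offsets, and toy image are mine, not from the thread or the paper's code):

```python
import numpy as np

def depth_difference_feature(depth, x, u, v):
    """Depth-difference feature in the style of Shotton et al.:
    f(x) = d(x + u/d(x)) - d(x + v/d(x)),
    computed on demand from the raw depth image. x is a (row, col)
    pixel; u, v are (row, col) offsets in depth-scaled units.
    """
    d = depth[x]  # depth at the probe pixel

    def probe(offset):
        # Normalising the offset by depth makes the feature roughly
        # depth-invariant; clamp the probe to the image bounds.
        r = int(np.clip(x[0] + offset[0] / d, 0, depth.shape[0] - 1))
        c = int(np.clip(x[1] + offset[1] / d, 0, depth.shape[1] - 1))
        return depth[r, c]

    return probe(u) - probe(v)

# Toy depth image: near plane on top, far plane below.
depth = np.ones((64, 64), dtype=np.float32)
depth[32:, :] = 2.0
f = depth_difference_feature(depth, (40, 10), (-1000.0, 0.0), (0.0, 0.0))
```

Each tree node stores only the offsets (u, v) and a threshold, so the per-pixel feature is recomputed whenever a sample reaches that node; memory cost is just the depth images plus the trees.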