I saw some discussion on GitHub and on the list, but it didn't answer this
explicitly. I tried to figure it out from the code, but I don't want to dig
into joblib right now.
I've been using random forests with n_jobs = 8 or 12, and I'm noticing that
sometimes I only seem to use a portion of the cores: not just 1, but not
all of them either. I'm not sure where in the pipeline this is happening.
I'm doing leave-one-out cross-validation, and I've also been measuring the
fit by predicting the training data. So, for each fold (sketch below):
1) model.fit(X[training], y[training]) to train (20k trees)
2) model.predict(X[training]) to get the fit on the training data
3) model.predict(X[testing]) to get the prediction
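For concreteness, here's a minimal, self-contained sketch of what I mean.
The data, the split, and the classifier-vs-regressor choice are all
stand-ins for my actual setup, and I've cut n_estimators way down from my
real 20k so it runs quickly:

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    rng = np.random.RandomState(0)
    X = rng.rand(200, 10)        # stand-in for my real features
    y = rng.randint(0, 2, 200)   # stand-in for my real labels

    # one leave-one-out split: sample 0 held out, the rest for training
    testing = np.array([0])
    training = np.arange(1, 200)

    model = RandomForestClassifier(n_estimators=200, n_jobs=12)

    model.fit(X[training], y[training])        # 1) train
    train_pred = model.predict(X[training])    # 2) fit on training data
    test_pred = model.predict(X[testing])      # 3) prediction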
I think the training step is using all the cores I give it, and I think
step 3 might only use one core at a time, although I'm not sure why (there
are lots of trees to go through).
From https://github.com/scikit-learn/scikit-learn/issues/1435 I see a
recommendation to set n_jobs = 1 for step 3.
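If I've read that right, the change would be just this, continuing from
the sketch above (set_params is standard scikit-learn; the rest is my
guess at what the issue means):

    # per the issue-1435 suggestion: single-process prediction for the
    # one-sample test split, since forking costs more than it saves here
    model.set_params(n_jobs=1)
    test_pred = model.predict(X[testing])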
Should I be setting n_jobs = 1 for step 2 as well? I would think it should
be able to use all the cores there, since I have more training samples than
cores and the individual trees can be spread across them. But it's only
using 5 of 12 right now, for some reason, and I'm not sure which step it's
on.
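To pin that down, I may just time step 2 under both settings and see
whether prediction is where things serialize. A rough wall-clock sketch,
continuing from the code above, nothing rigorous:

    import time

    for n in (1, 12):
        model.set_params(n_jobs=n)
        t0 = time.time()
        model.predict(X[training])          # step 2 only
        print(n, time.time() - t0)          # compare wall time per setting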
Any help on how this thing is running would be appreciated.
- James