I saw some discussion on GitHub and on the list, but it didn't answer this
explicitly. I tried to figure it out from the code, but I don't want to dig
into joblib right now.
I've been using random forests with n_jobs = 8 or 12, and I'm noticing that
sometimes I only seem to use a portion of the cores: not just 1, but not
all of them either. I'm not sure where in the pipeline this is happening.
I'm doing leave-one-out cross-validation, and I've also been measuring the
fit by predicting the training data. So, for each fold (sketch below):
1) model.fit(X[training], y[training]) to train (20k trees)
2) model.predict(X[training]) to get the fit on the training data
3) model.predict(X[testing]) to get the prediction
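For concreteness, here's a minimal, self-contained sketch of what I mean.
The data, the split, and the classifier-vs-regressor choice are all
stand-ins for my actual setup, and I've cut n_estimators way down from my
real 20k so it runs quickly:

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    rng = np.random.RandomState(0)
    X = rng.rand(200, 10)        # stand-in for my real features
    y = rng.randint(0, 2, 200)   # stand-in for my real labels

    # one leave-one-out split: sample 0 held out, the rest for training
    testing = np.array([0])
    training = np.arange(1, 200)

    model = RandomForestClassifier(n_estimators=200, n_jobs=12)

    model.fit(X[training], y[training])        # 1) train
    train_pred = model.predict(X[training])    # 2) fit on training data
    test_pred = model.predict(X[testing])      # 3) prediction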
I think the training step is using all the cores I give it, and I think
step 3 might only use one core at a time, although I'm not sure why (there
are lots of trees to go through).
From https://github.com/scikit-learn/scikit-learn/issues/1435 I see a
recommendation to set n_jobs = 1 for step 3.
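If I've read that right, the change would be just this, continuing from
the sketch above (set_params is standard scikit-learn; the rest is my
guess at what the issue means):

    # per the issue-1435 suggestion: single-process prediction for the
    # one-sample test split, since forking costs more than it saves here
    model.set_params(n_jobs=1)
    test_pred = model.predict(X[testing])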
Should I be setting n_jobs = 1 for step 2 as well? I would think it should
be able to use all the cores there, since I have more training samples than
cores and the individual trees can be spread across them. But it's only
using 5 of 12 right now, for some reason, and I'm not sure which step it's
on.
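To pin that down, I may just time step 2 under both settings and see
whether prediction is where things serialize. A rough wall-clock sketch,
continuing from the code above, nothing rigorous:

    import time

    for n in (1, 12):
        model.set_params(n_jobs=n)
        t0 = time.time()
        model.predict(X[training])          # step 2 only
        print(n, time.time() - t0)          # compare wall time per setting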
Any help on how this thing is running would be appreciated.
- James