Hi,
1. The n_jobs parameter controls the number of worker processes started in
parallel. It should be set according to the number of CPU cores available on
your machine, independent of the type or size of the CV search you are trying
to run. Your 8 parameter combinations x 4 folds give 32 fits, but these are
queued and handed out to the n_jobs workers, so n_jobs does not need to be 8
or 32. On a typical desktop machine with four cores this might be n_jobs = 4.
2. n_jobs should really only be set in one of those places. If you were to set
(for example) n_jobs = 4 in both GridSearchCV and RandomForestClassifier, you
could end up with 16 distinct processes (4 x 4) competing for a much smaller
number of physical cores, potentially making the search slower rather than
faster.
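To make the arithmetic concrete, here is a rough sketch in plain Python. It is
not scikit-learn code: the helper names resolve_n_jobs and search_load are made
up for illustration, and the worst case it reports assumes every outer worker
starts its own full set of inner workers. The handling of negative n_jobs
follows the convention joblib documents (-1 means all cores, -2 all but one).

```python
import os


def resolve_n_jobs(n_jobs, n_cores):
    """Turn an n_jobs setting into a worker count (hypothetical helper).

    Follows the documented joblib convention: None means 1 worker,
    -1 means all cores, -2 means all cores but one, and so on.
    """
    if n_jobs is None:
        return 1
    if n_jobs < 0:
        return max(n_cores + 1 + n_jobs, 1)
    return n_jobs


def search_load(n_candidates, n_folds, outer_n_jobs, inner_n_jobs, n_cores):
    """Return (number of CV fits, worst-case number of concurrent workers).

    Assumption: each outer worker (grid search) runs one fit at a time, and
    each fit starts inner_n_jobs workers of its own (the estimator).
    """
    n_fits = n_candidates * n_folds
    # There is never a reason to run more outer workers than there are fits.
    outer = min(resolve_n_jobs(outer_n_jobs, n_cores), n_fits)
    inner = resolve_n_jobs(inner_n_jobs, n_cores)
    return n_fits, outer * inner


if __name__ == "__main__":
    # On a real machine, os.cpu_count() reports the core count; a fixed
    # value of 4 is used here to match the desktop example above.
    detected = os.cpu_count() or 1
    n_cores = 4
    print("cores detected on this machine:", detected)
    # n_jobs = 4 in both places: 32 fits, up to 16 workers on 4 cores.
    print(search_load(8, 4, 4, 4, n_cores))
    # n_jobs = 4 on the search only: 32 fits, 4 workers on 4 cores.
    print(search_load(8, 4, 4, 1, n_cores))
```

In scikit-learn terms the second case corresponds to something like
GridSearchCV(RandomForestClassifier(n_jobs=1), param_grid, n_jobs=4). Whether
the nested workers are processes or threads depends on the scikit-learn version
and joblib backend, so treat the worker count as an upper bound on contention
rather than an exact process count.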
Hope this helps,
Sam
________________________________
From: Sheila the angel <[email protected]>
Sent: Thursday, 21 August 2014 8:32 PM
To: [email protected]
Subject: [Scikit-learn-general] optimal n_jobs in GridSearchCV
Hi,
Using GridSearchCV, I am trying to optimize two parameter values.
In total, I have 8 parameter combinations and am doing 4-fold cross-validation.
I want to run it in a parallel environment.
My questions are:
1. What should the n_jobs value be, 8 or (8*4 =) 32?
(I know I can specify n_jobs=-1 but due to some technical reasons, I want to
know how many jobs GridSearchCV will start.)
2. If I use a classifier such as RandomForestClassifier, where 'n_jobs' can be
specified, will it make any difference if I also specify "n_jobs" at the
classifier level?
>>> clf = RandomForestClassifier(n_jobs=-1)
>>> grid_search = GridSearchCV(clf, param_grid, n_jobs=-1)
Will this be faster compared to GridSearchCV(RandomForestClassifier())?
Thanks
--
Sheila