Hi,

1. The n_jobs parameter controls the number of processes started in parallel.
It should be set according to the number of CPU cores available on your
machine, independent of the type or size of the CV search you are trying to
run. The 8 * 4 = 32 fits in your search are shared out among those processes,
so GridSearchCV starts n_jobs processes, not 8 or 32. On a typical desktop
machine with four cores this might be n_jobs = 4.


2. n_jobs should really only be set in one of those places. If you were to set
(for example) n_jobs = 4 in both GridSearchCV and RandomForestClassifier, you
would end up with up to 16 distinct processes competing for a much smaller
number of physical cores, potentially making the search slower rather than
faster.
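
To make the arithmetic in both points concrete, here is a rough sketch using
only the standard library. The parameter names and values in param_grid and
the helper total_processes are placeholders I made up for illustration; only
the counts (8 combinations, 4 folds, n_jobs = 4) come from this thread.

```python
import multiprocessing
from itertools import product

# Hypothetical grid with 8 combinations (4 x 2); the parameter names and
# values are placeholders, not taken from the original question.
param_grid = {
    "max_depth": [3, 5, 10, None],
    "n_estimators": [50, 100],
}
n_folds = 4

# Each (combination, fold) pair is one independent fit.
n_combinations = len(list(product(*param_grid.values())))
n_fits = n_combinations * n_folds

# Size the pool from the machine, not from the search; more workers than
# fits would just sit idle, hence the min().
n_cores = multiprocessing.cpu_count()
n_jobs = min(n_cores, n_fits)


def total_processes(outer_n_jobs, inner_n_jobs):
    """Worst-case process count when n_jobs is set at both levels."""
    return outer_n_jobs * inner_n_jobs


print("fits to run:", n_fits)
print("n_jobs for GridSearchCV:", n_jobs)
print("processes if nested 4 x 4:", total_processes(4, 4))
```

The resulting n_jobs would then be passed to GridSearchCV only, with
RandomForestClassifier left at its default n_jobs.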


Hope this helps,

Sam


________________________________
From: Sheila the angel <[email protected]>
Sent: Thursday, 21 August 2014 8:32 PM
To: [email protected]
Subject: [Scikit-learn-general] optimal n_jobs in GridSearchCV

Hi,
Using GridSearchCV, I am trying to optimize the values of two parameters.
In total, I have 8 parameter combinations and am doing 4-fold cross-validation.
I want to run it in a parallel environment.
My questions are:
1. What should the n_jobs value be, 8 or (8*4=) 32?
(I know I can specify n_jobs=-1 but due to some technical reasons, I want to 
know how many jobs GridSearchCV will start.)

2. If I use a classifier such as RandomForestClassifier, where 'n_jobs' can be
specified, will it make any difference if I also specify "n_jobs" at the
classifier level, like this:


>>> clf = RandomForestClassifier(n_jobs=-1)
>>> grid_search = GridSearchCV(clf, param_grid, n_jobs=-1)


Will this be faster compared to GridSearchCV(RandomForestClassifier())?


Thanks

--

Sheila
