Hi,

I'm trying to run GridSearchCV on a computational cluster but my jobs keep
failing with an error from the queuing system claiming I'm using too many
cores.

If I set n_jobs equal to 1, the job doesn't fail, but with any value greater
than one it fails, no matter what the number is.

In the example below I've set n_jobs to 6 and pre_dispatch to 12, and asked
for 8 processors from the queue. I got the following error after ~10
minutes: "PBS: job killed: ncpus 19.73 exceeded limit 8 (sum)"


I've tried playing around with pre_dispatch, but it makes no difference.
There will be other people running calculations on these nodes, so could
there be some kind of interference between GridSearchCV and the other jobs?
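
One thing I haven't tried yet is pinning the thread count of the underlying
linear-algebra library. My guess (unverified) is that each of the 6 joblib
workers spawns its own multi-threaded BLAS calls inside KernelRidge, which
would explain landing at ~20 cores. If that's right, something like this at
the very top of the script, before numpy is first imported, should cap each
process at one BLAS thread:

import os

# Guess: each joblib worker inherits a multi-threaded BLAS/OpenMP pool,
# so 6 workers x ~3 BLAS threads each blows past the 8-core PBS limit.
# These variables must be set before numpy is first imported.
os.environ['OMP_NUM_THREADS'] = '1'        # OpenMP threading
os.environ['MKL_NUM_THREADS'] = '1'        # Intel MKL
os.environ['OPENBLAS_NUM_THREADS'] = '1'   # OpenBLAS

import numpy as np  # safe now; BLAS pools are capped at 1 thread each

With those set, total usage should stay close to n_jobs cores, but I haven't
been able to test this on the cluster yet.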

Anyone come across anything like this before?

Cheers

Clyde


import dill
import numpy as np

from sklearn.kernel_ridge import KernelRidge
from sklearn.grid_search import GridSearchCV

label = 'test_grdsrch3'
X_train = np.random.rand(971, 276)
y_train = np.random.rand(971)

kr = GridSearchCV(KernelRidge(), cv=10,
                  param_grid={"kernel": ['rbf', 'laplacian'],
                              # alpha = lambda (regularisation strength)
                              "alpha": [2**i for i in np.arange(-40, -5, 0.5)],
                              # gamma = 1/sigma^2
                              "gamma": [1/(2.**(2*i)) for i in np.arange(5, 18, 0.5)]},
                  pre_dispatch=12,
                  n_jobs=6)

kr.fit(X_train, y_train)

# pickle the fitted search object; dill needs a binary file handle
with open(label + '.pkl', 'wb') as data_f:
    dill.dump(kr, data_f)
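
For reference, the grid is fairly big; counting the fits by hand (my own
arithmetic, not anything from the sklearn API):

n_kernels = 2                                # 'rbf' and 'laplacian'
n_alphas = len(np.arange(-40, -5, 0.5))      # 70 alpha values
n_gammas = len(np.arange(5, 18, 0.5))        # 26 gamma values
print(n_kernels * n_alphas * n_gammas * 10)  # x 10 CV folds = 36400 fits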
