Maybe this is related to the requirement that certain ports be open in order
to use srun (if those ports aren't open, it won't work at all)? Perhaps there
is some limit on each port?
________________________________________
From: Craig Yoshioka <[email protected]>
Sent: Wednesday, July 5, 2017 1:37:00 PM
To: slurm-dev
Subject: [slurm-dev] srun CPU use

Hi,

I posted this a while back but didn’t get any responses.  I prefer using `srun`
to invoke commands on our cluster because it is far more convenient than
writing sbatch wrappers for single-process jobs (no multiple steps).  The
problem is that if I submit too many srun jobs, the head node starts running
out of socket resources (or possibly some other resource), I start getting
timeouts, and some of the srun processes start using 100% CPU.

I’ve tried redirecting all I/O to prevent the use of sockets, etc., but I still
see this problem.  Can anyone suggest an alternative approach or a fix?
Ideally something that doesn’t require me to write shell wrappers, but also
doesn’t keep a process running on the head node?

Thanks,
-Craig
