Re: [slurm-users] All user's jobs killed at the same time on all nodes

2018-07-02 Thread Matteo Guglielmi
A couple comments/possible suggestions. First, it looks to me that all the jobs a

Re: [slurm-users] All user's jobs killed at the same time on all nodes

2018-06-29 Thread John Hearns
is killed, why would all others go down as well? That would make sense if a single mpirun is running 36 tasks... but the user is not doing this.

Re: [slurm-users] All user's jobs killed at the same time on all nodes

2018-06-29 Thread Matteo Guglielmi
Matteo, a stupid question but if these are single CPU jobs why is mpirun being used? Is your user using these 36 jobs to construct a parallel job to run charmm? If the mpirun is killed, yes all the other processes

Re: [slurm-users] All user's jobs killed at the same time on all nodes

2018-06-29 Thread Paddy Doyle
Hi Matteo, On Fri, Jun 29, 2018 at 10:13:33AM +0000, Matteo Guglielmi wrote: > Dear community, I have a user who usually submits 36 (identical) jobs at a time using a simple for loop, thus jobs are sbatched all at the same time. > Each job requests a single core and all jobs are

Re: [slurm-users] All user's jobs killed at the same time on all nodes

2018-06-29 Thread John Hearns
Matteo, a stupid question but if these are single CPU jobs why is mpirun being used? Is your user using these 36 jobs to construct a parallel job to run charmm? If the mpirun is killed, yes all the other processes which are started by it on the other compute nodes will be killed. I suspect your
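The distinction John is drawing: processes started by a single mpirun form one launcher-owned tree, so killing that mpirun takes down every rank it spawned on the other nodes, whereas genuinely independent single-core jobs each run their own executable. A minimal sketch of a per-job script that runs the program directly rather than through mpirun (the charmm invocation and file names are placeholders, not taken from the thread):

    #!/bin/bash
    #SBATCH --ntasks=1
    #SBATCH --cpus-per-task=1
    # Run the single-core executable directly; no mpirun launcher is
    # involved, so this job is not tied to processes on other nodes.
    charmm < input.inp > output.out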

[slurm-users] All user's jobs killed at the same time on all nodes

2018-06-29 Thread Matteo Guglielmi
Dear community, I have a user who usually submits 36 (identical) jobs at a time using a simple for loop, thus jobs are sbatched all at the same time. Each job requests a single core and all jobs are independent from one another (read different input files and write to different output files).
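A minimal sketch of the submission pattern described above (the job script name and input file names are placeholders, not taken from the thread):

    #!/bin/bash
    # Submit 36 independent single-core jobs in one loop.
    for i in $(seq 1 36); do
        sbatch --ntasks=1 --job-name="run_${i}" job.sh "input_${i}.dat"
    done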