Re: [slurm-users] weight setting not working

2019-03-12 Thread Andy Leung Yin Sui
Thank you for your reply. I was running 18.08.1 and updated to 18.08.6. Everything was solved. Thank you. On Tue, 12 Mar 2019 at 20:23, Eli V wrote: > > On Tue, Mar 12, 2019 at 1:14 AM Andy Leung Yin Sui wrote: > > > > Hi, > > > > I am new to slurm and want to use weight option to schedule the
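For context on the weight option discussed in this thread: node weights are assigned in slurm.conf, and among the nodes that satisfy a job's requirements Slurm allocates the lowest-weight ones first. A minimal sketch (node names, gres counts, and weights are hypothetical, not from the thread):

```
# slurm.conf fragment -- hypothetical node names and weights
# Lower Weight = allocated first; keep the larger-GPU nodes for jobs that need them
NodeName=gpu[01-04] Gres=gpu:1 Weight=10
NodeName=gpu[05-06] Gres=gpu:4 Weight=100
```

Note that weight changes take effect after a reconfigure; as the thread shows, on 18.08.1 the behavior was buggy and upgrading to 18.08.6 fixed it.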

Re: [slurm-users] problems with slurm and openmpi

2019-03-12 Thread Daniel Letai
Hi. On 12/03/2019 22:53:36, Riccardo Veraldi wrote: Hello, after trying hard for over 10 days I am forced to write to the list.

[slurm-users] Resolution! was Re: Mysterious job terminations on Slurm 17.11.10

2019-03-12 Thread Andy Riebs
It appears that we have gotten to the bottom of this problem! We discovered that we only seem to see this problem if our overnight test script is run with "nohup," as we have been doing for several years. Typically, we would see the mysterious cancellations about once every other day, or 3-4

Re: [slurm-users] problems with slurm and openmpi

2019-03-12 Thread Cyrus Proctor
Both your Slurm and OpenMPI config.logs would be helpful in debugging here. Throw in your slurm.conf as well for good measure. Also, what type of system are you running, what type of high speed fabric are you trying to run on, and what's your driver stack look like? I know the feeling and will

[slurm-users] problems with slurm and openmpi

2019-03-12 Thread Riccardo Veraldi
Hello, after trying hard for over 10 days I am forced to write to the list. I am not able to have SLURM work with OpenMPI. OpenMPI-compiled binaries won't run on slurm, while all non-OpenMPI programs run just fine under "srun". I am using SLURM 18.08.5, building the rpm from the tarball: rpmbuild -ta
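A common cause of OpenMPI binaries failing under srun is an OpenMPI build without Slurm/PMI support. A build sketch, not the poster's actual recipe (prefix and paths are assumptions):

```shell
# Sketch: build OpenMPI with Slurm launch support (paths are hypothetical)
./configure --prefix=/opt/openmpi --with-slurm --with-pmix
make -j"$(nproc)" && make install

# Then check which MPI plugin types this Slurm offers:
srun --mpi=list
```

With matching PMI/PMIx on both sides, `srun --mpi=pmix ./mpi_prog` (or the site's default MpiDefault) should launch OpenMPI ranks directly.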

Re: [slurm-users] How to force jobs to run next in queue

2019-03-12 Thread Thomas M. Payerle
Are you using the priority/multifactor plugin? What are the values of the various Priority* weight factors? On Tue, Mar 12, 2019 at 12:42 PM Sean Brisbane wrote: > Hi, > > Thanks for your help. > > Either setting qos or setting priority doesn't work for me. However I > have found the cause
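For reference, the multifactor weights this reply asks about live in slurm.conf, and `sprio -l` shows each factor's contribution to a pending job's priority. A fragment with purely illustrative values:

```
# slurm.conf fragment -- illustrative values only
PriorityType=priority/multifactor
PriorityWeightAge=1000
PriorityWeightFairshare=10000
PriorityWeightJobSize=0
PriorityWeightPartition=100000
PriorityWeightQOS=100000
```

If PriorityWeightPartition is large relative to the other factors, jobs in a high-priority partition will beat otherwise higher-priority jobs, which matches the symptom in this thread.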

Re: [slurm-users] How do I impose a limit on the memory requested by a job?

2019-03-12 Thread Paul Edmon
Slurm should automatically block or reject jobs that can't run on that partition in terms of memory usage for a single node.  So you shouldn't need to do anything.  If you need something less than the max memory per node then you will need to enforce some limits.  We do this via a jobsubmit
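The partition-level limit mentioned above can be expressed directly in slurm.conf; a sketch (partition name, node list, and values are assumptions, not from the thread):

```
# slurm.conf fragment -- hypothetical partition and limits
# Jobs requesting more than MaxMemPerCPU per allocated CPU are rejected
PartitionName=serial Nodes=node[01-10] MaxMemPerCPU=4096 State=UP
```

Anything finer-grained than a flat per-partition cap (e.g. per-user rules) is where a job_submit plugin or QOS limits come in.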

[slurm-users] How do I impose a limit on the memory requested by a job?

2019-03-12 Thread David Baker
Hello, I have set up a serial queue to run small jobs in the cluster. Actually, I route jobs to this queue using the job_submit.lua script. Any 1 node job using up to 20 cpus is routed to this queue, unless a user submits their job with an exclusive flag. The partition is shared and so I
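The routing described above is done in job_submit.lua. A minimal sketch of that kind of rule; the partition name and thresholds come from the description, but the code itself is an illustration, not the poster's script, and the job_desc fields are simplified:

```lua
-- job_submit.lua sketch: route small single-node jobs to "serial"
function slurm_job_submit(job_desc, part_list, submit_uid)
    -- one node, at most 20 CPUs, and not submitted with --exclusive
    if job_desc.min_nodes == 1 and job_desc.min_cpus <= 20
            and job_desc.shared ~= 0 then
        job_desc.partition = "serial"
    end
    return slurm.SUCCESS
end

function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
    return slurm.SUCCESS
end
```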

Re: [slurm-users] How to deal with jobs that need to be restarted several times

2019-03-12 Thread Renfro, Michael
If the failures happen right after the job starts (or close enough), I’d use an interactive session with srun (or some other wrapper that calls srun, such as fisbatch). Our hpcshell wrapper for srun is just a bash function: hpcshell () { srun --partition=interactive $@ --pty bash -i
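A fuller sketch of such a wrapper, with the arguments quoted so options containing spaces survive word-splitting. The partition name is taken from the thread; the SRUN indirection is added here purely so the sketch can be dry-run without a cluster:

```shell
# Interactive-shell wrapper around srun (sketch).
# SRUN=echo turns it into a dry run that just prints the srun command line.
SRUN="${SRUN:-srun}"

hpcshell () {
    # "$@" (quoted) preserves each user-supplied argument as one word
    "$SRUN" --partition=interactive "$@" --pty bash -i
}

# Dry-run demo:
SRUN=echo hpcshell --nodes=1 --time=1:00:00
# -> --partition=interactive --nodes=1 --time=1:00:00 --pty bash -i
```

Extra flags the user passes (node count, walltime, GPUs) slot in between the site defaults and the `--pty bash -i` tail.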

[slurm-users] How to deal with jobs that need to be restarted several times

2019-03-12 Thread Selch, Brigitte (FIDF)
Hello, Some jobs have to be restarted several times until they run. Users start the Job, it fails, they have to do some changes, they start the job again, it fails again ... and so on. So they want to keep the resources until the job is running properly. Is there a possibility to 'inherit'
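One standard answer to "keep the resources between attempts" is to separate the allocation from the job step: salloc holds the nodes, and srun can be repeated inside the allocation without requeueing. A sketch (sizes, time, and the job name are illustrative placeholders):

```shell
# Hold an allocation open, then launch (and relaunch) inside it
salloc --nodes=1 --ntasks=4 --time=02:00:00

# inside the salloc shell, as many times as needed:
srun ./my_job      # my_job is a placeholder; fix inputs and rerun
                   # without going back through the queue
exit               # release the allocation when done
```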

Re: [slurm-users] weight setting not working

2019-03-12 Thread Eli V
On Tue, Mar 12, 2019 at 1:14 AM Andy Leung Yin Sui wrote: > > Hi, > > I am new to slurm and want to use weight option to schedule the jobs. > I have some machine with same hardware configuration with GPU cards. I > use QoS to force user at least required 1 gpu gres when submitting > jobs. > The
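For reference, the "at least 1 GPU per job" QoS constraint mentioned here can be expressed as a TRES minimum on the QOS. A command sketch (the QOS name is hypothetical, and this assumes accounting/sacctmgr is set up):

```shell
# sacctmgr sketch -- QOS name "gpu_required" is hypothetical
sacctmgr modify qos gpu_required set MinTRESPerJob=gres/gpu=1
```

Jobs submitted under that QOS without `--gres=gpu:N` (N >= 1) are then refused at submit time.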

Re: [slurm-users] How to force jobs to run next in queue

2019-03-12 Thread Bjørn-Helge Mevik
Sean Brisbane writes: > I'm trying to troubleshoot why the highest priority job is not next to run, > jobs in the partition called "Priority" seem to run first. > [...] > The partition called "Priority" has a priority boost assigned through qos. > > PartitionName=Priority Nodes=compute[01-02]