Re: [slurm-users] Jobs in pending state

2018-04-29 Thread Paul Edmon
It sounds like your second partition is getting primarily scheduled by the backfill scheduler. I would try the partition_job_depth option, as otherwise the main loop only considers jobs in priority order rather than per partition. -Paul Edmon- On 4/29/2018 5:32 AM, Zohar Roe MLM wrote: Hello. I am having

Re: [slurm-users] Job still running after process completed

2018-04-23 Thread Paul Edmon
I would recommend putting a cleanup process in your epilog script. We have a check here that sees if the job completed and, if so, terminates all the user processes with kill -9 to clean up any residuals. If it fails, it closes off the node so we can reboot it. -Paul Edmon- On 04/23

Re: [slurm-users] Time-based partitions

2018-03-12 Thread Paul Edmon
You could probably accomplish this using a job submit lua script and some crafted QoS's. It would take some doing but I imagine it could work. -Paul Edmon- On 03/12/2018 02:46 PM, Keith Ball wrote: Hi All, We are looking to have time-based partitions; e.g. a "day" and "ni

Re: [slurm-users] ntasks and cpus-per-task

2018-02-22 Thread Paul Edmon
Yeah, I've found that in those situations it helps to have people wrap their threaded programs in srun inside of sbatch. That way the scheduler knows which process specifically gets the threading. -Paul Edmon- On 02/22/2018 10:39 AM, Loris Bennett wrote: Hi Paul, Paul Edmon <ped...@cfa.harvard.
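
A minimal sketch of the "srun inside of sbatch" suggestion above, not taken from the thread itself. It renders a batch script in which the threaded program is launched as a job step through srun. The helper name render_batch_script, the program path ./my_threaded_app, and the use of OMP_NUM_THREADS (which assumes an OpenMP program) are all hypothetical and chosen only for illustration.

```python
def render_batch_script(program: str, cpus_per_task: int, ntasks: int = 1) -> str:
    """Return the text of an sbatch script that starts `program` via srun.

    Hypothetical helper for illustration; adjust the directives to your site.
    """
    lines = [
        "#!/bin/bash",
        f"#SBATCH --ntasks={ntasks}",
        f"#SBATCH --cpus-per-task={cpus_per_task}",
        "",
        "# Assumption: the program is OpenMP-threaded and reads OMP_NUM_THREADS.",
        "export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK",
        "",
        "# Launch as a job step so the threaded process itself gets the CPUs.",
        f"srun --cpus-per-task=$SLURM_CPUS_PER_TASK {program}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print a script for a hypothetical 8-thread program.
    print(render_batch_script("./my_threaded_app", cpus_per_task=8), end="")
```

The output can be saved to a file and submitted with sbatch. Passing --cpus-per-task explicitly on the srun line is a cautious choice here, since whether a step inherits that value from the batch allocation can depend on the Slurm version.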

Re: [slurm-users] Extreme long db upgrade 16.05.6 -> 17.11.3

2018-02-22 Thread Paul Edmon
though so perhaps we avoided that particular query due to that. From past experience these major upgrades can take quite a bit of time as they typically change a lot about the DB structure in between major versions. -Paul Edmon- On 02/22/2018 06:17 AM, Malte Thoma wrote: FYI: * We broke our

Re: [slurm-users] restrict application to a given partition

2018-01-15 Thread Paul Edmon
script doesn't catch it. -Paul Edmon- On 1/15/2018 8:31 AM, John Hearns wrote: Juan, my knee-jerk reaction is to say 'containerisation' here. However I guess that means that Slurm would have to be able to inspect the contents of a container, and I do not think that is possible. I may be very

Re: [slurm-users] Changing resource limits while running jobs

2018-01-04 Thread Paul Edmon
Typically changes like this only impact pending or newly submitted jobs.  Running jobs usually are not impacted, though they will count against any new restrictions that you put in place. -Paul Edmon- On 1/4/2018 6:44 AM, Juan A. Cordero Varelaq wrote: Hi, A couple of jobs have been

Re: [slurm-users] Intermittent "Not responding" status

2017-12-04 Thread Paul Edmon
is substantial, thus the lag crossing back and forth can add up. I would check to see if all your nodes can talk to each other and the master and if your Timeouts are set high enough. -Paul Edmon- On 12/04/2017 01:57 PM, Stradling, Alden Reid (ars9ac) wrote: I have a number of nodes that have, after our

Re: [slurm-users] PMIx and Slurm

2017-11-28 Thread Paul Edmon
then to let PMIx handle pmix solely and let slurm handle the rest.  Thanks! Am I right in reading that you don't have to build slurm against PMIx?  So it just interoperates with it fine if you just have it installed and specify pmix as the launch option?  That's neat. -Paul Edmon- On 11/28/2017 6

[slurm-users] PMIx and Slurm

2017-11-28 Thread Paul Edmon
is the right way of building PMIx and Slurm such that they interoperate properly? Suffice it to say little to no documentation exists on how to properly do this, so any guidance would be much appreciated. -Paul Edmon-
