It sounds like your second partition is primarily being scheduled by
the backfill scheduler. I would try the partition_job_depth option, as
otherwise the main scheduling loop only walks the queue in priority
order and does not consider partitions separately.
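A minimal slurm.conf sketch of that suggestion (the depth value of 100 is illustrative, not a recommendation):

```
# slurm.conf fragment (sketch): have the main scheduler loop attempt up
# to 100 jobs from each partition instead of stopping purely by global
# priority order. Tune the number to your queue depth.
SchedulerParameters=partition_job_depth=100
```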
-Paul Edmon-
On 4/29/2018 5:32 AM, Zohar Roe MLM wrote:
Hello.
I am having
I would recommend putting a clean up process in your epilog script. We
have a check here that sees if the job completed; if so, it then
terminates all the user's processes with kill -9 to clean up any residuals.
If that fails, it closes off the node so we can reboot it.
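A sketch of such an epilog, assuming the standard SLURM_JOB_USER variable that slurmd sets for epilog scripts; the dry-run fallback and the drain reason text are my own additions so the fragment is safe to run by hand:

```shell
#!/bin/sh
# Sketch of the epilog cleanup described above. SLURM_JOB_USER is set by
# slurmd for real epilog runs; the dry-run fallback below exists only so
# the sketch can be executed safely outside Slurm.
if [ -z "$SLURM_JOB_USER" ]; then
    DRY_RUN=echo
    SLURM_JOB_USER=exampleuser
fi

# Kill any residual processes the job's user left on this node. Note that
# pkill also returns non-zero when nothing matched, so a production script
# would want to distinguish "nothing to kill" from a real failure.
if ! $DRY_RUN pkill -9 -u "$SLURM_JOB_USER"; then
    # Cleanup failed: close off the node so it can be rebooted.
    $DRY_RUN scontrol update NodeName="$(hostname -s)" \
        State=DRAIN Reason="epilog cleanup failed"
fi
# A real epilog should end by exiting 0 on success.
```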
-Paul Edmon-
On 04/23
You could probably accomplish this using a job submit lua script and
some crafted QoS's. It would take some doing but I imagine it could work.
-Paul Edmon-
On 03/12/2018 02:46 PM, Keith Ball wrote:
Hi All,
We are looking to have time-based partitions; e.g. a "day" and "ni
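A hypothetical, untested job_submit.lua sketch of that idea; the function signatures follow Slurm's Lua job-submit plugin API, but the "day"/"night" partition names and the hour boundaries are invented for illustration, and the actual limits would live in the crafted QoS definitions:

```lua
-- Sketch: route jobs without an explicit partition into "day" or
-- "night" based on submit time. Hour boundaries are illustrative.
function slurm_job_submit(job_desc, part_list, submit_uid)
    local hour = tonumber(os.date("%H"))
    if job_desc.partition == nil then
        if hour >= 8 and hour < 20 then
            job_desc.partition = "day"
        else
            job_desc.partition = "night"
        end
    end
    return slurm.SUCCESS
end

function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
    return slurm.SUCCESS
end
```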
Yeah, in those situations I've found it best to have people wrap their
threaded programs in srun inside of sbatch. That way the scheduler
knows specifically which process gets the threading.
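The wrapping pattern described above might look like this batch-script sketch; "my_threaded_prog" and the CPU count are placeholders:

```
#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8

# Launch the threaded program through srun so Slurm tracks which step
# owns the threads; OMP_NUM_THREADS is only relevant for OpenMP codes.
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
srun --cpus-per-task="$SLURM_CPUS_PER_TASK" my_threaded_prog
```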
-Paul Edmon-
On 02/22/2018 10:39 AM, Loris Bennett wrote:
Hi Paul,
Paul Edmon <ped...@cfa.harvard.
though so perhaps we avoided that particular query
due to that.
From past experience these major upgrades can take quite a bit of time
as they typically change a lot about the DB structure in between major
versions.
-Paul Edmon-
On 02/22/2018 06:17 AM, Malte Thoma wrote:
FYI:
* We broke our
script doesn't catch it.
-Paul Edmon-
On 1/15/2018 8:31 AM, John Hearns wrote:
Juan, my knee-jerk reaction is to say 'containerisation' here.
However I guess that means that Slurm would have to be able to inspect
the contents of a container, and I do not think that is possible.
I may be very
Typically changes like this only impact pending or newly submitted
jobs. Running jobs usually are not impacted, though they will count
against any new restrictions that you put in place.
-Paul Edmon-
On 1/4/2018 6:44 AM, Juan A. Cordero Varelaq wrote:
Hi,
A couple of jobs have been
is substantial, thus the lag crossing back and forth can add up. I
would check to see if all your nodes can talk to each other and the
master, and if your timeouts are set high enough.
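If timeouts turn out to be the issue, the relevant slurm.conf knobs include the following; the values are illustrative, not recommendations:

```
# slurm.conf fragment (sketch): raise these if nodes are slow to respond
# to the controller across the network.
SlurmctldTimeout=300
SlurmdTimeout=300
MessageTimeout=30
```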
-Paul Edmon-
On 12/04/2017 01:57 PM, Stradling, Alden Reid (ars9ac) wrote:
I have a number of nodes that have, after our
then to let PMIx handle pmix solely and let slurm handle the rest. Thanks!
Am I right in reading that you don't have to build slurm against PMIx?
So it interoperates with it fine if you just have it installed and
specify pmix as the launch option? That's neat.
-Paul Edmon-
On 11/28/2017 6
is the right way of building PMIx and Slurm such that they
interoperate properly?
Suffice it to say, little to no documentation exists on how to do this
properly, so any guidance would be much appreciated.
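Assuming an externally installed PMIx, a build-and-launch sketch might look like this; the install prefixes are invented, while --with-pmix and --mpi=pmix are real Slurm options:

```
# Build Slurm against an external PMIx installation (paths illustrative):
./configure --prefix=/opt/slurm --with-pmix=/opt/pmix
make && make install

# Then launch MPI jobs with the pmix plugin:
srun --mpi=pmix -n 4 ./my_mpi_prog
```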
-Paul Edmon-