Thank you. I synced /etc/slurm.conf from the slurmctld node and then re-ran
slurmdbd as root, which seems to have solved the problem. Before that, "sacctmgr
list clusters" complained about a missing /etc/slurm.conf.
Best,
Jianwen
> On Jan 11, 2019, at 12:33, Chris Samuel wrote:
>
> On 10/1/19 6:15
Hi Jean-Mathieu,
I'd also recommend that you update to 17.11.12. I had issues with job arrays
in 17.11.7, such as tasks erroneously being held as "DependencyNeverSatisfied",
that, I'm pleased to report, I have not seen in .12.
Best,
Lyn
On Fri, Jan 11, 2019 at 8:13 AM Jean-mathieu CHANTREIN <
Hi Chris,
Thank you for your comments. Yesterday I experimented with increasing
PriorityWeightJobSize, and that does appear to have quite a profound effect on
the job mix executing at any one time. Larger jobs (needing 5 nodes or more)
are now getting a decent share of the nodes in the
You should be able to turn on some backfill debug info from slurmctld and have
Slurm log what the backfill scheduler is doing. Take a look at the DebugFlags
setting, using the Backfill and BackfillMap flags.
Your bf_window is set to 3600 minutes, i.e. 2.5 days; if the start time of the
large job is out further than that, it
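As a sketch (illustrative values, not a recommendation), those knobs live in
slurm.conf, and the debug flags can also be toggled at runtime:

    # slurm.conf
    DebugFlags=Backfill,BackfillMap
    SchedulerParameters=bf_window=3600   # minutes the backfill scheduler plans ahead

    # toggle at runtime without restarting slurmctld
    scontrol setdebugflags +Backfill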
Hi Janne,
On Fri, 2019-01-11 at 10:37 +0200, Janne Blomqvist wrote:
> On 11/01/2019 08.29, Sergey Koposov wrote:
> > What is your memory limit configuration in slurm? Anyway, a few things to
> > check:
I guess these are the most relevant (uncommented) parameters I could see in
the slurm.conf:
I forgot to mention earlier that we are running Slurm version 18.08.3.
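For illustration only (these are the settings that typically govern memory
enforcement, not necessarily this cluster's actual values):

    # slurm.conf
    SelectType=select/cons_res
    SelectTypeParameters=CR_Core_Memory
    TaskPlugin=task/cgroup

    # cgroup.conf
    ConstrainRAMSpace=yes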
On Fri, Jan 11, 2019 at 10:35:09AM -0500, Paul Edmon wrote:
> I'm pretty sure that gres.conf has to be on all the nodes as well
> and not just the master.
Thanks Paul. We deploy the same slurm configuration, including the
I'm pretty sure that gres.conf has to be on all the nodes as well and
not just the master.
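For reference, a minimal gres.conf of the kind that needs to be deployed to
each node might look like this (node and device names are made up):

    # /etc/slurm/gres.conf on every compute node
    NodeName=node[01-04] Name=gpu Type=v100 File=/dev/nvidia[0-1]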
-Paul Edmon-
On 1/11/19 5:21 AM, Sean McGrath wrote:
Hi everyone,
Your help with this would be much appreciated.
We have a cluster with 3 types of gpu configured in gres. Users can successfully
Don't you put any limitations on your master nodes?
I answered my own question: I only had to change the PropagateResourceLimits
parameter in slurm.conf to NONE. This is not a problem, since I activate the
cgroups directly on each of the compute nodes.
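In other words, the change was a single line (sketch):

    # slurm.conf
    PropagateResourceLimits=NONE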
Regards.
Jean-Mathieu
> From: "Jean-Mathieu
Hello Jeffrey.
That's exactly it. Thank you very much; I would not have thought of that. I
had in fact set a limit of 20 processes (nproc) in /etc/security/limits.conf to
avoid potential misuse by some users. I had not imagined for one second that it
could propagate to the compute nodes!
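An entry of that kind in /etc/security/limits.conf would look roughly like
this (illustrative):

    # /etc/security/limits.conf
    *    hard    nproc    20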
You
What does ulimit tell you on the compute node(s) where the jobs are running?
The error message you cited arises when a user has reached the per-user process
count limit (e.g. "ulimit -u"). If your Slurm config doesn't limit how many
jobs a node can execute concurrently (e.g. oversubscribe),
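A quick way to check the limit as seen from inside a job (the node name is a
placeholder; ulimit is a shell builtin, hence the bash -c):

    srun -w node01 bash -c 'ulimit -u'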
Hi everyone,
Your help with this would be much appreciated.
We have a cluster with 3 types of gpu configured in gres. Users can successfully
request 2 of the gpu types but the third errors when requested.
Here is the successful salloc behaviour:
root@boole01:/etc/slurm # salloc
Hello,
I'm new to Slurm (I used SGE before) and I'm new to this list. I have some
difficulties with the use of Slurm's job arrays; maybe you can help me?
I am working with Slurm version 17.11.7 on Debian testing. I use slurmdbd and
fairshare.
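For context, a job array is typically submitted along these lines (a generic
sketch, not this user's actual script):

    # submit 10 array tasks; each task sees its own index
    sbatch --array=1-10 --wrap 'echo task $SLURM_ARRAY_TASK_ID'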
For my current user, I have the following
Sergey Koposov writes:
> The trick is that my code uses memory mapping (i.e. mmap) of one
> single large file (~12 Gb) in each thread on each node.
> With this technique in the past despite the fact the file is
> (read-only) mmaped in say 16 threads, the actual memory footprint was
> still ~ 12
On 11/01/2019 08.29, Sergey Koposov wrote:
> Hi,
>
> I've recently migrated to slurm from pbs on our cluster. Because of that, now
> the job memory limits are
> strictly enforced and that causes my code to get killed.
> The trick is that my code uses memory mapping (i.e. mmap) of one single large