job for the lua job submit plugin
(job_submit.lua). It can check what users have specified, write out
custom error messages, or change the settings of jobs.
--
Bjørn-Helge Mevik, dr. scient,
Department for Research Computing, University of Oslo
signature.asc
Description: PGP signature
of small,
distributed jobs running, and a long queue of pending jobs), I
personally wouldn't want SchedMD to sacrifice that for making updates of
node lists easier. Especially since I haven't seen the problem JinSung
Kang reports. :)
--
Regards,
Bjørn-Helge Mevik, dr. scient,
Department for R
rong with how my partitions are defined?
That sounds unlikely, IMO.
--
Cheers,
Bjørn-Helge Mevik, dr. scient,
Department for Research Computing, University of Oslo
soon as the signal
arrives. I got bit by this behaviour trying to do exactly the same that
you did. :)
--
Regards,
Bjørn-Helge Mevik, dr. scient,
Department for Research Computing, University of Oslo
will be looking at
this feature again. Thanks for the tip! :)
--
Regards,
Bjørn-Helge Mevik, dr. scient,
Department for Research Computing, University of Oslo
feature from the node, and then request themself to
be requeued.
Prior to submitting the jobs, we add the "fixme" feature to all nodes
needing maintenance.
(In reality, our setup is a little more complex, since it includes
reinstalling the OS on the nodes, but the principle is the same
ch), but looks like this assumption is wrong?
That is right, that is wrong. :)
--
Regards,
Bjørn-Helge Mevik, dr. scient,
Department for Research Computing, University of Oslo
dictable, both for the programmer and for the user.
It is by design, because people often need to give arguments or options
to their jobscript, e.g.,
sbatch --time=1-0:0:0 myjob.sh inputfile
--
Regards,
Bjørn-Helge Mevik, dr. scient,
Department for Research Computing, University of Oslo
ric usage. Then you can set
FairshareWeight to 0 and use the Grp*Mins parameters to set hard limits.
--
Regards,
Bjørn-Helge Mevik, dr. scient,
Department for Research Computing, University of Oslo
tup, the first option is preferable: just putting it on the
queue and letting it wait until its turn. But of course, there are other
setups where the second option would be best. Could you perhaps make it
configurable, so a site can choose?
--
Regards,
Bjørn-Helge Mevik, dr. scient,
Department
e pids.
Not that I know of, but it should be possible to script.
> And how to parse the nodelist like "cn[11033,11069],gn[1103-1120]" ?
scontrol show hostnames cn[11033,11069],gn[1103-1120]
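For illustration, here is a minimal pure-Python sketch of the expansion that `scontrol show hostnames` performs (a simplification of mine: it handles comma lists and simple numeric ranges with zero-padding, but not nested brackets or step syntax, which the real hostlist format supports):

```python
import re

def expand_hostlist(expr):
    """Expand a Slurm-style hostlist such as "cn[11033,11069],gn[1103-1120]"
    into individual hostnames. Simplified illustration only."""
    hosts = []
    # Split on commas that are not inside brackets: each part is a
    # prefix optionally followed by one bracketed range list.
    parts = re.findall(r'[^,\[]+(?:\[[^\]]*\])?', expr)
    for part in parts:
        m = re.match(r'([^\[]+)\[([^\]]+)\]$', part)
        if not m:
            hosts.append(part)          # plain hostname, no brackets
            continue
        prefix, ranges = m.groups()
        for r in ranges.split(','):
            if '-' in r:
                lo, hi = r.split('-')
                width = len(lo)         # preserve zero-padding, e.g. 095
                hosts.extend(f"{prefix}{i:0{width}d}"
                             for i in range(int(lo), int(hi) + 1))
            else:
                hosts.append(prefix + r)
    return hosts
```

For the list above, this yields cn11033, cn11069 and gn1103 through gn1120, the same names `scontrol show hostnames` would print.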
--
Regards,
Bjørn-Helge Mevik, dr. scient,
Department for Research Computing, University of Oslo
u.se/~kent/python-hostlist/ by Kent Engström at NSC. It's
> simple to install this as an RPM package, see
> https://wiki.fysik.dtu.dk/niflheim/SLURM#expanding-host-lists
For the simple case you show, you could just use
$ scontrol show hostnames a[095,097-098]
a095
a097
a098
--
Regards,
ke
use Slurmdb qw(:all SLURMDB_ADD_USER);
$what = SLURMDB_ADD_USER();
just gives the error "SLURMDB_ADD_USER is not a valid Slurmdb macro"
--
Regards,
Bjørn-Helge Mevik, dr. scient,
Department for Research Computing, University of Oslo
file names to a dot file in $SCRATCH)
- The Epilog copies any registered files back to the job submit dir (it
uses "su - $USER" when doing this).
- The epilog deletes the directory
--
Regards,
Bjørn-Helge Mevik, dr. scient,
Department for Research Computing, University of Oslo
signa
omain config parameter was added in Slurm 17.02.
A different option would be to configure your sendmail to accept
domain-less mails (and perhaps add a default domain itself).
--
Regards,
Bjørn-Helge Mevik, dr. scient,
Department for Research Computing, University of Oslo
size" [1]
(JobAcctGatherParams=UsePss), which is what cgroups use (I believe), and
sounds like the best estimate to me.
[1] https://en.wikipedia.org/wiki/Proportional_set_size
--
Regards,
Bjørn-Helge Mevik, dr. scient,
Department for Research Computing, University of Oslo
Check out the thread on this list about a week ago, titled "Unrestricted
use of a node". (In short, --exclusive with --mem=0 or --mem-per-cpu=0
might be more or less what you want.)
--
Regards,
Bjørn-Helge Mevik, dr. scient,
Department for Research Computing, University of Oslo
cpus it is allowed to use.)
This is on 15.08.12. YMMV.
--
Cheers,
Bjørn-Helge Mevik, dr. scient,
Department for Research Computing, University of Oslo
tell this Slurm. - OK, I can ask for 24
> cores and 64 GB in a node, but then I do not get the chance to run on 12
> cores/32 GB.
For the memory part, you could specify --mem=0. That will allocate all
of the memory on whichever node the job lands on. For the number of
cores, I don't know.
Jordan Willis writes:
>Thank you,
>Can you confirm that this will take an update from SLURM 14.11.15 to
>current?
I never ran 14.11, but in 14.03, you can use GrpCPUs=1000 instead of
GrpTRES=cpu=1000.
--
Regards,
Bjørn-Helge Mevik, dr. scient,
Department for Research
access to more than one account, I can use 1000 cpus in each
account.
--
Regards,
Bjørn-Helge Mevik, dr. scient,
Department for Research Computing, University of Oslo
'd guess it
should have been possible to use scancel in PrologSlurmctld also in
15.08.12.
Does anyone know if this is an intentional change (and SchedMD just
forgot to update the docs) or a bug?
(I haven't found anything relevant in the NEWS file or on
bugs.schedmd.com.)
--
Regards,
B
apart from that, this is my understanding too.
--
Regards,
Bjørn-Helge Mevik, dr. scient,
Department for Research Computing, University of Oslo
There is a plugin under development, that will/might provide those
features. It was presented at SLUG 16:
http://slurm.schedmd.com/SLUG16/MCS.pdf
--
Regards,
Bjørn-Helge Mevik, dr. scient,
Department for Research Computing, University of Oslo
to generate
> my report. Does this approach make sense or are there better
> alternatives.
sacct can also give you the submit time, start time, end time and
elapsed time.
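As a small illustration of working with those fields (the helper name is mine, and I'm assuming sacct's ISO-like timestamp format, as seen in slurmctld logs):

```python
from datetime import datetime

def elapsed_between(start, end, fmt="%Y-%m-%dT%H:%M:%S"):
    """Compute elapsed wall time from two sacct-style timestamps,
    e.g. the Start and End fields of a job record."""
    return datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
```

This simply recomputes what sacct's Elapsed field already reports, but it is handy when post-processing the raw timestamps in a report script.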
--
Regards,
Bjørn-Helge Mevik, dr. scient,
Department for Research Computing, University of Oslo
To me, this sounds like a job for a job submit plugin, for instance
job_submit.lua. That way you could reject the job before it gets
submitted into the queue.
--
Regards,
Bjørn-Helge Mevik, dr. scient,
Department for Research Computing, University of Oslo
;, which prints out resource usage. As long
as users remember to source the setup file, they get the usage
statistics at the bottom of their stdout file. Not very elegant, but it
works.
--
Bjørn-Helge Mevik, dr. scient,
Department for Research Computing, University of Oslo
low" qos, and
to get that to work, we've found that we must put the accounts normal
limits on a qos, not on the account itself. Usually this means that we
have a qos for each account, and then a common "low" qos.
--
Regards,
Bjørn-Helge Mevik, dr. scient,
Department for Research Computing, University of Oslo
md appear to be
Just a note: I tried this (for a different reason), but found out it
didn't have any effect (gather the output to a log file and look at the
gcc lines). However, if I did -D '%with_cflags CFLAGS="-O0 -g3"' (i.e.,
removed the initial "_"), it had the
I just noticed that as of 14.11.6, optimization is turned off (-O0) by
default when building slurm.
Is there any reason not to use --disable-debug when building slurm for a
production cluster?
--
Regards,
Bjørn-Helge Mevik, dr. scient,
Department for Research Computing, University of Oslo
+1
--
Bjørn-Helge Mevik, dr. scient,
Department for Research Computing, University of Oslo
provisioning tool?
- A locally developed solution?
--
Regards,
Bjørn-Helge Mevik, dr. scient,
Department for Research Computing, University of Oslo
tch empty-jobname.sm
sbatch: option requires an argument -- 'J'
Submitted batch job 14221261
$
A more consistent behaviour would have been nice. My suggestion is:
report an error and fail to submit the job.
--
Regards,
Bjørn-Helge Mevik, dr. scient,
Department for Research Computing, University of Oslo
writes:
> We are looking for comments and feedback on this proposed behavior
[...]
> +#define HEALTH_RETRY_DELAY 10
Have you thought about using the health_check_interval instead? Or make
it a separate configurable option?
--
Regards,
Bjørn-Helge Mevik, dr. scient,
Department for Re
obs just
before they time out.
--
Regards,
Bjørn-Helge Mevik, dr. scient,
Department for Research Computing, University of Oslo
Test 14.10 in the test suite (of slurm 15.08.8, at least) uses
$sinfo -tidle -h -o%n
to find idle nodes. This only works if NodeHostname == NodeName on the
nodes. The following should work regardless of this:
$scontrol show hostnames \$($sinfo -tidle -h -o%N)
--
Regards,
Bjørn-Helge
n a
process needs more memory instead of killing the process. If I'm
correct, oom will _not_ kill a job due to cached data.
--
Regards,
Bjørn-Helge Mevik, dr. scient,
Department for Research Computing, University of Oslo
"Wiegand, Paul" writes:
> This worked. Thank you Bjørn-Helge.
You're welcome! :)
--
Bjørn-Helge Mevik, dr. scient,
Department for Research Computing, University of Oslo
Felip Moll writes:
> I will try JobAcctGatherParams to NoShared.
>
> This is an example of job step being killed. It's being killed by oom, but
> it's invoked by cgroups:
Since the job was killed by the oom, NoShared will not help. It does
not affect cgroups.
--
Rega
e data between several processes, the shared space will be
counted once for each process(!). Cgroups seems to count the shared
data only once. So if a process is killed by oom instead of by slurm,
it is probably not due to shared data.
--
Regards,
Bjørn-Helge Mevik, dr. scient,
Department for Re
s
enough. We have the partition because our lowpri jobs are allowed to
run on special nodes (like hugemem or accelerator nodes) that normal
jobs are not allowed to use.)
I hope this made sense to you. :)
--
Regards,
Bjørn-Helge Mevik, dr. scient,
Department for Research Computing, University of Oslo
with static libraries, which
slurm does _not_ install.
--
Regards,
Bjørn-Helge Mevik, dr. scient,
Department for Research Computing, University of Oslo
Thanks. Nice to know!
--
Bjørn-Helge Mevik, dr. scient,
Department for Research Computing, University of Oslo
we activated
checkpointing. When slurmcltd started, the checkpointing plugin
expected some extra data in the job states, which obviously wasn't
there, and slurmctld decided the data was invalid and killed all jobs.
(I don't know if this is still a problem.)
--
Regards,
Bjørn-Helge
thorough,
unfortunately, but according to
https://lists.fedoraproject.org/pipermail/mingw/2012-January/004421.html
the .la files are only needed in order to link against static libraries,
and since Slurm doesn't provide any static libraries, I guess it would
be safe for the slurm-devel rpm not to
h IB1 or all with IB2. Search for "Matching OR" in
the sbatch man page for details. (We used this on our previous cluster,
which had two different IB networks.)
--
Regards,
Bjørn-Helge Mevik, dr. scient,
Department for Research Computing, University of Oslo
LIMIT ***
That usually means the job tried to run longer than its --time
specification.
--
Regards,
Bjørn-Helge Mevik, dr. scient,
Department for Research Computing, University of Oslo
atabase makes Gold quite slow, so we have had to add quite a lot of
error checking and handling in the prolog and epilog scripts.
--
Regards,
Bjørn-Helge Mevik, dr. scient,
Department for Research Computing, University of Oslo
Christopher Samuel writes:
> http://karaage.readthedocs.org/en/latest/introduction.html
Karaage looks interesting for managing projects and users. Can it
manage usage limits?
--
Regards,
Bjørn-Helge Mevik, dr. scient,
Department for Research Computing, University of Oslo
.03.7. Perhaps this has changed in later versions?
Also, "nothing is ever easy": we want to account not CPU hours, but PE
(processor equivalents) hours.
--
Regards,
Bjørn-Helge Mevik, dr. scient,
Department for Research Computing, University of Oslo
accounting.
--
Regards,
Bjørn-Helge Mevik, dr. scient,
Department for Research Computing, University of Oslo
Ok, thanks.
--
Regards,
Bjørn-Helge Mevik, dr. scient,
Department for Research Computing, University of Oslo
.03.7, btw.)
--
Regards,
Bjørn-Helge Mevik, dr. scient,
Department for Research Computing, University of Oslo
Thanks!
--
Regards,
Bjørn-Helge Mevik, dr. scient,
Department for Research Computing, University of Oslo
pi itself?
--
Regards,
Bjørn-Helge Mevik, dr. scient,
Department for Research Computing, University of Oslo
rds,
Bjørn-Helge Mevik, dr. scient,
Department for Research Computing, University of Oslo
Thanks! I'll try to apply that patch.
--
Regards,
Bjørn-Helge Mevik, dr. scient,
Department for Research Computing, University of Oslo
--
Regards,
Bjørn-Helge Mevik, dr. scient,
Department for Research Computing, University of Oslo
clude CPU time of child processes.”
In my experience, that description might not be accurate. It seems that
child processes are also included, as long as the job doesn't time out. Here
is an email I wrote about it last year:
From: Bjørn-Helge Mevik
Subject: [slurm-dev] UserCPU etc. for subprocess
I second this wish. :)
--
Bjørn-Helge Mevik, dr. scient,
Department for Research Computing, University of Oslo
ng the test suite
for versions 14.03.8--14.03.10, we didn't upgrade to .10.)
--
Regards,
Bjørn-Helge Mevik, dr. scient,
Department for Research Computing, University of Oslo
the .../uid_NNN directories are
removed.
Does anyone know what these messages mean? Should we just ignore them?
--
Regards,
Bjørn-Helge Mevik, dr. scient,
Department for Research Computing, University of Oslo
_bitstr_bits(tmp_qos_bitstr) was still 26.
Any help in figuring out what goes wrong (or how to fix it :) is
appreciated!
--
Regards,
Bjørn-Helge Mevik, dr. scient,
Department for Research Computing, University of Oslo
Thanks for the tip!
We actually already have a setup where "srun
--ntasks=$SLURM_JOB_NUM_NODES /bin/true" is run at the start of every
job, so we're definitely going to look into this.
--
Regards,
Bjørn-Helge Mevik, dr. scient,
Department for Research Computing, University of Oslo
like this?
--
Regards,
Bjørn-Helge Mevik, dr. scient,
Department for Research Computing, University of Oslo
COMPLETED 00:01:07 01:03.980 00:02.207 01:06.187
43        COMPLETED 00:01:08 01:05.230 00:02.173 01:07.403
43.batch COMPLETED 00:01:08 01:05.230 00:02.173 01:07.403
i.e., time spent in subprocesses is reported.
--
Regards,
Bjørn-Helge Mevik, dr. scient,
Department fo
ient.
It would also be nice if the node-global destinations could be
configurable, instead of being hard-coded in the script (or at least be
set at the top of the script). For instance, on our system, the
node-global file systems are /work and /cluster, not /scratch and /home.
--
Regards,
B
nfortunately, I don't know enough Expect (tcl?) to
suggest how to implement that.
--
Regards,
Bjørn-Helge Mevik, dr. scient,
Department for Research Computing, University of Oslo
writes:
> As far as I can tell, it has been this way (broken) forever. It will be fixed
> in 14.03.7.
Thx!
--
Regards,
Bjørn-Helge Mevik, dr. scient,
Department for Research Computing, University of Oslo
agnu
dejagnu-1.4.4-17.el6.noarch
# rpm -q check
check-0.9.8-1.1.el6.x86_64
Is there something else we are missing?
--
Regards,
Bjørn-Helge Mevik, dr. scient,
Department for Research Computing, University of Oslo
Thanks!
--
Bjørn-Helge Mevik, dr. scient,
Department for Research Computing, University of Oslo
that are printed on the user's stderr? If so,
how?
--
Regards,
Bjørn-Helge Mevik, dr. scient,
Department for Research Computing, University of Oslo
Just a short note about terminology. I believe "processor equivalents"
(PE) is a much-used term for this. It is at least what Maui and Moab
use, if I recall correctly. The "resource*time" would then be PE seconds
(or hours, or whatever).
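A sketch of one common definition (my formulation; the exact formula varies by site and scheduler): a job's PE count is its dominant resource fraction, scaled back into units of processors.

```python
def processor_equivalents(req_cpus, req_mem, node_cpus, node_mem):
    """Processor equivalents: the largest fraction of any consumable
    resource the job requests, expressed in CPUs. A job asking for
    half a node's memory but only one core still "costs" half the
    node's cores. One common definition; sites may use others."""
    dominant = max(req_cpus / node_cpus, req_mem / node_mem)
    return dominant * node_cpus
```

Multiplying the PE count by wall time then gives PE seconds (or hours), which is what one would feed into the accounting.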
--
Regards,
Bjørn-Helge Mevik, dr
tributed across many nodes. Also see the SelectParameters
configuration parameter CR_LLN to use the least loaded nodes in
every partition.
--
Regards,
Bjørn-Helge Mevik, dr. scient,
Department for Research Computing, University of Oslo
Nicolai Stange writes:
> Hi Bjorn-Helge,
>
> thank you very much for your reply!
>
> Bjørn-Helge Mevik writes:
>> (We did have some problems when using srun inside the script, but I
>> believe mpirun should work.)
> Indeed it does for OpenMPI and MVAPICH2.
>
will
_not_ be taken into account; all specifications must be in $slurmargs).
The salloc command will wait until the command ($script) has finished
before exiting.
(We did have some problems when using srun inside the script, but I
believe mpirun should work.)
--
Regards,
Bjørn-Helge Mevik, dr
artition both jobs start as they should.
Sorry for not checking well enough what I did!
--
Regards,
Bjørn-Helge Mevik, dr. scient,
Department for Research Computing, University of Oslo
the priority of the job with
# scontrol update jobid=20 nice=-1000
does not help. It still does not start.
--
Regards,
Bjørn-Helge Mevik, dr. scient,
Department for Research Computing, University of Oslo
That would be a very interesting feature.
Similarly to what Christopher Samuel wrote, we have «hacked around» the
issue for project limits (not fairshare) by converting memory usage to
processor-equivalents and using Gold for the accounting.
--
Regards,
Bjørn-Helge Mevik, dr. scient
If you switch to use slurmdbd, the password goes into slurmdbd.conf,
which only exists on the head node, and only needs to be readable by root.
--
Bjørn-Helge Mevik, dr. scient,
Department for Research Computing, University of Oslo
Damien François writes:
> Hello,
>
> what is the most efficient way of finding how many jobs are currently
> running, pending, etc in the system ?
I tend to use
squeue -h -o %T | sort | uniq -c
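The `sort | uniq -c` part is just tallying identical lines; the same count can be sketched in Python (taking the squeue output as a string here, so the example works without a running Slurm):

```python
from collections import Counter

def state_counts(squeue_output):
    """Tally job states from `squeue -h -o %T` output (one state
    name per line), the same way `sort | uniq -c` does."""
    return Counter(squeue_output.split())
```

In a monitoring script one would feed it the captured stdout of squeue and read off counts per state (RUNNING, PENDING, and so on).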
--
Cheers,
Bjørn-Helge Mevik, dr. scient,
Department for Research Computing, University of Oslo
Bjørn-Helge Mevik writes:
We have investigated a bit further:
> 1) What happened here?
I didn't look far enough in the logs. It turns out that slurmctld
segfaulted later the same day (last message in the log was at
2013-09-30T14:34:56+02:00). When it was started, it said:
[2013-09-
"Loris Bennett" writes:
> Hi Bjørn-Helge,
>
> Bjørn-Helge Mevik
> writes:
>
>> We are running slurm 2.5.6.
>>
>> This night, about 10 jobs were scheduled and completed, but they are
>> still listed as RUNNING in sacct. For instance:
>>
pilog
[2013-09-30T04:11:10+02:00] Reading slurm.conf file: /etc/slurm/slurm.conf
[2013-09-30T04:11:10+02:00] Running spank/epilog for jobid [3371606] uid [4010]
[2013-09-30T04:11:10+02:00] spank: opening plugin stack /etc/slurm/plugstack.conf
[2013-09-30T04:11:10+02:00] debug: [job 3371606] attemptin
m/SchedMD/slurm/commit/29094e33fcbb4f29e9512059bbdd18ba3504134c
>
> That fixes several of the problems. I'm not sure why the job_state.new file
> is reported by lsof, but will probably investigate further at a later time.
Thanks!
--
Regards,
Bjørn-Helge Mevik, dr. scient,
Department for Research Computing, University of Oslo
ent ill effects, but don't want to do that on our
production cluster until we know it's safe.
If they are not needed, perhaps it would be a good idea for slurmctld to
close them when starting the prologs/epilogs?
--
Regards,
Bjørn-Helge Mevik, dr. scient,
Department for Research Computing, University of Oslo
[2013-09-20T04:50:16+02:00] debug: power_save module disabled, SuspendTime < 0
[2013-09-20T04:50:16+02:00] error: Error binding slurm stream socket: Address
already in use
[2013-09-20T04:50:16+02:00] fatal: slurm_init_msg_engine_addrname_port error
Address already in use
--
Regards,
Bjørn-He
n 2013-05-12T23:03:59 and 2013-05-13T00:00:00). Running
is also considered "eligible".
> I totally agree your comment on that sacct lacks on the way to filter jobs
> that are actually within the time interval.
As Danny said: add --state=RUNNING. :)
--
Regards,
Bjørn-Helge Mevik,
Danny Auble writes:
>>It would have been nice to have the possibility to select jobs that
>>were
>>_running_ (or _started_) in an interval, but I don't think it's there.
>
> Just ask for the state to be 'running'.
:)
--
Regards,
Bjørn-Helg
ice to have the possibility to select jobs that were
_running_ (or _started_) in an interval, but I don't think it's there.
--
Regards,
Bjørn-Helge Mevik, dr. scient,
Department for Research Computing, University of Oslo
--
Regards,
Bjørn-Helge Mevik, dr. scient,
Department for Research Computing, University of Oslo
Moe Jette writes:
> Quoting Bjørn-Helge Mevik :
>
>>
>> Moe Jette writes:
>>
>>> * Changes in Slurm 13.12.0pre1
>>> ==
>>
>> Just curious: Why the sudden jump in version numbering? year.month?
>
> Correct
to use whole nodes.)
> -- Add mechanism for job_submit plugin to generate error message for srun,
> salloc or sbatch to log.
But still not to stderr, right?
--
Regards,
Bjørn-Helge Mevik, dr. scient,
Department for Research Computing, University of Oslo
allocated 16.1 GiB virtual memory, but is only using 104 MiB resident.)
I would suggest looking at cgroups for limiting memory usage.
--
Regards,
Bjørn-Helge Mevik, dr. scient,
Department for Research Computing, University of Oslo
2051 -- Signal ESLURM_INVALID_TIME_LIMIT
end
--
Regards,
Bjørn-Helge Mevik, dr. scient,
Department for Research Computing, University of Oslo
a lot of threads) will "use" a lot of VMEM.
There was a change in glibc 2.something (I think it was) in how VMEM is
allocated for threads. For instance, our slurmctld right now "uses" 16
GiB VMEM, but only 117 MiB RSS.
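One can see that difference directly from /proc (a Linux-only sketch, not Slurm-specific; the function name is mine):

```python
def vm_stats_kib(pid="self"):
    """Return VmSize (virtual) and VmRSS (resident) in KiB for a
    process, read from /proc/<pid>/status -- the two numbers that
    diverge so much for heavily threaded daemons."""
    stats = {}
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith(("VmSize:", "VmRSS:")):
                key, value = line.split(":", 1)
                stats[key] = int(value.split()[0])  # value is in kB
    return stats
```

Running it inside any process shows VmSize well above VmRSS, which is why accounting on virtual memory penalizes threaded programs.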
--
Regards,
Bjørn-Helge Mevik, dr. scient,
Department for R
Moe Jette writes:
> Yes, that is correct.
Thanks!
--
Regards,
Bjørn-Helge Mevik, dr. scient,
Department for Research Computing, University of Oslo
lue a job can specify for
"delay". So in order for allowing users to specify a delay of, say,
12 hours, one must set max_switch_wait in slurm.conf to something as
large as 12 hours.
Is this the right interpretation?
--
Regards,
Bjørn-Helge Mevik, dr. scient,
Department for Research Computing, University of Oslo
scheduler used to
time out after MessageTimeout/2 seconds, but looking at the code for
2.5.6 this seems to have changed.
Keep us posted about what you find. I'm planning to switch to 2.5.6
tomorrow, and have from time to time had problems getting the
backfilling to be fast enough.
--
Regar
Kevin Abbey writes:
> This sounds great. If you can share after testing it would be very
> much appreciated.
Will do. (There will be some parts of it tailored to our site, but that
shouldn't be hard to remove/change.)
--
Regards,
Bjørn-Helge Mevik, dr. scient,
Department f