Hello there,
We recently deployed SLURM for a Bioinformatics cluster at KEMRI-Wellcome
Trust, Kilifi, Kenya, and after following the setup guide and the online
configurator (to build the configuration file), here are the errors we ran into:
1. None of the slurmd daemons on either node
We are pleased to announce the availability of Slurm version 15.08.3, which
includes about 25 bug fixes developed over the past couple of weeks, as
listed below. Slurm downloads are available from:
http://www.schedmd.com/#repos
SC15
There will be a Slurm User Group meeting on Thursday 19 November
Carlos,
I think you are pointing me in the right direction.
$ scontrol show node
NodeName=clunode Arch=x86_64 CoresPerSocket=1
CPUAlloc=- CPUErr=0 CPUTot=12 CPULoad=0.1 Features=(null)
Gres=(null)
NodeAddr=clu NodeHostName=clu Version=14.11
OS=Linux RealMemory=1 AllocMem=0 Sockets=12
Trevor,
If using cons_res there is no need to specify Shared=YES unless you want to
share the same resources among different jobs.
From the slurm.conf man page:
YES Makes all resources in the partition available for sharing upon
request by the job. Resources will only be over-subscribed
It seems you have no memory configured for the nodes. Can you post the
output of scontrol show node for a single node?
If you don't care about memory resources then maybe switching to CR_Core
makes more sense.
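As an illustration of the trade-off Carlos describes, here is a hypothetical slurm.conf fragment (node names, counts, and sizes are invented for this sketch, not taken from the poster's cluster):

```
# Hypothetical slurm.conf fragment -- names and sizes are examples only.
SelectType=select/cons_res
SelectTypeParameters=CR_Core            # track cores only; use CR_Core_Memory
                                        # to also treat memory as a resource
# With CR_Core_Memory, each node needs a RealMemory value (in MB);
# otherwise it defaults to 1 and memory requests will fail:
NodeName=clunode[1-6] Sockets=2 CoresPerSocket=6 ThreadsPerCore=1 RealMemory=102400
```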
Regards,
Carlos
On Wed, Nov 4, 2015 at 3:09 PM, charlie hemlock
Charlie,
It appears your default (only?) partition is configured with the Slurm default
of Shared=NO.
To get multiple jobs running on the nodes at the same time your admins need to
change the 'Shared' parameter in the partition configuration to either
Shared=YES or Shared=FORCE: and make
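For reference, a sketch of what such a partition line might look like (the partition and node names here are invented):

```
# Hypothetical partition definition -- allow sharing on request:
PartitionName=batch Nodes=clunode[1-6] Default=YES State=UP Shared=YES
# or always pack up to one extra job onto allocated resources:
# PartitionName=batch Nodes=clunode[1-6] Default=YES State=UP Shared=FORCE:1
```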
Hi Trevor/Triveni,
Thank you for your responses.
I have been able to use the sbatch --array to run all the jobs, *but only a
single job/task executes per node.*
In my example cluster of 6 nodes with 4 cores each, I can only get 6 tasks
to run simultaneously - not 24.
Also my batch array could in
Charlie,
It seems that the problem may be the memory reservation per step. You
mentioned that you are using CR_Core_Memory, but you haven't mentioned any
DefMemPerCPU, and by default Slurm allocates all of a node's memory to a
job if it does not request any.
So try with --mem=#MB and tell us if it works.
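As a sketch of that suggestion, a job-array submission along these lines might look like the following (the script name, array size, and memory figure are illustrative, not from the thread):

```
#!/bin/bash
#SBATCH --array=1-24        # 24 array tasks across the cluster
#SBATCH --ntasks=1          # one task (one core) per array element
#SBATCH --mem=2048          # request 2 GB instead of the whole node
srun ./my_task "$SLURM_ARRAY_TASK_ID"
```

With an explicit per-job memory request, several array tasks can fit on one node under CR_Core_Memory instead of each task claiming all of the node's memory.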
Carlos,
Thank you for your response.
I tried two tests with sbatch arguments:
--mem=20
--mem-per-cpu=20
both failed with errors:
sbatch: error: memory specification can not be specified
sbatch: error: Batch job submission failed. Requested node configuration is
not available.
Each node has 100+ GB of
With the way the # of CPUs are defined in the slurm.conf with Procs, I'm
wondering if CR_Core or CR_CPU is more appropriate? See below.
(The slurm.conf does not set each node's Sockets, CoresPerSocket,
and ThreadsPerCore.)
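To make the two choices concrete, here are hypothetical node definitions (names and counts invented) the admins could pick between -- Procs alone only tells Slurm a CPU count, while spelling out the topology lets per-core scheduling work:

```
# Option A: CPU count only -- pairs naturally with CR_CPU
NodeName=clunode[1-6] Procs=4 State=UNKNOWN
# Option B: full topology -- lets CR_Core schedule individual cores
NodeName=clunode[1-6] Sockets=1 CoresPerSocket=4 ThreadsPerCore=1 State=UNKNOWN
```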
Thank you all for your help and patience!
*Option 1*: CR_Core
#
I'm not completely sure what you should use, but if you don't plan to use
hyperthreading you should probably use CR_CPU.
Check the slurm.conf man page for further details
On Wed, Nov 4, 2015 at 4:33 PM, charlie hemlock
wrote:
> With the way the # of cpus are
Dear All,
I want to learn how most of you manage partitions and changes to them.
For example: I have hundreds of nodes in a partition, and when the time comes
to shuffle a handful of them out of the partition and move them to another
partition, I find editing the node entries in the partition a bit
I used the clustershell nodeset command [1] as a helper for 'folding' our node
lists. In our case the output can be condensed further when passed to
SLURM, since some of our systems have two sets of digits in their names.
$ nodeset -f c0611n1 c0611n2 c0612n1 c0612n2
c[0611-0612]n1,c[0611-0612]n2
SLURM:
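The simpler folding that Slurm itself prints (collapsing only the trailing digit run, e.g. `c0611n[1-2]`) can be sketched in a few lines of Python. This is a toy illustration, not the clustershell algorithm, and it assumes every host name ends in digits:

```python
import re

def fold_hostnames(hosts):
    """Fold host names on their trailing digit run,
    e.g. ['c0611n1', 'c0611n2'] -> 'c0611n[1-2]'.
    Sketch: assumes every name ends in digits."""
    groups = {}
    for h in hosts:
        prefix, num = re.match(r'^(.*?)(\d+)$', h).groups()
        groups.setdefault(prefix, []).append(num)
    out = []
    for prefix in sorted(groups):
        nums = sorted(groups[prefix], key=int)
        # Collapse consecutive numbers into lo-hi ranges.
        ranges, start, prev = [], nums[0], nums[0]
        for n in nums[1:]:
            if int(n) == int(prev) + 1:
                prev = n
            else:
                ranges.append(start if start == prev else f"{start}-{prev}")
                start = prev = n
        ranges.append(start if start == prev else f"{start}-{prev}")
        body = ",".join(ranges)
        if len(nums) == 1:
            out.append(prefix + body)            # single host: no brackets
        else:
            out.append(f"{prefix}[{body}]")
    return ",".join(out)
```

For the four hosts in the nodeset example above, this yields `c0611n[1-2],c0612n[1-2]`, the same form `scontrol show hostlist` prints later in the thread.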
Carlos,
You are correct about Shared=NO vs. Shared=FORCE:1 and the use of the
--exclusive flag.
We use Shared=FORCE:1 on our shared partitions for this reason (shared means
shared) and in fact go further to block multi-node submissions to shared
partitions with QOS and whole node resource
I learned something new, thanks!
- Trey
=
Trey Dockendorf
Systems Analyst I
Texas A&M University
Academy for Advanced Telecommunications and Learning Technologies
Phone: (979)458-2396
Email: treyd...@tamu.edu
Jabber: treyd...@tamu.edu
On Wed, Nov 4, 2015 at 12:09 PM,
You can always use scontrol show hostlist
scontrol show hostlist c0611n1,c0611n2,c0612n1,c0612n2
c0611n[1-2],c0612n[1-2]
or show hostnames for the opposite...
scontrol show hostnames c0611n[1-2],c0612n[1-2]
c0611n1
c0611n2
c0612n1
c0612n2
That should avoid the nodeset and order it the way
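The reverse direction, expanding a bracketed hostlist back into individual names the way `scontrol show hostnames` does, can be sketched in Python as well. A minimal illustration, handling one bracket group per comma-separated entry and preserving zero padding:

```python
import re

def expand_hostlist(hostlist):
    """Expand a Slurm-style hostlist such as 'c0611n[1-2],c0612n[1-2]'
    into individual host names. Sketch: one bracket group per entry."""
    hosts = []
    # Split on commas that are not inside brackets.
    for entry in re.split(r',(?![^\[]*\])', hostlist):
        m = re.match(r'^(.*)\[([^\]]+)\](.*)$', entry)
        if not m:
            hosts.append(entry)          # plain name, no bracket group
            continue
        prefix, ranges, suffix = m.groups()
        for part in ranges.split(','):
            if '-' in part:
                lo, hi = part.split('-')
                width = len(lo)          # keep zero padding, e.g. 0611
                for i in range(int(lo), int(hi) + 1):
                    hosts.append(f"{prefix}{str(i).zfill(width)}{suffix}")
            else:
                hosts.append(prefix + part + suffix)
    return hosts
```

It also handles a leading digit group, so `c[0611-0612]n1` expands to `c0611n1` and `c0612n1`.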