[slurm-dev] Help: SLURM will not start on either node after setup.

2015-11-04 Thread Dennis Mungai
Hello there, We recently deployed SLURM for a Bioinformatics cluster at KEMRI-Wellcome Trust, Kilifi, Kenya, and after following the setup guide and the online configurator (to build the configuration file), here are the errors we ran into: 1. None of the slurmd daemons on either node
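The preview above is cut short, but a common first pass for "slurmd will not start" problems is to run the daemons in the foreground and confirm MUNGE authentication between hosts. A minimal sketch, assuming a MUNGE-based setup; the host name compute01 is a placeholder:

# On a compute node: run slurmd in the foreground with extra verbosity to see why it exits
slurmd -D -vvv

# On the controller: same idea for slurmctld, which also catches slurm.conf parse errors
slurmctld -D -vvv

# Verify MUNGE credentials decode locally and across nodes (the munge key must match everywhere)
munge -n | unmunge
munge -n | ssh compute01 unmunge

# Confirm the short hostname matches the NodeName/NodeHostName entries in slurm.conf
hostname -s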

[slurm-dev] Slurm version 15.08.3 now available

2015-11-04 Thread Moe Jette
We are pleased to announce the availability of Slurm version 15.08.3, which includes about 25 bug fixes developed over the past couple of weeks, as listed below. Slurm downloads are available from: http://www.schedmd.com/#repos SC15: There will be a Slurm User Group meeting on Thursday 19 November

[slurm-dev] Re: Need guidance to run multiple tasks per node with sbatch job array

2015-11-04 Thread charlie hemlock
Carlos, I think you are pointing me in the right direction. $ scontrol show node NodeName=clunode Arch=x86_64 CoresPerSocket=1 CPUAlloc=- CPUErr=0 CPUTot=12 CPULoad=0.1 Features=(null) Gres=(null) NodeAddr=clu NodeHostName=clu Version=14.11 OS=Linux RealMemory=1 AllocMem=0 Sockets=12

[slurm-dev] Re: Need guidance to run multiple tasks per node with sbatch job array

2015-11-04 Thread Carlos Fenoy
Trevor, If using cons_res there is no need to specify Shared=YES unless you want to share the same resources among different jobs. From the slurm.conf man page: YES Makes all resources in the partition available for sharing upon request by the job. Resources will only be over-subscribed
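A minimal slurm.conf sketch of the behaviour Carlos describes (values are illustrative, not taken from the poster's configuration):

# With a consumable-resource selector, several jobs can run on one node as long
# as each requests only part of the node's cores/memory; Shared=YES is only
# needed if the same cores may be over-subscribed by multiple jobs on request.
SelectType=select/cons_res
SelectTypeParameters=CR_Core_Memory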

[slurm-dev] Re: Need guidance to run multiple tasks per node with sbatch job array

2015-11-04 Thread Carlos Fenoy
It seems you have no memory configured for the nodes. Can you post the output of scontrol show node for a single node? If you don't care about memory resources then maybe switching to CR_Core makes more sense. Regards, Carlos On Wed, Nov 4, 2015 at 3:09 PM, charlie hemlock

[slurm-dev] Re: Need guidance to run multiple tasks per node with sbatch job array

2015-11-04 Thread Cooper, Trevor
Charlie, It appears your default (only?) partition is configured with the Slurm default of Shared=NO. To get multiple jobs running on the nodes at the same time your admins need to change the 'Shared' parameter in the partition configuration to either Shared=YES or Shared=FORCE: and make
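A hedged sketch of the partition change Trevor suggests (partition and node names are placeholders; later Slurm releases rename this parameter to OverSubscribe):

# FORCE:1 lets several jobs share a node while each core is used by at most one job at a time
PartitionName=compute Nodes=clunode[01-06] Default=YES Shared=FORCE:1 State=UP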

[slurm-dev] Re: Need guidance to run multiple tasks per node with sbatch job array

2015-11-04 Thread charlie hemlock
Hi Trevor/Triveni, Thank you for your responses. I have been able to use sbatch --array to run all the jobs, *but only a single job/task executes per node.* In my example cluster of 6 nodes with 4 cores each, I can only get 6 tasks to run simultaneously - not 24. Also my batch array could in
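For reference, a sketch of the kind of array script under discussion (program and file names are made up); the key point is that each array element requests a single core so the scheduler can pack several elements onto one node:

#!/bin/bash
#SBATCH --job-name=array_pack      # placeholder name
#SBATCH --array=1-24               # 24 independent array tasks
#SBATCH --ntasks=1                 # each array element is a single task
#SBATCH --cpus-per-task=1          # one core each, so a 4-core node can hold four at once

srun ./my_tool input.${SLURM_ARRAY_TASK_ID}   # hypothetical program and input naming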

[slurm-dev] Re: Need guidance to run multiple tasks per node with sbatch job array

2015-11-04 Thread Carlos Fenoy
Charlie, It seems that the problem may be the Memory reservation per step. You mentioned that you are using CR_Core_Memory but you haven't mentioned any DefMemPerCPU, and by default slurm allocates all the memory to a job if it is not requesting any. So try with --mem=#MB and tell us if it works
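Two hedged ways to express what Carlos suggests, with placeholder numbers:

# Per job: request an explicit, modest amount of memory so jobs can co-exist on a node
sbatch --array=1-24 --ntasks=1 --mem=2000 job.sh    # 2000 MB per array element

# Or cluster-wide in slurm.conf: give jobs a sane default instead of the whole node
DefMemPerCPU=2000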

[slurm-dev] Re: Need guidance to run multiple tasks per node with sbatch job array

2015-11-04 Thread charlie hemlock
Carlos, Thank you for the response. I tried two tests with the sbatch arguments --mem=20 and --mem-per-cpu=20; both failed with errors: sbatch: error: memory specification can not be specified sbatch: error: Batch job submission failed. Requested node configuration is not available. Each node has 100+ GB of

[slurm-dev] Re: Need guidance to run multiple tasks per node with sbatch job array

2015-11-04 Thread charlie hemlock
With the way the # of CPUs is defined in slurm.conf (with Procs), I'm wondering whether CR_Core or CR_CPU is more appropriate? See below. (The slurm.conf is not setting each node's Sockets, CoresPerSocket, and ThreadsPerCore.) Thank you all for your help and patience! *Option 1*: CR_Core #
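A sketch of the two node-definition styles in question (the actual Option 1/Option 2 configs are truncated above; host names and counts here are placeholders):

# Style A: only a processor count; pairs naturally with SelectTypeParameters=CR_CPU
NodeName=clunode[01-06] Procs=12 RealMemory=120000 State=UNKNOWN

# Style B: full topology; lets CR_Core (or CR_Core_Memory) schedule real cores
NodeName=clunode[01-06] Sockets=2 CoresPerSocket=6 ThreadsPerCore=1 RealMemory=120000 State=UNKNOWN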

[slurm-dev] Re: Need guidance to run multiple tasks per node with sbatch job array

2015-11-04 Thread Carlos Fenoy
I'm not completely sure what you should use, but if you don't plan to use hyperthreading you should probably use CR_CPU. Check the slurm.conf man page for further details. On Wed, Nov 4, 2015 at 4:33 PM, charlie hemlock wrote: > With the way the # of cpus are

[slurm-dev] Best Practices in Managing Partitions dynamically

2015-11-04 Thread Kumar, Amit
Dear All, I want to learn how most of you manage partitions and the changes you make to them. For example: I have hundreds of nodes in a partition, and when the time comes to shuffle a handful of them out of that partition and move them to another, I find editing the node entries in the partition a bit

[slurm-dev] Re: Best Practices in Managing Partitions dynamically

2015-11-04 Thread Trey Dockendorf
I use the clustershell nodeset command [1] as a helper for 'folding' our node lists. In our case the output can be condensed further when passed to SLURM, since some of our systems have 2 sets of digits in their name.
$ nodeset -f c0611n1 c0611n2 c0612n1 c0612n2
c[0611-0612]n1,c[0611-0612]n2
SLURM:
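For anyone unfamiliar with the tool, a small sketch of both directions (the exact folding of multi-field names varies with the ClusterShell version):

# Fold an explicit host list into ranges
nodeset -f c0611n1 c0611n2 c0612n1 c0612n2

# Expand a folded expression back into individual host names
nodeset -e 'c[0611-0612]n[1-2]'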

[slurm-dev] Re: Need guidance to run multiple tasks per node with sbatch job array

2015-11-04 Thread Cooper, Trevor
Carlos, You are correct about Shared=NO vs. Shared=FORCE:1 and the use of the --exclusive flag. We use Shared=FORCE:1 on our shared partitions for this reason (shared means shared) and in fact go further to block multi-node submissions to shared partitions with QOS and whole node resource

[slurm-dev] Re: Best Practices in Managing Partitions dynamically

2015-11-04 Thread Trey Dockendorf
I learned something new, thanks! - Trey = Trey Dockendorf Systems Analyst I Texas A&M University Academy for Advanced Telecommunications and Learning Technologies Phone: (979)458-2396 Email: treyd...@tamu.edu Jabber: treyd...@tamu.edu On Wed, Nov 4, 2015 at 12:09 PM,

[slurm-dev] Re: Best Practices in Managing Partitions dynamically

2015-11-04 Thread Bruce Roberts
You can always use scontrol show hostlist:
scontrol show hostlist c0611n1,c0611n2,c0612n1,c0612n2
c0611n[1-2],c0612n[1-2]
or show hostnames for the opposite...
scontrol show hostnames c0611n[1-2],c0612n[1-2]
c0611n1
c0611n2
c0612n1
c0612n2
That should avoid the nodeset and order it the way