Hi,

I think your case is covered in the FAQ, Q30, in the "NOTE"
-- according to this you set CR_CPU and "CPUs" only; no sockets, no cores,
no threads, ...:

http://slurm.schedmd.com/faq.html
[...]
30.  Slurm documentation refers to CPUs, cores and threads. What exactly
is considered a CPU?
If your nodes are configured with hyperthreading, then a CPU is
equivalent to a hyperthread. Otherwise a CPU is equivalent to a core.
You can determine if your nodes have more than one thread per core using
the command "scontrol show node" and looking at the values of
"ThreadsPerCore".

Note that even on systems with hyperthreading enabled, the resources
will generally be allocated to jobs at the level of a core (see NOTE
below). Two different jobs will not share a core except through the use
of a partition OverSubscribe configuration parameter. For example, a job
requesting resources for three tasks on a node with ThreadsPerCore=2
will be allocated two full cores. Note that Slurm commands contain a
multitude of options to control resource allocation with respect to base
boards, sockets, cores and threads.

(NOTE: An exception to this would be if the system administrator
configured SelectTypeParameters=CR_CPU and each node's CPU count without
its socket/core/thread specification. In that case, each thread would be
independently scheduled as a CPU. This is not a typical configuration.)
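
In your case that would be something like this in slurm.conf (just a sketch,
reusing the RealMemory value from your existing node line; the point is that
the node is described by its CPU count only, with no Sockets/CoresPerSocket/
ThreadsPerCore, so each hardware thread gets scheduled as a CPU):

SelectType=select/cons_res
SelectTypeParameters=CR_CPU
# CPU count only -- no socket/core/thread specification
NodeName=localhost CPUs=36 RealMemory=120000 State=UNKNOWN

After editing slurm.conf you would restart slurmctld and slurmd; "sinfo -o %C"
should then report 36 in the T column, and all 36 should be allocatable.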

Regards,
Benjamin

On 09/09/2016 02:06, andrealphus wrote:
> 
> p.s. same issue on v16
> 
> On Wed, Sep 7, 2016 at 9:57 AM, andrealphus <andrealp...@gmail.com> wrote:
>>
>> p.s. it's listing 36 processors with sinfo, and that they're all being
>> used, but it's only running 18 jobs. So it looks like while it can see
>> the 36 "processors", it's only allocating at the core level and not the
>> thread level;
>>
>>  squeue
>>              JOBID PARTITION     NAME     USER ST       TIME  NODES
>> NODELIST(REASON)
>>  3850_[19-1000%25]     debug slurm_ex   ashton PD       0:00      1 
>> (Resources)
>>             3850_1     debug slurm_ex   ashton  R       0:05      1 localhost
>>             3850_2     debug slurm_ex   ashton  R       0:05      1 localhost
>>             3850_3     debug slurm_ex   ashton  R       0:05      1 localhost
>>             3850_4     debug slurm_ex   ashton  R       0:05      1 localhost
>>             3850_5     debug slurm_ex   ashton  R       0:05      1 localhost
>>             3850_6     debug slurm_ex   ashton  R       0:05      1 localhost
>>             3850_7     debug slurm_ex   ashton  R       0:05      1 localhost
>>             3850_8     debug slurm_ex   ashton  R       0:05      1 localhost
>>             3850_9     debug slurm_ex   ashton  R       0:05      1 localhost
>>            3850_10     debug slurm_ex   ashton  R       0:05      1 localhost
>>            3850_11     debug slurm_ex   ashton  R       0:05      1 localhost
>>            3850_12     debug slurm_ex   ashton  R       0:05      1 localhost
>>            3850_13     debug slurm_ex   ashton  R       0:05      1 localhost
>>            3850_14     debug slurm_ex   ashton  R       0:05      1 localhost
>>            3850_15     debug slurm_ex   ashton  R       0:05      1 localhost
>>            3850_16     debug slurm_ex   ashton  R       0:05      1 localhost
>>            3850_17     debug slurm_ex   ashton  R       0:05      1 localhost
>>            3850_18     debug slurm_ex   ashton  R       0:05      1 localhost
>> sinfo -o %C
>> CPUS(A/I/O/T)
>> 36/0/0/36
>>
>> On Wed, Sep 7, 2016 at 9:41 AM, andrealphus <andrealp...@gmail.com> wrote:
>>>
>>> I tried changing the CPUs value in the compute node section of the conf
>>> file to 36, but it didn't make a difference, still limited to 18. Also
>>> tried removing the flag and letting slurm calculate it from the other
>>> info, e.g.;
>>>  Sockets=1 CoresPerSocket=18 ThreadsPerCore=2
>>>
>>> also no change. Could it be a non-configuration issue, e.g. a slurm
>>> bug related to the processor type? I only say that because I am
>>> normally a torque user, but there is an open bug with Adaptive that
>>> seems to be related to some of the newer Intel
>>> processors/glibc/elision locking....
>>>
>>>
>>> On Tue, Sep 6, 2016 at 7:30 PM, andrealphus <andrealp...@gmail.com> wrote:
>>>>
>>>> ahhhh......I'll give that a try. Thanks Lachlan, feel better!
>>>>
>>>> On Tue, Sep 6, 2016 at 6:49 PM, Lachlan Musicman <data...@gmail.com> wrote:
>>>>> No, sorry, I meant that your config file line needs to change:
>>>>>
>>>>>
>>>>> NodeName=localhost CPUs=36 RealMemory=120000 Sockets=1 CoresPerSocket=18
>>>>> ThreadsPerCore=2 State=UNKNOWN
>>>>>
>>>>> ------
>>>>> The most dangerous phrase in the language is, "We've always done it this
>>>>> way."
>>>>>
>>>>> - Grace Hopper
>>>>>
>>>>> On 7 September 2016 at 11:34, andrealphus <andrealp...@gmail.com> wrote:
>>>>>>
>>>>>>
>>>>>> Yup, that's what I expect too! Since I'm brand new to slurm, not sure if
>>>>>> there is some other config option or srun flag to enable
>>>>>> multithreading
>>>>>>
>>>>>> On Tue, Sep 6, 2016 at 5:42 PM, Lachlan Musicman <data...@gmail.com>
>>>>>> wrote:
>>>>>>> Oh, I'm not 100% sure on this (home sick actually), but I think:
>>>>>>>
>>>>>>> NodeName=localhost CPUs=1 RealMemory=120000 Sockets=1 CoresPerSocket=18
>>>>>>> ThreadsPerCore=2 State=UNKNOWN
>>>>>>>
>>>>>>>
>>>>>>> should have CPUs=36 (i.e. ThreadsPerCore*CoresPerSocket*Sockets = 2*18*1)
>>>>>>>
>>>>>>> cheers
>>>>>>> L,
>>>>>>>
>>>>>>> ------
>>>>>>> The most dangerous phrase in the language is, "We've always done it this
>>>>>>> way."
>>>>>>>
>>>>>>> - Grace Hopper
>>>>>>>
>>>>>>> On 7 September 2016 at 10:39, andrealphus <andrealp...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks Lachlan, took threads-per-core out and same behavior, still
>>>>>>>> limited to 18.
>>>>>>>>
>>>>>>>> On Tue, Sep 6, 2016 at 5:33 PM, Lachlan Musicman <data...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>> You don't need --threads-per-core.
>>>>>>>>>
>>>>>>>>> It's sufficient to have
>>>>>>>>>
>>>>>>>>> SelectType=select/cons_res
>>>>>>>>> SelectTypeParameters=CR_CPU
>>>>>>>>>
>>>>>>>>> then you should be able to get to all 36.
>>>>>>>>>
>>>>>>>>> cheers
>>>>>>>>> L.
>>>>>>>>>
>>>>>>>>> ------
>>>>>>>>> The most dangerous phrase in the language is, "We've always done it
>>>>>>>>> this
>>>>>>>>> way."
>>>>>>>>>
>>>>>>>>> - Grace Hopper
>>>>>>>>>
>>>>>>>>> On 7 September 2016 at 10:22, andrealphus <andrealp...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> one more follow up....
>>>>>>>>>>
>>>>>>>>>> This seems to be limited to the number of cores. Any way to change it so
>>>>>>>>>> that I can run up to the thread limit (18x2) concurrently?
>>>>>>>>>>
>>>>>>>>>> Thanks!
>>>>>>>>>>
>>>>>>>>>> On Tue, Sep 6, 2016 at 3:21 PM, andrealphus <andrealp...@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>> spoke too soon, so for posterity....
>>>>>>>>>>>
>>>>>>>>>>> need to set, in the conf;
>>>>>>>>>>> SelectType=select/cons_res
>>>>>>>>>>> SelectTypeParameters=CR_CPU
>>>>>>>>>>>
>>>>>>>>>>> and in the script;
>>>>>>>>>>> #SBATCH --threads-per-core=1
>>>>>>>>>>>
>>>>>>>>>>> and DefMemPerCPU, did not matter...
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Sep 6, 2016 at 3:08 PM, andrealphus
>>>>>>>>>>> <andrealp...@gmail.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>
>>>>>>>>>>>> Long time Torque user, first time SLURM user. I'm running version
>>>>>>>>>>>> 15.08 from APT on Ubuntu Xenial. (running on an 18 core CPU
>>>>>>>>>>>> E5-2697
>>>>>>>>>>>> v4)
>>>>>>>>>>>>
>>>>>>>>>>>> I'm trying to figure out the proper slurm.conf configuration, and
>>>>>>>>>>>> script parameters to run a job array on a single node/server
>>>>>>>>>>>> workstation, with more than one concurrent task of the job
>>>>>>>>>>>> running at a time.
>>>>>>>>>>>>
>>>>>>>>>>>> e.g.
>>>>>>>>>>>>
>>>>>>>>>>>> #!/bin/bash
>>>>>>>>>>>> #SBATCH -o slurm_example-%A_%a.out
>>>>>>>>>>>> #SBATCH --array=1-21%3
>>>>>>>>>>>> #SBATCH --mem-per-cpu=2000
>>>>>>>>>>>>
>>>>>>>>>>>> srun sleep 15
>>>>>>>>>>>>
>>>>>>>>>>>> and submitting with "sbatch example.sh" should run 21 total instances
>>>>>>>>>>>> of sleep, 3 at a time, correct?
>>>>>>>>>>>>
>>>>>>>>>>>> I can never get more than 1 concurrent process going....
>>>>>>>>>>>>
>>>>>>>>>>>> My slurm.conf file looks like;
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> ControlMachine=localhost
>>>>>>>>>>>> AuthType=auth/munge
>>>>>>>>>>>> CacheGroups=0
>>>>>>>>>>>> CryptoType=crypto/munge
>>>>>>>>>>>> MaxTasksPerNode=32
>>>>>>>>>>>> MpiDefault=none
>>>>>>>>>>>> ProctrackType=proctrack/pgid
>>>>>>>>>>>> ReturnToService=1
>>>>>>>>>>>> SlurmctldPidFile=/var/run/slurmctld.pid
>>>>>>>>>>>> SlurmctldPort=6817
>>>>>>>>>>>> SlurmdPidFile=/var/run/slurmd.pid
>>>>>>>>>>>> SlurmdPort=6818
>>>>>>>>>>>> SlurmdSpoolDir=/var/spool/slurmd
>>>>>>>>>>>> SlurmUser=root
>>>>>>>>>>>> StateSaveLocation=/var/spool
>>>>>>>>>>>> SwitchType=switch/none
>>>>>>>>>>>> TaskPlugin=task/none
>>>>>>>>>>>> InactiveLimit=0
>>>>>>>>>>>> KillWait=30
>>>>>>>>>>>> MinJobAge=300
>>>>>>>>>>>> SlurmctldTimeout=120
>>>>>>>>>>>> SlurmdTimeout=300
>>>>>>>>>>>> Waittime=0
>>>>>>>>>>>> FastSchedule=1
>>>>>>>>>>>> SchedulerType=sched/backfill
>>>>>>>>>>>> SchedulerPort=7321
>>>>>>>>>>>> SelectType=select/cons_res
>>>>>>>>>>>> SelectTypeParameters=CR_CPU
>>>>>>>>>>>> AccountingStorageType=accounting_storage/none
>>>>>>>>>>>> AccountingStoreJobComment=YES
>>>>>>>>>>>> ClusterName=cluster
>>>>>>>>>>>> JobCompType=jobcomp/none
>>>>>>>>>>>> JobAcctGatherFrequency=30
>>>>>>>>>>>> JobAcctGatherType=jobacct_gather/none
>>>>>>>>>>>> SlurmctldDebug=3
>>>>>>>>>>>> SlurmdDebug=3
>>>>>>>>>>>>
>>>>>>>>>>>> # COMPUTE NODES
>>>>>>>>>>>> NodeName=localhost CPUs=1 RealMemory=120000 Sockets=1
>>>>>>>>>>>> CoresPerSocket=18 ThreadsPerCore=2 State=UNKNOWN
>>>>>>>>>>>> PartitionName=debug Nodes=localhost Shared=YES DefMemPerCPU=3000
>>>>>>>>>>>> Default=YES MaxTime=INFINITE State=UP
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> I've tried both;
>>>>>>>>>>>>
>>>>>>>>>>>> SelectType=select/cons_res
>>>>>>>>>>>> SelectTypeParameters=CR_CPU
>>>>>>>>>>>> and
>>>>>>>>>>>> SelectType=select/linear
>>>>>>>>>>>>
>>>>>>>>>>>> but both return;
>>>>>>>>>>>> sinfo -o %C
>>>>>>>>>>>> CPUS(A/I/O/T)
>>>>>>>>>>>> 0/0/1/1
>>>>>>>>>>>>
>>>>>>>>>>>> which didn't seem right, because I thought if I set
>>>>>>>>>>>> SelectType=select/cons_res & SelectTypeParameters=CR_CPU, the
>>>>>>>>>>>> threads should be seen as the CPUs
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> I've tried to piece it together from the slurm and ubuntu mailing
>>>>>>>>>>>> lists, but two days later am ready to hide in a corner....
>>>>>>>>>>>>
>>>>>>>>>>>> any info appreciated!
>>>>>>>>>>>>
>>>>>>>>>>>> ashton
>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>
>>>>>

-- 
FSU Jena | JULIELab.de/Staff/Benjamin+Redling.html
vox: +49 3641 9 44323 | fax: +49 3641 9 44321
