Both nodes are identical. They are VMware virtual machines used for some tests.

[root@slurm_node1 ~]#   lscpu 
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                8
On-line CPU(s) list:   0-7
Thread(s) per core:    1
Core(s) per socket:    4
Socket(s):             2
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 62
Stepping:              4
CPU MHz:               2593.500
BogoMIPS:              5187.00
Hypervisor vendor:     VMware
Virtualization type:   full
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              15360K
NUMA node0 CPU(s):     0-7
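
For comparison with the NodeName line in slurm.conf, slurmd itself can report the hardware it detects on each node (assuming the installed version supports the -C option):

  slurmd -C    # prints something like: NodeName=... CPUs=8 SocketsPerBoard=2 CoresPerSocket=4 ThreadsPerCore=1 ...

If the detected values on either node differ from CPUs=8 SocketsPerBoard=2 CoresPerSocket=4 ThreadsPerCore=1, that would be the discrepancy to look at.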



 

-----Original Message-----
From: Benjamin Redling [mailto:[email protected]]
Sent: Friday, 29 January 2016 21:31
To: slurm-dev <[email protected]>
Subject: [slurm-dev] Re: Resources allocation problem


On 2016-01-29 17:04, David Roman wrote:
> My problem is simple. I have 2 nodes, each with 8 CPUs, so I can use at most 
> 16 CPUs at the same time. In the first case, Job_A uses 8 CPUs and Job_B 
> waits for its 16 CPUs. But in the other case, Job_B uses 16 CPUs and Job_A 
> uses 8 CPUs at the same time. But 16 + 8 = 24, which is greater than 16!

Can you cat /proc/cpuinfo? I still think one of the nodes might not match your 
configuration.
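
A quick way to compare the two nodes (assuming you can ssh between them; hostnames as in your slurm.conf):

  grep -c ^processor /proc/cpuinfo                   # logical CPU count on this node
  ssh slurm_node2 grep -c ^processor /proc/cpuinfo   # same count on the other node

Both should report 8 if the nodes really are identical.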

As I tried to explain: with FastSchedule=0, Slurm uses the hardware it actually 
detects on each node rather than your configuration, and suddenly the order of 
job submission becomes relevant.
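
One way to see what the controller has actually registered (field names may vary slightly between Slurm versions):

  scontrol show config | grep -i fastschedule
  scontrol show node slurm_node1 | grep -E 'CPUTot|Sockets|CoresPerSocket|ThreadsPerCore'
  scontrol show node slurm_node2 | grep -E 'CPUTot|Sockets|CoresPerSocket|ThreadsPerCore'

If CPUTot on either node differs from the CPUs=8 in slurm.conf, that is the mismatch to chase.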

Anyway, can you have a look at the details of both running jobs via 
scontrol show -d job <jobidA> (and the same for Job_B), for a quick glimpse?
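
For example (the job IDs here are placeholders):

  scontrol show -d job <jobidA> | grep -E 'NumNodes|NumCPUs|CPU_IDs'
  scontrol show -d job <jobidB> | grep -E 'NumNodes|NumCPUs|CPU_IDs'

With -d the output includes the per-node CPU_IDs, which shows exactly which cores each job was given and whether the two allocations overlap.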

After that you can try to raise SlurmdDebug on the compute nodes and SlurmctldDebug 
on the master up to 9, and inspect SlurmdLogFile on the compute nodes and 
SlurmctldLogFile on the master, to really get _all_ the details of the job 
allocation.
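
Roughly like this (a sketch; whether the daemons pick up debug changes on reconfigure or need a restart depends on the Slurm version):

  # in slurm.conf on both machines
  SlurmctldDebug=9
  SlurmdDebug=9

  scontrol reconfigure                                                # or restart slurmctld/slurmd
  scontrol show config | grep -i -E 'slurmctldlogfile|slurmdlogfile'  # where the logs go

Then resubmit Job_B and Job_A in the problematic order and read the allocation messages in both log files.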

Benjamin


> David
> 
> 
> From: Dennis Mungai [mailto:[email protected]]
> Sent: Friday, 29 January 2016 16:18
> To: slurm-dev <[email protected]>
> Subject: [slurm-dev] Re: Resources allocation problem
> 
> 
> Can you change your consumable resources from CR_Core_Memory to CR_CPU_Memory?
> On Jan 29, 2016 5:42 PM, Benjamin Redling <[email protected]> wrote:
> 
> On 2016-01-29 15:31, Dennis Mungai wrote:
>> Add SHARE=FORCE to your partition settings for each partition entry 
>> in the configuration file.
> 
> https://computing.llnl.gov/linux/slurm/cons_res_share.html
> 
> selection setting was:
> SelectType=select/cons_res
> SelectTypeParameters=CR_Core_Memory
> 
> Shared=FORCE as you recommend leads to:
> "
> Cores are allocated to jobs. A core may run more than one job.
> "
> 
> What does that have to do with the problem?
> Can you elaborate on that?
> 
> /Benjamin
> 
> 
>> On Jan 29, 2016 5:08 PM, David Roman <[email protected]> wrote:
>> Hello,
>>
>> I'm a newbie with Slurm. Perhaps you could help me understand my 
>> mistake.
>>
>> I have 2 nodes (2 sockets with 4 cores per socket = 8 CPUs per node). I 
>> created 3 partitions:
>>
>> DEV with node2
>> OP    with node1
>> LOW with node1 and node2
>>
>> I created 2 jobs:
>> Job_A uses 8 CPUs in partition DEV
>> Job_B uses 16 CPUs in partition LOW
>>
>> If I start Job_A before Job_B, all is OK: Job_A is in RUNNING state 
>> and Job_B is in PENDING state.
>>
>> BUT if I start Job_B before Job_A, both jobs are in RUNNING state.
>>
>> Thanks for your help,
>>
>> David.
>>
>>
>> Here is my slurm.conf without comments:
>>
>> ClusterName=Noveltits
>> ControlMachine=slurm
>> SlurmUser=slurm
>> SlurmctldPort=6817
>> SlurmdPort=6818
>> AuthType=auth/munge
>> StateSaveLocation=/tmp
>> SlurmdSpoolDir=/tmp/slurmd
>> SwitchType=switch/none
>> MpiDefault=none
>> SlurmctldPidFile=/var/run/slurmctld.pid
>> SlurmdPidFile=/var/run/slurmd.pid
>> ProctrackType=proctrack/pgid
>> CacheGroups=0
>> ReturnToService=0
>> SlurmctldTimeout=300
>> SlurmdTimeout=300
>> InactiveLimit=0
>> MinJobAge=300
>> KillWait=30
>> Waittime=0
>> SchedulerType=sched/backfill
>> SelectType=select/cons_res
>> SelectTypeParameters=CR_CORE_Memory
>> FastSchedule=0
>> SlurmctldDebug=3
>> SlurmdDebug=3
>> JobCompType=jobcomp/none
>>
>> PreemptMode=SUSPEND,GANG
>> PreemptType=preempt/partition_prio
>>
>>
>> NodeName=slurm_node[1-2] CPUs=8 SocketsPerBoard=2 CoresPerSocket=4 ThreadsPerCore=1
>> PartitionName=op  Nodes=slurm_node1     Priority=100 Default=No  MaxTime=INFINITE State=UP
>> PartitionName=dev Nodes=slurm_node2     Priority=1   Default=yes MaxTime=INFINITE State=UP PreemptMode=OFF
>> PartitionName=low Nodes=slurm_node[1-2] Priority=1   Default=No  MaxTime=INFINITE State=UP
>>
>>
> 
> --
> FSU Jena | JULIELab.de/Staff/Benjamin+Redling.html
> vox: +49 3641 9 44323 | fax: +49 3641 9 44321
> 
> 


--
FSU Jena | JULIELab.de/Staff/Benjamin+Redling.html
vox: +49 3641 9 44323 | fax: +49 3641 9 44321
