Of course, -N 1 is wrong, since it requests more CPUs than are available on one node. Sorry, I didn't read your mail to the end.

Try with:

srun -n 25 -m plane=20
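For example, the whole batch script could look like this (a minimal, untested sketch: it reuses the thin partition and the --nodes/--ntasks values from your script below, and the job name is made up). -m is the short form of --distribution, and plane=<size> hands tasks out to the allocated nodes in blocks of <size>, so plane=20 should place 20 of the 25 tasks on the first node and the remaining 5 on the second:

#!/bin/bash
#SBATCH --job-name=plane_test
#SBATCH --partition=thin
#SBATCH --nodes=2
#SBATCH --ntasks=40

# plane=20 distributes the step's 25 tasks in blocks of 20 per node:
# tasks 0-19 on the first node, tasks 20-24 on the second.
srun -n 25 -m plane=20 hostname

Piping the output through "sort | uniq -c" should then show 20 lines for one hostname and 5 for the other, instead of the 13/12 split you are seeing.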
On 04/04/2014 13:57, Joan Arbona wrote:
> Not working, it just says that more processors were requested than permitted:
>
> srun: error: Unable to create job step: More processors requested than permitted
>
> Thanks
>
> On 04/04/14 13:50, Mehdi Denou wrote:
>> Try with:
>> srun -N 1 -n 25
>>
>> On 04/04/2014 13:47, Joan Arbona wrote:
>>> Excuse me, I confused "Nodes" with "Tasks". When I wrote "Nodes" in the
>>> last e-mail I meant "tasks".
>>>
>>> Let me explain it again with an example:
>>>
>>> My cluster has 2 nodes with 20 processors/node. I want to allocate all
>>> 40 processors and both nodes in sbatch. Then I have to execute a job
>>> step with srun on a subset of 25 processors. I want SLURM to fill the
>>> maximum number of nodes completely: that is, to use all 20 processors
>>> of the first node and 5 of the second one.
>>>
>>> If I execute an sbatch script like this:
>>>
>>> #!/bin/bash
>>> [...]
>>> #SBATCH --nodes=2
>>> #SBATCH --ntasks=40
>>>
>>> srun -n25 hostname
>>>
>>> it does not work: it executes 12 hostname tasks on the first node and
>>> 13 on the second one, when it should execute 20 on the first one and 5
>>> on the second one.
>>>
>>> Thanks and sorry for the confusion,
>>> Joan
>>>
>>> On 04/04/14 13:22, Mehdi Denou wrote:
>>>> It's a little bit confusing:
>>>>
>>>>> When in sbatch I specify that I want to allocate 25 nodes and I execute
>>>>
>>>> So it means -N 25.
>>>> For example, if you want to allocate 40 nodes and then execute srun on
>>>> 25 of them:
>>>>
>>>> #!/bin/bash
>>>> #SBATCH -N 40
>>>>
>>>> srun -N 25 hostname
>>>>
>>>> -n is the number of tasks (the number of system processes).
>>>> -N or --nodes is the number of nodes.
>>>>
>>>> If you don't specify -n, it is set to 1 by default.
>>>>
>>>> On 04/04/2014 11:24, Joan Arbona wrote:
>>>>> Thanks for the answer. No luck anyway.
>>>>> When in sbatch I specify that I want to allocate 25 nodes and I execute
>>>>> srun without parameters it works. However, if I specify that I want to
>>>>> allocate 40 nodes and then execute srun selecting only 25 of them, it
>>>>> does not work.
>>>>>
>>>>> That is:
>>>>>
>>>>> ---
>>>>>
>>>>> 1.
>>>>> #!/bin/bash
>>>>> [...]
>>>>> #SBATCH --nodes=2
>>>>> #SBATCH --ntasks=25
>>>>>
>>>>> srun hostname
>>>>>
>>>>> -> Works, but we don't want it because we need srun to select a subset
>>>>> of the requested nodes.
>>>>>
>>>>> ---
>>>>>
>>>>> 2.
>>>>> #!/bin/bash
>>>>> [...]
>>>>> #SBATCH --nodes=2
>>>>> #SBATCH --ntasks=40
>>>>>
>>>>> srun -n25 hostname
>>>>>
>>>>> -> Doesn't work: it executes half of the processes on the first node
>>>>> and the other half on the second. I also tried removing --nodes=2.
>>>>>
>>>>> ---
>>>>>
>>>>> It seems to be the way sbatch influences srun. Is there any way to see
>>>>> which parameters the sbatch call transfers to srun?
>>>>>
>>>>> Thanks,
>>>>> Joan
>>>>>
>>>>> On 04/04/14 10:54, Mehdi Denou wrote:
>>>>>> Hello,
>>>>>>
>>>>>> You should take a look at the parameter --mincpus.
>>>>>>
>>>>>> On 04/04/2014 10:22, Joan Arbona wrote:
>>>>>>> Hello all,
>>>>>>>
>>>>>>> We have a cluster with 40 nodes and 20 cores per node, and we are
>>>>>>> trying to distribute job steps executed with sbatch "in blocks".
>>>>>>> That means we want to fill the maximum number of nodes and, if the
>>>>>>> number of tasks is not a multiple of 20, to have only one node with
>>>>>>> not all of its cores busy. For example, if we executed a task on 25
>>>>>>> cores, we would have node 1 with all 20 cores reserved and node 2
>>>>>>> with only 5 cores reserved.
>>>>>>>
>>>>>>> If we execute
>>>>>>>
>>>>>>> srun -n25 -pthin hostname
>>>>>>>
>>>>>>> it works fine and produces the following output:
>>>>>>>
>>>>>>> foner118
>>>>>>> foner118
>>>>>>> foner118
>>>>>>> foner118
>>>>>>> foner118
>>>>>>> foner118
>>>>>>> foner118
>>>>>>> foner118
>>>>>>> foner118
>>>>>>> foner118
>>>>>>> foner118
>>>>>>> foner118
>>>>>>> foner118
>>>>>>> foner118
>>>>>>> foner118
>>>>>>> foner118
>>>>>>> foner118
>>>>>>> foner118
>>>>>>> foner118
>>>>>>> foner118
>>>>>>> foner119
>>>>>>> foner119
>>>>>>> foner119
>>>>>>> foner119
>>>>>>> foner119
>>>>>>>
>>>>>>> However, when we execute this in an sbatch script it does not work
>>>>>>> at all. I have tried it with all possible configurations I know and
>>>>>>> with all useful parameters. Instead it executes 13 processes on the
>>>>>>> first node and 12 processes on the second node.
>>>>>>>
>>>>>>> This is our sbatch script:
>>>>>>>
>>>>>>> #!/bin/bash
>>>>>>> #SBATCH --job-name=prova_joan
>>>>>>> #SBATCH --partition=thin
>>>>>>> #SBATCH --output=WRFJobName-%j.out
>>>>>>> #SBATCH --error=WRFJobName-%j.err
>>>>>>> #SBATCH --nodes=2
>>>>>>> #SBATCH --ntasks=40
>>>>>>>
>>>>>>> srun -n25 --exclusive hostname &
>>>>>>>
>>>>>>> wait
>>>>>>>
>>>>>>> I have already tried removing the --exclusive and the &, without
>>>>>>> success.
>>>>>>>
>>>>>>> To sum up, the question is: what is the way to group the tasks of
>>>>>>> job steps so that they fill as many nodes as possible with sbatch?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Joan
>>>>>>>
>>>>>>> PS: Attaching slurm.conf:
>>>>>>>
>>>>>>> ##################BEGIN SLURM.CONF#######################
>>>>>>> ClusterName=foner
>>>>>>> ControlMachine=foner1,foner2
>>>>>>> ControlAddr=slurm-server
>>>>>>> #BackupController=
>>>>>>> #BackupAddr=
>>>>>>> #
>>>>>>> SlurmUser=slurm
>>>>>>> #SlurmdUser=root
>>>>>>> SlurmctldPort=6817
>>>>>>> SlurmdPort=6818
>>>>>>> AuthType=auth/munge
>>>>>>> CryptoType=crypto/munge
>>>>>>> JobCredentialPrivateKey=/etc/slurm/private.key
>>>>>>> JobCredentialPublicCertificate=/etc/slurm/public.key
>>>>>>> StateSaveLocation=/SLURM
>>>>>>> SlurmdSpoolDir=/var/log/slurm/spool_slurmd/
>>>>>>> SwitchType=switch/none
>>>>>>> MpiDefault=none
>>>>>>> SlurmctldPidFile=/var/run/slurm/slurmctld.pid
>>>>>>> SlurmdPidFile=/var/run/slurmd.pid
>>>>>>> #ProctrackType=proctrack/pgid
>>>>>>> ProctrackType=proctrack/linuxproc
>>>>>>> TaskPlugin=task/affinity
>>>>>>> TaskPluginParam=Cpusets
>>>>>>> #PluginDir=
>>>>>>> CacheGroups=0
>>>>>>> #FirstJobId=
>>>>>>> ReturnToService=0
>>>>>>> #MaxJobCount=
>>>>>>> #PlugStackConfig=
>>>>>>> #PropagatePrioProcess=
>>>>>>> #PropagateResourceLimits=
>>>>>>> #PropagateResourceLimitsExcept=
>>>>>>> #Prolog=/data/scripts/prolog_ctld.sh
>>>>>>> #Prolog=
>>>>>>> Epilog=/data/scripts/epilog.sh
>>>>>>> #SrunProlog=
>>>>>>> #SrunEpilog=
>>>>>>> #TaskProlog=
>>>>>>> #TaskEpilog=
>>>>>>> #TaskPlugin=
>>>>>>> #TrackWCKey=no
>>>>>>> #TreeWidth=50
>>>>>>> #TmpFS=
>>>>>>> #UsePAM=
>>>>>>> #UsePAM=1
>>>>>>> #
>>>>>>> # TIMERS
>>>>>>> SlurmctldTimeout=300
>>>>>>> SlurmdTimeout=300
>>>>>>> InactiveLimit=0
>>>>>>> MinJobAge=300
>>>>>>> KillWait=30
>>>>>>> Waittime=0
>>>>>>> #
>>>>>>> # SCHEDULING
>>>>>>> SchedulerType=sched/backfill
>>>>>>> #SchedulerAuth=
>>>>>>> #SchedulerPort=
>>>>>>> #SchedulerRootFilter=
>>>>>>> #SelectType=select/linear
>>>>>>> SelectType=select/cons_res
>>>>>>> SelectTypeParameters=CR_Core_Memory,CR_CORE_DEFAULT_DIST_BLOCK
>>>>>>> FastSchedule=1
>>>>>>> PriorityType=priority/multifactor
>>>>>>> #PriorityDecayHalfLife=14-0
>>>>>>> #PriorityUsageResetPeriod=14-0
>>>>>>> PriorityWeightFairshare=0
>>>>>>> PriorityWeightAge=0
>>>>>>> PriorityWeightPartition=0
>>>>>>> PriorityWeightJobSize=0
>>>>>>> PriorityWeightQOS=1000
>>>>>>> #PriorityMaxAge=1-0
>>>>>>> #
>>>>>>> # LOGGING
>>>>>>> SlurmctldDebug=5
>>>>>>> SlurmctldLogFile=/var/log/slurm/slurmctld.log
>>>>>>> SlurmdDebug=5
>>>>>>> SlurmdLogFile=/var/log/slurm/slurmd.log
>>>>>>> JobCompType=jobcomp/none
>>>>>>> #JobCompLoc=
>>>>>>> #
>>>>>>> # ACCOUNTING
>>>>>>> #JobAcctGatherType=jobacct_gather/linux
>>>>>>> #JobAcctGatherFrequency=30
>>>>>>> #
>>>>>>> #AccountingStorageType=accounting_storage/slurmdbd
>>>>>>> ##AccountingStorageHost=slurm-server
>>>>>>> #AccountingStorageLoc=
>>>>>>> #AccountingStoragePass=
>>>>>>> #AccountingStorageUser=
>>>>>>> #
>>>>>>> AccountingStorageEnforce=qos
>>>>>>> AccountingStorageLoc=slurm_acct_db
>>>>>>> AccountingStorageType=accounting_storage/slurmdbd
>>>>>>> AccountingStoragePort=8544
>>>>>>> AccountingStorageUser=root
>>>>>>> #AccountingStoragePass=slurm
>>>>>>> AccountingStorageHost=slurm-server
>>>>>>> # ACCT_GATHER
>>>>>>> JobAcctGatherType=jobacct_gather/linux
>>>>>>> JobAcctGatherFrequency=60
>>>>>>> #AcctGatherEnergyType=acct_gather_energy/rapl
>>>>>>> #AcctGatherNodeFreq=30
>>>>>>>
>>>>>>> # Memory
>>>>>>> #DefMemPerCPU=1024 # 1GB
>>>>>>> #MaxMemPerCPU=3072 # 3GB
>>>>>>>
>>>>>>> # COMPUTE NODES
>>>>>>> NodeName=foner[11-14] Procs=20 RealMemory=258126 Sockets=2 CoresPerSocket=10 ThreadsPerCore=1 State=UNKNOWN
>>>>>>>
>>>>>>> NodeName=foner[101-142] CPUs=20 Sockets=2 CoresPerSocket=10 ThreadsPerCore=1 RealMemory=64398 State=UNKNOWN
>>>>>>>
>>>>>>> PartitionName=thin Nodes=foner[103-142] Shared=NO PreemptMode=CANCEL State=UP MaxTime=4320 MinNodes=2
>>>>>>> PartitionName=thin_test Nodes=foner[101,102] Default=YES Shared=NO PreemptMode=CANCEL State=UP MaxTime=60 MaxNodes=1
>>>>>>> PartitionName=fat Nodes=foner[11-14] Shared=NO PreemptMode=CANCEL State=UP MaxTime=4320 MaxNodes=1
>>>>>>>
>>>>>>> ##################END SLURM.CONF#######################
>
> --

--
Mehdi Denou
International HPC support
+336 45 57 66 56
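To recap the -n / -N distinction from the thread, a minimal sketch (untested; it assumes the same 2-node, 20-cores-per-node layout described above):

#!/bin/bash
#SBATCH --nodes=2    # -N: how many nodes to allocate
#SBATCH --ntasks=40  # -n: how many tasks (system processes) to start

# Without an explicit -m/--distribution option, srun balances the 25
# tasks across the two allocated nodes (the roughly even 13/12 split
# reported above); -m plane=20 packs them 20 + 5 instead.
srun -n 25 hostname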