p.s. same issue on v16
On Wed, Sep 7, 2016 at 9:57 AM, andrealphus <andrealp...@gmail.com> wrote:
>
> p.s. it's listing 36 processors with sinfo, and they're all shown as
> in use, but it's only running 18 jobs. So it looks like, while it can
> see the 36 "processors", it's only allocating at the core level and
> not the thread level:
>
> squeue
>              JOBID PARTITION     NAME   USER ST TIME NODES NODELIST(REASON)
>  3850_[19-1000%25]     debug slurm_ex ashton PD 0:00     1 (Resources)
>             3850_1     debug slurm_ex ashton  R 0:05     1 localhost
>             3850_2     debug slurm_ex ashton  R 0:05     1 localhost
>             3850_3     debug slurm_ex ashton  R 0:05     1 localhost
>             3850_4     debug slurm_ex ashton  R 0:05     1 localhost
>             3850_5     debug slurm_ex ashton  R 0:05     1 localhost
>             3850_6     debug slurm_ex ashton  R 0:05     1 localhost
>             3850_7     debug slurm_ex ashton  R 0:05     1 localhost
>             3850_8     debug slurm_ex ashton  R 0:05     1 localhost
>             3850_9     debug slurm_ex ashton  R 0:05     1 localhost
>            3850_10     debug slurm_ex ashton  R 0:05     1 localhost
>            3850_11     debug slurm_ex ashton  R 0:05     1 localhost
>            3850_12     debug slurm_ex ashton  R 0:05     1 localhost
>            3850_13     debug slurm_ex ashton  R 0:05     1 localhost
>            3850_14     debug slurm_ex ashton  R 0:05     1 localhost
>            3850_15     debug slurm_ex ashton  R 0:05     1 localhost
>            3850_16     debug slurm_ex ashton  R 0:05     1 localhost
>            3850_17     debug slurm_ex ashton  R 0:05     1 localhost
>            3850_18     debug slurm_ex ashton  R 0:05     1 localhost
>
> sinfo -o %C
> CPUS(A/I/O/T)
> 36/0/0/36
>
> On Wed, Sep 7, 2016 at 9:41 AM, andrealphus <andrealp...@gmail.com> wrote:
>>
>> I tried changing the CPUs flag in the compute node section of the
>> conf file to 36, but it didn't make a difference; still limited to
>> 18. Also tried removing the flag and letting slurm calculate it from
>> the other info, e.g.:
>>
>> Sockets=1 CoresPerSocket=18 ThreadsPerCore=2
>>
>> Also no change. Could it be a non-configuration issue, e.g. a slurm
>> bug related to the processor type? I only say that because I am
>> normally a Torque user, but there is an open bug with Adaptive that
>> seems to be related to some of the newer Intel
>> processors/glibc/elision locking....
>>
>> On Tue, Sep 6, 2016 at 7:30 PM, andrealphus <andrealp...@gmail.com> wrote:
>>>
>>> ahhhh......I'll give that a try. Thanks Lachlan, feel better!
>>>
>>> On Tue, Sep 6, 2016 at 6:49 PM, Lachlan Musicman <data...@gmail.com> wrote:
>>>>
>>>> No, sorry, I meant that your config file line needs to change:
>>>>
>>>> NodeName=localhost CPUs=36 RealMemory=120000 Sockets=1
>>>> CoresPerSocket=18 ThreadsPerCore=2 State=UNKNOWN
>>>>
>>>> ------
>>>> The most dangerous phrase in the language is, "We've always done
>>>> it this way."
>>>>
>>>> - Grace Hopper
>>>>
>>>> On 7 September 2016 at 11:34, andrealphus <andrealp...@gmail.com> wrote:
>>>>>
>>>>> Yup, that's what I expect too! Since I'm brand new to slurm, I'm
>>>>> not sure if there is some other config option or srun flag to
>>>>> enable multithreading.
>>>>>
>>>>> On Tue, Sep 6, 2016 at 5:42 PM, Lachlan Musicman <data...@gmail.com> wrote:
>>>>> > Oh, I'm not 100% sure on this (home sick actually), but I think:
>>>>> >
>>>>> > NodeName=localhost CPUs=1 RealMemory=120000 Sockets=1
>>>>> > CoresPerSocket=18 ThreadsPerCore=2 State=UNKNOWN
>>>>> >
>>>>> > should have CPUs=36 (i.e., ThreadsPerCore*CoresPerSocket*Sockets).
>>>>> >
>>>>> > cheers
>>>>> > L,
>>>>> >
>>>>> > ------
>>>>> > The most dangerous phrase in the language is, "We've always done
>>>>> > it this way."
>>>>> >
>>>>> > - Grace Hopper
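(A quick cross-check for the CPUs=36 arithmetic being discussed above:
running "slurmd -C" on the node prints the hardware slurmd actually
detects, already formatted as a slurm.conf node line. A sketch of what
it might print for this box; the exact field set varies between Slurm
versions, so treat the output as illustrative:

    $ slurmd -C
    NodeName=localhost CPUs=36 Boards=1 SocketsPerBoard=1 CoresPerSocket=18 ThreadsPerCore=2 RealMemory=120000

Note that the config file quoted further down sets FastSchedule=1,
which tells Slurm to trust the config file over the detected hardware,
so a stale CPUs=1 in slurm.conf would explain sinfo reporting a single
CPU.)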
>>>>> >
>>>>> > On 7 September 2016 at 10:39, andrealphus <andrealp...@gmail.com> wrote:
>>>>> >>
>>>>> >> Thanks Lachlan, took threads-per-core out and got the same
>>>>> >> behavior, still limited to 18.
>>>>> >>
>>>>> >> On Tue, Sep 6, 2016 at 5:33 PM, Lachlan Musicman <data...@gmail.com> wrote:
>>>>> >> >
>>>>> >> > You don't need --threads-per-core.
>>>>> >> >
>>>>> >> > It's sufficient to have
>>>>> >> >
>>>>> >> > SelectType=select/cons_res
>>>>> >> > SelectTypeParameters=CR_CPU
>>>>> >> >
>>>>> >> > then you should be able to get to all 36.
>>>>> >> >
>>>>> >> > cheers
>>>>> >> > L.
>>>>> >> >
>>>>> >> > ------
>>>>> >> > The most dangerous phrase in the language is, "We've always
>>>>> >> > done it this way."
>>>>> >> >
>>>>> >> > - Grace Hopper
>>>>> >> >
>>>>> >> > On 7 September 2016 at 10:22, andrealphus <andrealp...@gmail.com> wrote:
>>>>> >> >>
>>>>> >> >> One more follow-up....
>>>>> >> >>
>>>>> >> >> This seems to be limited to the number of cores. Any way to
>>>>> >> >> change it so that I can run up to the thread limit (18x2)
>>>>> >> >> concurrently?
>>>>> >> >>
>>>>> >> >> Thanks!
>>>>> >> >>
>>>>> >> >> On Tue, Sep 6, 2016 at 3:21 PM, andrealphus <andrealp...@gmail.com> wrote:
>>>>> >> >> >
>>>>> >> >> > Spoke too soon, so for posterity....
>>>>> >> >> >
>>>>> >> >> > Need to set, in the conf:
>>>>> >> >> >
>>>>> >> >> > SelectType=select/cons_res
>>>>> >> >> > SelectTypeParameters=CR_CPU
>>>>> >> >> >
>>>>> >> >> > and in the script:
>>>>> >> >> >
>>>>> >> >> > #SBATCH --threads-per-core=1
>>>>> >> >> >
>>>>> >> >> > DefMemPerCPU did not matter...
>>>>> >> >> >
>>>>> >> >> > On Tue, Sep 6, 2016 at 3:08 PM, andrealphus <andrealp...@gmail.com> wrote:
>>>>> >> >> >>
>>>>> >> >> >> Hi all,
>>>>> >> >> >>
>>>>> >> >> >> Long-time Torque user, first-time SLURM user. I'm running
>>>>> >> >> >> version 15.08 from APT on Ubuntu Xenial (on an 18-core
>>>>> >> >> >> E5-2697 v4 CPU).
>>>>> >> >> >>
>>>>> >> >> >> I'm trying to figure out the proper slurm.conf
>>>>> >> >> >> configuration and script parameters to run a job array on
>>>>> >> >> >> a single node/server workstation, with more than one task
>>>>> >> >> >> of the job running at a time.
>>>>> >> >> >>
>>>>> >> >> >> e.g.
>>>>> >> >> >>
>>>>> >> >> >> #!/bin/bash
>>>>> >> >> >> #SBATCH -o slurm_example-%A_%a.out
>>>>> >> >> >> #SBATCH --array=1-21%3
>>>>> >> >> >> #SBATCH --mem-per-cpu=2000
>>>>> >> >> >>
>>>>> >> >> >> srun sleep 15
>>>>> >> >> >>
>>>>> >> >> >> Submitting with "sbatch example.sh" should run 21 total
>>>>> >> >> >> instances of sleep, 3 at a time, correct?
>>>>> >> >> >>
>>>>> >> >> >> I can never get more than 1 concurrent process going....
>>>>> >> >> >>
>>>>> >> >> >> My slurm.conf file looks like:
>>>>> >> >> >>
>>>>> >> >> >> ControlMachine=localhost
>>>>> >> >> >> AuthType=auth/munge
>>>>> >> >> >> CacheGroups=0
>>>>> >> >> >> CryptoType=crypto/munge
>>>>> >> >> >> MaxTasksPerNode=32
>>>>> >> >> >> MpiDefault=none
>>>>> >> >> >> ProctrackType=proctrack/pgid
>>>>> >> >> >> ReturnToService=1
>>>>> >> >> >> SlurmctldPidFile=/var/run/slurmctld.pid
>>>>> >> >> >> SlurmctldPort=6817
>>>>> >> >> >> SlurmdPidFile=/var/run/slurmd.pid
>>>>> >> >> >> SlurmdPort=6818
>>>>> >> >> >> SlurmdSpoolDir=/var/spool/slurmd
>>>>> >> >> >> SlurmUser=root
>>>>> >> >> >> StateSaveLocation=/var/spool
>>>>> >> >> >> SwitchType=switch/none
>>>>> >> >> >> TaskPlugin=task/none
>>>>> >> >> >> InactiveLimit=0
>>>>> >> >> >> KillWait=30
>>>>> >> >> >> MinJobAge=300
>>>>> >> >> >> SlurmctldTimeout=120
>>>>> >> >> >> SlurmdTimeout=300
>>>>> >> >> >> Waittime=0
>>>>> >> >> >> FastSchedule=1
>>>>> >> >> >> SchedulerType=sched/backfill
>>>>> >> >> >> SchedulerPort=7321
>>>>> >> >> >> SelectType=select/cons_res
>>>>> >> >> >> SelectTypeParameters=CR_CPU
>>>>> >> >> >> AccountingStorageType=accounting_storage/none
>>>>> >> >> >> AccountingStoreJobComment=YES
>>>>> >> >> >> ClusterName=cluster
>>>>> >> >> >> JobCompType=jobcomp/none
>>>>> >> >> >> JobAcctGatherFrequency=30
>>>>> >> >> >> JobAcctGatherType=jobacct_gather/none
>>>>> >> >> >> SlurmctldDebug=3
>>>>> >> >> >> SlurmdDebug=3
>>>>> >> >> >>
>>>>> >> >> >> # COMPUTE NODES
>>>>> >> >> >> NodeName=localhost CPUs=1 RealMemory=120000 Sockets=1
>>>>> >> >> >> CoresPerSocket=18 ThreadsPerCore=2 State=UNKNOWN
>>>>> >> >> >> PartitionName=debug Nodes=localhost Shared=YES DefMemPerCPU=3000
>>>>> >> >> >> Default=YES MaxTime=INFINITE State=UP
>>>>> >> >> >>
>>>>> >> >> >> I've tried both
>>>>> >> >> >>
>>>>> >> >> >> SelectType=select/cons_res
>>>>> >> >> >> SelectTypeParameters=CR_CPU
>>>>> >> >> >>
>>>>> >> >> >> and
>>>>> >> >> >>
>>>>> >> >> >> SelectType=select/linear
>>>>> >> >> >>
>>>>> >> >> >> but both return
>>>>> >> >> >>
>>>>> >> >> >> sinfo -o %C
>>>>> >> >> >> CPUS(A/I/O/T)
>>>>> >> >> >> 0/0/1/1
>>>>> >> >> >>
>>>>> >> >> >> which didn't seem right, because I thought that if I set
>>>>> >> >> >> SelectType=select/cons_res & SelectTypeParameters=CR_CPU,
>>>>> >> >> >> the threads would be seen as the CPUs.
>>>>> >> >> >>
>>>>> >> >> >> I've tried to piece it together with the slurm and ubuntu
>>>>> >> >> >> mailing lists, but two days later I'm ready to hide in a
>>>>> >> >> >> corner....
>>>>> >> >> >>
>>>>> >> >> >> any info appreciated!
>>>>> >> >> >>
>>>>> >> >> >> ashton
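(For anyone who lands on this thread later: the residual 18-of-36
ceiling reported at the top is consistent with core-level allocation.
With Sockets/CoresPerSocket/ThreadsPerCore declared, each single-CPU
array task is handed a whole core, i.e. two hyperthreads, so the 18
cores saturate at 18 running tasks even though sinfo counts 36 CPUs.
The slurm.conf documentation for CR_CPU notes that declaring the
topology prevents more than one job from being allocated per core. A
minimal sketch of the relevant lines under that reading; untested
here, with the partition line carried over unchanged from the thread:

    # Schedule each hyperthread as an independent CPU. The
    # socket/core/thread topology is deliberately omitted: with
    # CR_CPU, declaring it limits each core to a single job.
    SelectType=select/cons_res
    SelectTypeParameters=CR_CPU
    NodeName=localhost CPUs=36 RealMemory=120000 State=UNKNOWN
    PartitionName=debug Nodes=localhost Shared=YES DefMemPerCPU=3000 Default=YES MaxTime=INFINITE State=UP

With that in place, the array submission shown above (an
--array=1-1000%25 job) should reach 25 concurrently running tasks
rather than stalling at 18.)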