Oh, sorry. Normally I copy slurm.conf to the nodes, then restart slurmd via the init script. I don't call scontrol; the slurmd process is terminated and restarted.
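A sketch of that sync procedure, looping over the compute nodes. The hostnames, config path and init-script name are assumptions, not taken from the thread; the "echo" makes this a dry run that only prints the commands, so remove it to actually execute them:

```shell
# Dry-run sketch of "copy slurm.conf to the nodes, restart slurmd via
# the init script". node2..node4 and the paths are assumed, not confirmed.
for node in node2 node3 node4; do
    echo scp /etc/slurm/slurm.conf "$node:/etc/slurm/slurm.conf"
    echo ssh "$node" /etc/init.d/slurm restart
done
```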
I don't remember if I did that procedure when I was experiencing the bug. Trying to reproduce it now, I can't... I just modified slurm.conf on the master to put back "Sockets=2 CoresPerSocket=4 ThreadsPerCore=2" for certain nodes, created a new partition for them, restarted slurmctld and submitted a job. It doesn't crash anymore... I know I have tried CR_Core, so maybe the nodes had CR_CPU while the master had CR_Core. I just tried that too, but it's the same: even if I don't copy+restart on the compute nodes, there is no crash. This is weird. I'll keep an eye on it.

Thanks for your replies.

Nicolas

On Tue, Aug 2, 2011 at 7:43 PM, <[email protected]> wrote:

> No, how did you change the values?
> Did you update slurm.conf for all nodes or just some nodes?
> Did you restart slurmctld or run "scontrol reconfigure"?
>
> Quoting Nicolas Bigaouette <[email protected]>:
>
>> As it was in the first email:
>>
>>     NodeName=node[2-4] RealMemory=23000 Sockets=2 CoresPerSocket=4
>>     ThreadsPerCore=2 State=UNKNOWN
>>
>> On Tue, Aug 2, 2011 at 7:34 PM, <[email protected]> wrote:
>>
>>> I don't see socket/core/thread information in this slurm.conf. How
>>> exactly did you change them?
>>>
>>> Quoting Nicolas Bigaouette <[email protected]>:
>>>
>>>> Hi Danny,
>>>>
>>>> Yes of course... Here it is.
>>>>
>>>> N
>>>>
>>>> On Tue, Aug 2, 2011 at 6:52 PM, Danny Auble <[email protected]> wrote:
>>>>
>>>>> Hey Nicolas, could you send your complete slurm.conf? It would be
>>>>> interesting to see the other plugins you are using that may be
>>>>> contributing to the problem.
>>>>>
>>>>> Danny
>>>>>
>>>>> On Tuesday August 02 2011 6:43:17 PM you wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> I'm having issues with slurm 2.2.7 and specifying the nodes' CPU
>>>>>> information.
>>>>>> If I set the number of sockets, cores per socket and threads per core
>>>>>> like this:
>>>>>>
>>>>>>     NodeName=node[2-4] RealMemory=23000 Sockets=2 CoresPerSocket=4
>>>>>>     ThreadsPerCore=2 State=UNKNOWN
>>>>>>
>>>>>> and submit a job, slurmctld crashes. The last section of slurmctld.log is:
>>>>>>
>>>>>>     [2011-08-02T17:58:50] debug2: initial priority for job 49852 is 98
>>>>>>     [2011-08-02T17:58:50] debug2: found 3 usable nodes from config containing node[2-4]
>>>>>>     [2011-08-02T17:58:50] debug3: _pick_best_nodes: job 49852 idle_nodes 65 share_nodes 76
>>>>>>     [2011-08-02T17:58:50] debug2: sched: JobId=49852 allocated resources: NodeList=(null)
>>>>>>     [2011-08-02T17:58:50] _slurm_rpc_submit_batch_job JobId=49852 usec=1540
>>>>>>     [2011-08-02T17:58:50] debug: sched: Running job scheduler
>>>>>>     [2011-08-02T17:58:50] debug2: found 3 usable nodes from config containing node[2-4]
>>>>>>     [2011-08-02T17:58:50] debug3: _pick_best_nodes: job 49852 idle_nodes 65 share_nodes 76
>>>>>>     [2011-08-02T17:58:50] fatal: cons_res: sync loop not progressing
>>>>>>
>>>>>> I've also seen the error "cons_res: cpus computation error".
>>>>>>
>>>>>> There might be something wrong with my configuration, but slurm should
>>>>>> tell me so, not crash when a job is submitted...
>>>>>>
>>>>>> I'm playing with these options because a user reported that just using
>>>>>> Procs=16 would not spread his MPI processes across the allocated nodes.
>>>>>> I've fixed that by using --nodes=*-* and --ntasks-per-node=*, but the
>>>>>> crash is still relevant I guess...
>>>>>> Could it be a bug?
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>> Nicolas
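For reference, a consistent configuration pairing the NodeName line quoted in the thread with an explicit CR_Core selection (so master and nodes can't disagree between CR_CPU and CR_Core) could look like this slurm.conf fragment. The partition name is illustrative, not from the thread:

```
# slurm.conf fragment (sketch; PartitionName "test" is an assumption)
SelectType=select/cons_res
SelectTypeParameters=CR_Core
NodeName=node[2-4] RealMemory=23000 Sockets=2 CoresPerSocket=4 ThreadsPerCore=2 State=UNKNOWN
PartitionName=test Nodes=node[2-4] Default=NO State=UP
```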
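The task-placement workaround mentioned in the original email (explicit --nodes and --ntasks-per-node instead of only Procs=16) could be sketched as a batch script like the one below. The concrete values and the application name are hypothetical, since the actual numbers were elided (*-* and *) in the thread:

```
#!/bin/sh
#SBATCH --nodes=2-2            # assumed: min-max node count (elided as *-* above)
#SBATCH --ntasks-per-node=8    # assumed: tasks per node (elided as * above)
srun ./my_mpi_app              # hypothetical MPI application
```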
