As it was in the first email:

NodeName=node[2-4] RealMemory=23000 Sockets=2 CoresPerSocket=4 ThreadsPerCore=2 State=UNKNOWN
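For reference, those parameters multiply out to 2 x 4 x 2 = 16 logical CPUs per node. A minimal slurm.conf sketch of that node definition follows; the explicit CPUs= line is an assumption added for illustration (it is not in the original config), and it must agree with the topology product:

```conf
# Hypothetical slurm.conf fragment (not the original file).
# CPUs= is added explicitly for clarity; it must equal
# Sockets * CoresPerSocket * ThreadsPerCore = 2 * 4 * 2 = 16.
NodeName=node[2-4] CPUs=16 Sockets=2 CoresPerSocket=4 ThreadsPerCore=2 RealMemory=23000 State=UNKNOWN
```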
On Tue, Aug 2, 2011 at 7:34 PM, <[email protected]> wrote:

> I don't see socket/core/thread information in this slurm.conf. How exactly
> did you change them?
>
> Quoting Nicolas Bigaouette <[email protected]>:
>
>> Hi Danny,
>>
>> Yes of course... Here it is.
>>
>> N
>>
>> On Tue, Aug 2, 2011 at 6:52 PM, Danny Auble <[email protected]> wrote:
>>
>>> Hey Nicolas, could you send your complete slurm.conf? It would be
>>> interesting to see the other plugins you are using that may be
>>> contributing to the problem.
>>>
>>> Danny
>>>
>>> On Tuesday August 02 2011 6:43:17 PM you wrote:
>>>
>>> > Hi all,
>>> >
>>> > I'm having issues with slurm 2.2.7 and specifying the nodes' CPU
>>> > information.
>>> >
>>> > If I set the number of sockets, cores per socket and threads per core
>>> > like this:
>>> >
>>> > > NodeName=node[2-4] RealMemory=23000 Sockets=2 CoresPerSocket=4
>>> > > ThreadsPerCore=2 State=UNKNOWN
>>> >
>>> > and submit a job, slurmctld crashes. The last section of slurmctld.log
>>> > is:
>>> >
>>> > > [2011-08-02T17:58:50] debug2: initial priority for job 49852 is 98
>>> > > [2011-08-02T17:58:50] debug2: found 3 usable nodes from config
>>> > > containing node[2-4]
>>> > > [2011-08-02T17:58:50] debug3: _pick_best_nodes: job 49852 idle_nodes
>>> > > 65 share_nodes 76
>>> > > [2011-08-02T17:58:50] debug2: sched: JobId=49852 allocated resources:
>>> > > NodeList=(null)
>>> > > [2011-08-02T17:58:50] _slurm_rpc_submit_batch_job JobId=49852
>>> > > usec=1540
>>> > > [2011-08-02T17:58:50] debug: sched: Running job scheduler
>>> > > [2011-08-02T17:58:50] debug2: found 3 usable nodes from config
>>> > > containing node[2-4]
>>> > > [2011-08-02T17:58:50] debug3: _pick_best_nodes: job 49852 idle_nodes
>>> > > 65 share_nodes 76
>>> > > [2011-08-02T17:58:50] fatal: cons_res: sync loop not progressing
>>> >
>>> > I've also seen the error "cons_res: cpus computation error".
>>> >
>>> > There might be something wrong with my configuration, but slurm should
>>> > tell me so, not crash when a job is submitted...
>>> >
>>> > I'm playing with these options because a user reported that just using
>>> > Procs=16 would not spread his MPI processes across the allocated nodes.
>>> > I've fixed that by using --nodes=*-* and --ntasks-per-node=*, but the
>>> > crash is still relevant I guess...
>>> >
>>> > Could it be a bug?
>>> >
>>> > Thanks
>>> >
>>> > Nicolas
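The workaround Nicolas describes in the quoted thread (explicit --nodes and --ntasks-per-node instead of relying on Procs=16) can be sketched as follows. The task counts and job script name are hypothetical examples; the arithmetic at the top just confirms the stated topology yields the 16 logical CPUs per node that Procs=16 implied:

```shell
# Sanity-check the node topology from the thread:
# Sockets=2 * CoresPerSocket=4 * ThreadsPerCore=2 = 16 logical CPUs per node.
sockets=2
cores_per_socket=4
threads_per_core=2
cpus_per_node=$(( sockets * cores_per_socket * threads_per_core ))
echo "$cpus_per_node"   # prints 16

# Hypothetical submission spreading 24 MPI tasks evenly over exactly 3 nodes
# (8 tasks per node) rather than letting the scheduler pack them:
#   sbatch --nodes=3 --ntasks-per-node=8 my_mpi_job.sh
```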
