Hi all,

I'm having issues with slurm 2.2.7 when specifying the nodes' CPU information.

If I set the number of sockets, cores per socket and threads per core like
this:

> NodeName=node[2-4] RealMemory=23000 Sockets=2 CoresPerSocket=4 ThreadsPerCore=2 State=UNKNOWN

and submit a job, slurmctld crashes. The last section of slurmctld.log is:

> [2011-08-02T17:58:50] debug2: initial priority for job 49852 is 98
> [2011-08-02T17:58:50] debug2: found 3 usable nodes from config containing node[2-4]
> [2011-08-02T17:58:50] debug3: _pick_best_nodes: job 49852 idle_nodes 65 share_nodes 76
> [2011-08-02T17:58:50] debug2: sched: JobId=49852 allocated resources: NodeList=(null)
> [2011-08-02T17:58:50] _slurm_rpc_submit_batch_job JobId=49852 usec=1540
> [2011-08-02T17:58:50] debug:  sched: Running job scheduler
> [2011-08-02T17:58:50] debug2: found 3 usable nodes from config containing node[2-4]
> [2011-08-02T17:58:50] debug3: _pick_best_nodes: job 49852 idle_nodes 65 share_nodes 76
> [2011-08-02T17:58:50] fatal: cons_res: sync loop not progressing

I've also seen the error "cons_res: cpus computation error".

There might be something wrong with my configuration, but slurm should tell
me so, not crash when a job is submitted...

I'm playing with these options because a user reported that just using
Procs=16 would not spread his MPI processes across the allocated nodes.
I've fixed that by using --nodes=*-* and --ntasks-per-node=*, but the crash
is still relevant I guess...
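
For what it's worth, the topology line above and Procs=16 should describe
the same CPU count per node (2 sockets x 4 cores x 2 threads = 16), so the
numbers themselves look consistent. Below is a minimal sketch of that sanity
check; the helper name logical_cpus is hypothetical and not part of slurm, and
it only multiplies the three fields from the NodeName line.

```python
import re


def logical_cpus(node_line):
    """Multiply Sockets, CoresPerSocket and ThreadsPerCore from a NodeName line.

    Hypothetical helper for illustration only; it is not a slurm API.
    """
    # Collect every Key=Value pair on the line into a dict.
    fields = dict(re.findall(r"(\w+)=(\S+)", node_line))
    return (
        int(fields["Sockets"])
        * int(fields["CoresPerSocket"])
        * int(fields["ThreadsPerCore"])
    )


# The node definition from the message above, as a single line.
node_line = (
    "NodeName=node[2-4] RealMemory=23000 Sockets=2 "
    "CoresPerSocket=4 ThreadsPerCore=2 State=UNKNOWN"
)

# Compare against the Procs value used in the earlier configuration.
print(logical_cpus(node_line) == 16)
```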

Could it be a bug?

Thanks

Nicolas
