No, how did you change the values?
Did you update slurm.conf for all nodes or just some nodes?
Did you restart slurmctld or run scontrol reconfigure?

Quoting Nicolas Bigaouette <[email protected]>:

As it was in the first email:
NodeName=node[2-4] RealMemory=23000 Sockets=2 CoresPerSocket=4
ThreadsPerCore=2 State=UNKNOWN


On Tue, Aug 2, 2011 at 7:34 PM, <[email protected]> wrote:

I don't see socket/core/thread information in this slurm.conf. How exactly
did you change them?


Quoting Nicolas Bigaouette <[email protected]>:

 Hi Danny,

Yes of course... Here it is.

N

On Tue, Aug 2, 2011 at 6:52 PM, Danny Auble <[email protected]> wrote:

Hey Nicolas, could you send your complete slurm.conf? It would be
interesting to see the other plugins you are using that may be
contributing to the problem.



Danny



On Tuesday August 02 2011 6:43:17 PM you wrote:

> Hi all,
>
> I'm having issues with slurm 2.2.7 and specifying the nodes' CPU information.
>
> If I set the number of sockets, cores per socket, and threads per core like this:
>
> > NodeName=node[2-4] RealMemory=23000 Sockets=2 CoresPerSocket=4
> > ThreadsPerCore=2 State=UNKNOWN
>
> and submit a job, slurmctld crashes. The last section of slurmctld.log is:
>
> > [2011-08-02T17:58:50] debug2: initial priority for job 49852 is 98
> > [2011-08-02T17:58:50] debug2: found 3 usable nodes from config containing node[2-4]
> > [2011-08-02T17:58:50] debug3: _pick_best_nodes: job 49852 idle_nodes 65 share_nodes 76
> > [2011-08-02T17:58:50] debug2: sched: JobId=49852 allocated resources: NodeList=(null)
> > [2011-08-02T17:58:50] _slurm_rpc_submit_batch_job JobId=49852 usec=1540
> > [2011-08-02T17:58:50] debug: sched: Running job scheduler
> > [2011-08-02T17:58:50] debug2: found 3 usable nodes from config containing node[2-4]
> > [2011-08-02T17:58:50] debug3: _pick_best_nodes: job 49852 idle_nodes 65 share_nodes 76
> > [2011-08-02T17:58:50] fatal: cons_res: sync loop not progressing
>
> I've also seen the error "cons_res: cpus computation error".
>
> There might be something wrong with my configuration, but slurm should tell me so, not crash when a job is submitted...
>
> I'm playing with these options because a user reported that just using Procs=16 would not spread his MPI processes across the allocated nodes. I've fixed that by using --nodes=*-* and --ntasks-per-node=*, but the crash is still relevant, I guess...
>
> Could it be a bug?
>
> Thanks
>
> Nicolas
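For reference, SLURM derives a node's logical CPU count from the slurm.conf topology as Sockets × CoresPerSocket × ThreadsPerCore, and a mismatch between that declared topology and what the hardware actually reports is a common trigger for cons_res CPU-accounting errors like the ones quoted above. A minimal sanity check, as an illustrative sketch (only the topology values come from the thread; the check itself is not part of any SLURM tool):

```python
import os

# Topology declared in slurm.conf for node[2-4] (values from the thread).
sockets = 2
cores_per_socket = 4
threads_per_core = 2

# SLURM computes the node's logical CPU count from these three values.
configured_cpus = sockets * cores_per_socket * threads_per_core
print("slurm.conf implies", configured_cpus, "logical CPUs")  # 2 * 4 * 2 = 16

# Run on the compute node itself: if the OS sees a different count,
# the scheduler's CPU bookkeeping can become inconsistent.
detected_cpus = os.cpu_count()
if detected_cpus != configured_cpus:
    print(f"Mismatch: slurm.conf says {configured_cpus}, OS reports {detected_cpus}")
```

Note that 16 matches the Procs=16 mentioned above, so the declared values are at least internally consistent; checking them against `os.cpu_count()` (or `lscpu`) on each node would confirm they match the real hardware.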
