As it was in the first email:

NodeName=node[2-4] RealMemory=23000 Sockets=2 CoresPerSocket=4 ThreadsPerCore=2 State=UNKNOWN
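For reference, those parameters multiply out to 2 x 4 x 2 = 16 logical CPUs per node. A minimal slurm.conf sketch of that node definition follows; the explicit CPUs= line is an assumption added for illustration (it is not in the original config), and it must agree with the topology product:

```conf
# Hypothetical slurm.conf fragment (not the original file).
# CPUs= is added explicitly for clarity; it must equal
# Sockets * CoresPerSocket * ThreadsPerCore = 2 * 4 * 2 = 16.
NodeName=node[2-4] CPUs=16 Sockets=2 CoresPerSocket=4 ThreadsPerCore=2 RealMemory=23000 State=UNKNOWN
```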
On Tue, Aug 2, 2011 at 7:34 PM, <[email protected]> wrote:

> I don't see socket/core/thread information in this slurm.conf. How exactly
> did you change them?
>
> Quoting Nicolas Bigaouette <[email protected]>:
>
>> Hi Danny,
>>
>> Yes of course... Here it is.
>>
>> N
>>
>> On Tue, Aug 2, 2011 at 6:52 PM, Danny Auble <[email protected]> wrote:
>>
>>> Hey Nicolas, could you send your complete slurm.conf? It would be
>>> interesting to see the other plugins you are using that may be
>>> contributing to the problem.
>>>
>>> Danny
>>>
>>> On Tuesday August 02 2011 6:43:17 PM you wrote:
>>>
>>> > Hi all,
>>> >
>>> > I'm having issues with slurm 2.2.7 and specifying the nodes' CPU
>>> > information.
>>> >
>>> > If I set the number of sockets, cores per socket and threads per core
>>> > like this:
>>> >
>>> > > NodeName=node[2-4] RealMemory=23000 Sockets=2 CoresPerSocket=4
>>> > > ThreadsPerCore=2 State=UNKNOWN
>>> >
>>> > and submit a job, slurmctld crashes. The last section of slurmctld.log
>>> > is:
>>> >
>>> > > [2011-08-02T17:58:50] debug2: initial priority for job 49852 is 98
>>> > > [2011-08-02T17:58:50] debug2: found 3 usable nodes from config
>>> > > containing node[2-4]
>>> > > [2011-08-02T17:58:50] debug3: _pick_best_nodes: job 49852 idle_nodes
>>> > > 65 share_nodes 76
>>> > > [2011-08-02T17:58:50] debug2: sched: JobId=49852 allocated resources:
>>> > > NodeList=(null)
>>> > > [2011-08-02T17:58:50] _slurm_rpc_submit_batch_job JobId=49852
>>> > > usec=1540
>>> > > [2011-08-02T17:58:50] debug: sched: Running job scheduler
>>> > > [2011-08-02T17:58:50] debug2: found 3 usable nodes from config
>>> > > containing node[2-4]
>>> > > [2011-08-02T17:58:50] debug3: _pick_best_nodes: job 49852 idle_nodes
>>> > > 65 share_nodes 76
>>> > > [2011-08-02T17:58:50] fatal: cons_res: sync loop not progressing
>>> >
>>> > I've also seen the error "cons_res: cpus computation error".
>>> >
>>> > There might be something wrong with my configuration, but slurm should
>>> > tell me so, not crash when a job is submitted...
>>> >
>>> > I'm playing with these options because a user reported that just using
>>> > Procs=16 would not spread his MPI processes across the allocated nodes.
>>> > I've fixed that by using --nodes=*-* and --ntasks-per-node=*, but the
>>> > crash is still relevant I guess...
>>> >
>>> > Could it be a bug?
>>> >
>>> > Thanks
>>> >
>>> > Nicolas
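The workaround Nicolas describes in the quoted thread (explicit --nodes and --ntasks-per-node instead of relying on Procs=16) can be sketched as follows. The task counts and job script name are hypothetical examples; the arithmetic at the top just confirms the stated topology yields the 16 logical CPUs per node that Procs=16 implied:

```shell
# Sanity-check the node topology from the thread:
# Sockets=2 * CoresPerSocket=4 * ThreadsPerCore=2 = 16 logical CPUs per node.
sockets=2
cores_per_socket=4
threads_per_core=2
cpus_per_node=$(( sockets * cores_per_socket * threads_per_core ))
echo "$cpus_per_node"   # prints 16

# Hypothetical submission spreading 24 MPI tasks evenly over exactly 3 nodes
# (8 tasks per node) rather than letting the scheduler pack them:
#   sbatch --nodes=3 --ntasks-per-node=8 my_mpi_job.sh
```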
