Hi Moe,

On Sat, Nov 5, 2011 at 5:14 AM, Moe Jette <[email protected]> wrote:
> Mark,
>
> Sorry, there was a bug introduced on 18 October for Cray and BlueGene
> systems that affects the version 2.3.1 release only. The fix can be found here:
> https://github.com/SchedMD/slurm/commit/e76a0c9b82d7311c9839afe33fa2933d7fd143d6.patch

Ah, now that's something I should have tried - 2.3.0. Then I could
have done a bisection... I'll know for next time :)

No worries - I just tried it out from git
(26e93d9784b4b63bff95270563397780b67b74de) and it's all working again.

Thanks for that!

>
> Note that with SLURM v2.3 or higher multiple front-end nodes (where the
> batch scripts run) can be configured for improved fault-tolerance and
> performance.

Yeah, that's one of the features that we're looking forward to after the upgrade.

Many thanks!
Mark Nelson.

>
> Moe Jette
> SchedMD LLC
>
> Quoting Mark Nelson <[email protected]>:
>
>> Hi there,
>>
>> I've got a test box that's been running SLURM 2.2.7, emulating a
>> two-rack Blue Gene/P, that I'm attempting to upgrade to SLURM 2.3.1 (to
>> test things before we do this upgrade on our actual Blue Gene). But
>> I'm having a bit of trouble getting things going. I didn't think I had
>> to make any changes to the slurm.conf file (attached) to bring up
>> 2.3.1, but when starting the daemons (all of which are on the same
>> box) slurmctld complains with the following when slurmd tries to
>> connect:
>>
>> slurmctld: error: Registration message from unknown node bgp000
>> slurmctld: error: _slurm_rpc_node_registration node=bgp000: Invalid
>> node name specified
>> slurmctld: debug:  Spawning registration agent for slurm-dev 1 hosts
>> slurmctld: error: Registration message from unknown node bgp000
>> slurmctld: error: _slurm_rpc_node_registration node=bgp000: Invalid
>> node name specified
>>
>> This seems to leave the midplanes in the UNKNOWN state, and I can't
>> get them out of this state using: scontrol update nodename=bgp000
>> state=idle. It responds with "slurm_update error: Invalid node state
>> specified"
>>
>> The RELEASE_NOTES mention the addition of front end node configuration
>> options so I tried adding the following to the slurm.conf file:
>> # FRONTEND NODES
>> #FrontendName=DEFAULT
>> FrontendName=slurm-dev FrontendAddr=slurm-dev
>>
>> This didn't help though - I still had the same errors as above with
>> the midplanes in the UNKNOWN state.
>>
>> Is there some other change that I have to make to the slurm.conf file
>> for the upgrade to SLURM 2.3?
>>
>> Many thanks!
>> Mark Nelson.
>>
>
