Hi Moe,

On Sat, Nov 5, 2011 at 5:14 AM, Moe Jette <[email protected]> wrote:
> Mark,
>
> Sorry, there was a bug introduced on 18 October for Cray and BlueGene
> systems that affects the version 2.3.1 release only. The fix can be
> found here:
> https://github.com/SchedMD/slurm/commit/e76a0c9b82d7311c9839afe33fa2933d7fd143d6.patch

Ah, now that's something I should have tried - 2.3.0. Then I could have
done a bisection... I'll know for next time :)

No worries - I just tried it out from git
(26e93d9784b4b63bff95270563397780b67b74de) and it's all working again.
Thanks for that!

>
> Note that with SLURM v2.3 or higher multiple front-end nodes (where the
> batch scripts run) can be configured for improved fault-tolerance and
> performance.

Yeah, that's one of the features that we're looking forward to on upgrade.

Many thanks!
Mark Nelson.

>
> Moe Jette
> SchedMD LLC
>
> Quoting Mark Nelson <[email protected]>:
>
>> Hi there,
>>
>> I've got a test box that's been running SLURM 2.2.7, emulating a two
>> rack Blue Gene/P that I'm attempting to upgrade to SLURM 2.3.1 (to
>> test things before we do this upgrade on our actual Blue Gene). But
>> I'm having a bit of trouble getting things going. I didn't think I had
>> to make any changes to the slurm.conf file (attached) to bring up
>> 2.3.1, but when starting the daemons (all of which are on the same
>> box) slurmctld complains with the following when slurmd tries to
>> connect:
>>
>> slurmctld: error: Registration message from unknown node bgp000
>> slurmctld: error: _slurm_rpc_node_registration node=bgp000: Invalid node name specified
>> slurmctld: debug: Spawning registration agent for slurm-dev 1 hosts
>> slurmctld: error: Registration message from unknown node bgp000
>> slurmctld: error: _slurm_rpc_node_registration node=bgp000: Invalid node name specified
>>
>> This seems to leave the midplanes in the UNKNOWN state, and I can't
>> get them out of this state using:
>> scontrol update nodename=bgp000 state=idle
>> It responds with "slurm_update error: Invalid node state specified"
>>
>> The RELEASE_NOTES mention the addition of front end node configuration
>> options so I tried adding the following to the slurm.conf file:
>> # FRONTEND NODES
>> #FrontendName=DEFAULT
>> FrontendName=slurm-dev FrontendAddr=slurm-dev
>>
>> This didn't help though - I still had the same errors as above with
>> the midplanes in the UNKNOWN state.
>>
>> Is there some other change that I have to make to the slurm.conf file
>> for the upgrade to SLURM 2.3?
>>
>> Many thanks!
>> Mark Nelson.
>>
>
>
>
>
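
P.S. For anyone who finds this thread with the same symptom: below is a rough sketch, not anything from the SLURM tree, for pulling the rejected node names out of slurmctld output so you can check them against slurm.conf. The function name unknown_nodes and the regular expression are my own, and the pattern assumes the message is worded exactly as in the log lines quoted above; other versions may phrase it differently.

```python
import re

# Pattern taken from the slurmctld lines quoted in this thread.
# Assumption: other SLURM versions may word this message differently.
UNKNOWN_NODE = re.compile(r"Registration message from unknown node (\S+)")


def unknown_nodes(log_text):
    """Return the distinct node names slurmctld rejected, in first-seen order."""
    seen = []
    for line in log_text.splitlines():
        match = UNKNOWN_NODE.search(line)
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen


# Sample input copied from the log excerpt in the original report.
sample = """\
slurmctld: error: Registration message from unknown node bgp000
slurmctld: error: _slurm_rpc_node_registration node=bgp000: Invalid node name specified
slurmctld: debug: Spawning registration agent for slurm-dev 1 hosts
slurmctld: error: Registration message from unknown node bgp000
"""

print(unknown_nodes(sample))
```

Run against a saved slurmctld log, this lists each node that failed registration once, which makes it easier to see whether every midplane is affected or only some.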
