Hi there,

I'm trying to set up SLURM 2.3 to emulate a Blue Gene/Q. I tried
both the 2.3.0-2 release and commit
85354f56be838737c8c034a5c65820d757c33612 from git, but I keep getting
the following error when starting slurmctld:
fatal: bluegene.conf starting coordinate is invalid: 000

I created the bluegene.conf file with smap, but perhaps there's
something wrong with my slurm.conf file (it was originally written for
an emulated BG/P and then updated to look like a BG/Q one). Any help
would be greatly appreciated. The slurm.conf and bluegene.conf files in
question are attached.

And here's the full output of slurmctld (run with slurmctld -D -c -vvvv):
slurmctld: pidfile not locked, assuming no running daemon
slurmctld: error: Configured MailProg is invalid
slurmctld: debug3: Trying to load plugin /usr/local/slurm/2.3.0-2/lib/slurm/accounting_storage_none.so
slurmctld: Accounting storage NOT INVOKED plugin loaded
slurmctld: debug3: Success.
slurmctld: debug3: not enforcing associations and no list was given so we are giving a blank list
slurmctld: debug3: Version in assoc_mgr_state header is 1
slurmctld: slurmctld version 2.3.0-2 started on cluster tambo
slurmctld: debug3: Trying to load plugin /usr/local/slurm/2.3.0-2/lib/slurm/crypto_munge.so
slurmctld: Munge cryptographic signature plugin loaded
slurmctld: debug3: Success.
slurmctld: debug3: Trying to load plugin /usr/local/slurm/2.3.0-2/lib/slurm/select_bluegene.so
slurmctld: BlueGene node selection plugin loading...
slurmctld: debug:  Setting dimensions from slurm.conf file
slurmctld: Attempting to contact MMCS
slurmctld: BlueGene configured with 1122 midplanes
slurmctld: debug:  We are using 1122 of the system.
slurmctld: BlueGene plugin loaded successfully
slurmctld: BlueGene node selection plugin loaded
slurmctld: debug3: Success.
slurmctld: debug3: Trying to load plugin /usr/local/slurm/2.3.0-2/lib/slurm/preempt_none.so
slurmctld: preempt/none loaded
slurmctld: debug3: Success.
slurmctld: debug3: Trying to load plugin /usr/local/slurm/2.3.0-2/lib/slurm/checkpoint_none.so
slurmctld: debug3: Success.
slurmctld: Checkpoint plugin loaded: checkpoint/none
slurmctld: debug3: Trying to load plugin /usr/local/slurm/2.3.0-2/lib/slurm/jobacct_gather_none.so
slurmctld: Job accounting gather NOT_INVOKED plugin loaded
slurmctld: debug3: Success.
slurmctld: debug:  No backup controller to shutdown
slurmctld: debug3: Trying to load plugin /usr/local/slurm/2.3.0-2/lib/slurm/switch_none.so
slurmctld: switch NONE plugin loaded
slurmctld: debug3: Success.
slurmctld: debug3: Prefix is bgq bgq[0000x0011] 4
slurmctld: debug3: Trying to load plugin /usr/local/slurm/2.3.0-2/lib/slurm/topology_3d_torus.so
slurmctld: topology 3d_torus plugin loaded
slurmctld: debug3: Success.
slurmctld: debug:  No DownNodes
slurmctld: debug2: partition main does not allow root jobs
slurmctld: debug3: Trying to load plugin /usr/local/slurm/2.3.0-2/lib/slurm/jobcomp_script.so
slurmctld: jobcomp/script plugin loaded init
slurmctld: debug3: Success.
slurmctld: debug3: Trying to load plugin /usr/local/slurm/2.3.0-2/lib/slurm/sched_builtin.so
slurmctld: sched: Built-in scheduler plugin loaded
slurmctld: debug3: Success.
slurmctld: debug3: Version string in job_state header is VER010
slurmctld: debug:  *************************************************
slurmctld: debug:  Can not recover last job ID, incompatible version
slurmctld: debug:  *************************************************
slurmctld: debug3: adding mps: 0000x0001
slurmctld: debug3: slurm.conf:    1122
slurmctld: debug3: process_nodes: start is now 0000
slurmctld: debug3: process_nodes: start is 0000 0000
slurmctld: debug3: process_nodes: 0000 is included in this block
slurmctld: debug3: process_nodes: 0001 is included in this block
slurmctld: debug3: process_nodes: geo = 1112 mp count is 2 start is 0000
slurmctld: debug3: adding mps: 0010x0011
slurmctld: debug3: slurm.conf:    1122
slurmctld: debug3: process_nodes: start is now 0010
slurmctld: debug3: process_nodes: start is 0010 0010
slurmctld: debug3: process_nodes: 0010 is included in this block
slurmctld: debug3: process_nodes: 0011 is included in this block
slurmctld: debug3: process_nodes: geo = 1112 mp count is 2 start is 0010
slurmctld: debug:  bluegene: select_p_state_restore
slurmctld: debug3: Version string in block_state header is VER004
slurmctld: fatal: bluegene.conf starting coordinate is invalid: 000
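
One detail in the output above may be worth checking: the coordinate in
the fatal message has three digits (000), while the start coordinates
reported by process_nodes have four (0000, 0010), which matches the
trailing 4 on the "Prefix is bgq" line. The sketch below only compares
coordinate lengths in the log text. It assumes that the trailing 4 is
the dimension count and that a start coordinate should have one digit
per dimension; neither is confirmed here, and the helper name
find_mismatched_coords is made up for illustration. It is a way to spot
the discrepancy, not a diagnosis of the cause.

```python
import re

# Assumption: the trailing "4" on the "Prefix is bgq bgq[0000x0011] 4"
# line is the number of dimensions, so a coordinate should be 4 characters.
EXPECTED_DIMS = 4

# Lines copied from the slurmctld output above.
SAMPLE_LOG = """\
slurmctld: debug3: process_nodes: start is now 0000
slurmctld: debug3: process_nodes: start is now 0010
slurmctld: fatal: bluegene.conf starting coordinate is invalid: 000
"""

# Matches the coordinate at the end of the two kinds of line we care about.
COORD_RE = re.compile(
    r"(?:start is now|starting coordinate is invalid:)\s+(\w+)\s*$"
)


def find_mismatched_coords(log_text, dims=EXPECTED_DIMS):
    """Return coordinates whose length differs from the expected dimension count."""
    mismatched = []
    for line in log_text.splitlines():
        match = COORD_RE.search(line)
        if match and len(match.group(1)) != dims:
            mismatched.append(match.group(1))
    return mismatched


if __name__ == "__main__":
    for coord in find_mismatched_coords(SAMPLE_LOG):
        print(f"coordinate {coord!r} has {len(coord)} digits, expected {EXPECTED_DIMS}")
```

On the three sample lines, only the coordinate from the fatal message is
flagged. That message appears right after select_p_state_restore reads
the block_state header, so the three-digit value may be coming from
somewhere other than the bluegene.conf that smap generated.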

Many thanks!
Mark

Attachment: bluegene.conf
Description: Binary data

Attachment: slurm.conf
Description: Binary data
