Hi there,

I'm trying to get SLURM 2.3 set up to emulate a Blue Gene/Q. I tried both the 2.3.0-2 release and commit 85354f56be838737c8c034a5c65820d757c33612 from git, but I keep getting the following error when starting slurmctld:

fatal: bluegene.conf starting coordinate is invalid: 000
I created the bluegene.conf file with smap, but perhaps there's something wrong with my slurm.conf file (it was originally from an emulated BG/P, just updated to look like a BG/Q one). Any help would be greatly appreciated. Attached are the slurm.conf and bluegene.conf files in question.

And here's the full output of slurmctld (run with slurmctld -D -c -vvvv):

slurmctld: pidfile not locked, assuming no running daemon
slurmctld: error: Configured MailProg is invalid
slurmctld: debug3: Trying to load plugin /usr/local/slurm/2.3.0-2/lib/slurm/accounting_storage_none.so
slurmctld: Accounting storage NOT INVOKED plugin loaded
slurmctld: debug3: Success.
slurmctld: debug3: not enforcing associations and no list was given so we are giving a blank list
slurmctld: debug3: Version in assoc_mgr_state header is 1
slurmctld: slurmctld version 2.3.0-2 started on cluster tambo
slurmctld: debug3: Trying to load plugin /usr/local/slurm/2.3.0-2/lib/slurm/crypto_munge.so
slurmctld: Munge cryptographic signature plugin loaded
slurmctld: debug3: Success.
slurmctld: debug3: Trying to load plugin /usr/local/slurm/2.3.0-2/lib/slurm/select_bluegene.so
slurmctld: BlueGene node selection plugin loading...
slurmctld: debug: Setting dimensions from slurm.conf file
slurmctld: Attempting to contact MMCS
slurmctld: BlueGene configured with 1122 midplanes
slurmctld: debug: We are using 1122 of the system.
slurmctld: BlueGene plugin loaded successfully
slurmctld: BlueGene node selection plugin loaded
slurmctld: debug3: Success.
slurmctld: debug3: Trying to load plugin /usr/local/slurm/2.3.0-2/lib/slurm/preempt_none.so
slurmctld: preempt/none loaded
slurmctld: debug3: Success.
slurmctld: debug3: Trying to load plugin /usr/local/slurm/2.3.0-2/lib/slurm/checkpoint_none.so
slurmctld: debug3: Success.
slurmctld: Checkpoint plugin loaded: checkpoint/none
slurmctld: debug3: Trying to load plugin /usr/local/slurm/2.3.0-2/lib/slurm/jobacct_gather_none.so
slurmctld: Job accounting gather NOT_INVOKED plugin loaded
slurmctld: debug3: Success.
slurmctld: debug: No backup controller to shutdown
slurmctld: debug3: Trying to load plugin /usr/local/slurm/2.3.0-2/lib/slurm/switch_none.so
slurmctld: switch NONE plugin loaded
slurmctld: debug3: Success.
slurmctld: debug3: Prefix is bgq bgq[0000x0011] 4
slurmctld: debug3: Trying to load plugin /usr/local/slurm/2.3.0-2/lib/slurm/topology_3d_torus.so
slurmctld: topology 3d_torus plugin loaded
slurmctld: debug3: Success.
slurmctld: debug: No DownNodes
slurmctld: debug2: partition main does not allow root jobs
slurmctld: debug3: Trying to load plugin /usr/local/slurm/2.3.0-2/lib/slurm/jobcomp_script.so
slurmctld: jobcomp/script plugin loaded init
slurmctld: debug3: Success.
slurmctld: debug3: Trying to load plugin /usr/local/slurm/2.3.0-2/lib/slurm/sched_builtin.so
slurmctld: sched: Built-in scheduler plugin loaded
slurmctld: debug3: Success.
slurmctld: debug3: Version string in job_state header is VER010
slurmctld: debug: *************************************************
slurmctld: debug: Can not recover last job ID, incompatible version
slurmctld: debug: *************************************************
slurmctld: debug3: adding mps: 0000x0001
slurmctld: debug3: slurm.conf: 1122
slurmctld: debug3: process_nodes: start is now 0000
slurmctld: debug3: process_nodes: start is 0000 0000
slurmctld: debug3: process_nodes: 0000 is included in this block
slurmctld: debug3: process_nodes: 0001 is included in this block
slurmctld: debug3: process_nodes: geo = 1112 mp count is 2 start is 0000
slurmctld: debug3: adding mps: 0010x0011
slurmctld: debug3: slurm.conf: 1122
slurmctld: debug3: process_nodes: start is now 0010
slurmctld: debug3: process_nodes: start is 0010 0010
slurmctld: debug3: process_nodes: 0010 is included in this block
slurmctld: debug3: process_nodes: 0011 is included in this block
slurmctld: debug3: process_nodes: geo = 1112 mp count is 2 start is 0010
slurmctld: debug: bluegene: select_p_state_restore
slurmctld: debug3: Version string in block_state header is VER004
slurmctld: fatal: bluegene.conf starting coordinate is invalid: 000

Many thanks!
Mark
Attachments: bluegene.conf, slurm.conf
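
One detail in the output above that may be worth checking: the log reports four-digit coordinates throughout (0000x0011, and the trailing 4 on the "Prefix is" line), while the fatal error quotes a three-digit coordinate (000). That could mean some coordinate being read still has the three-digit, BG/P-style length, although the log alone does not prove it. Below is a minimal Python sketch for scanning a config file for coordinate ranges of the wrong length. It assumes that coordinates appear as digit-only ranges of the form STARTxEND, as they do in the log; the attached files are not shown here, so the sample text and the `find_bad_coords` helper are purely hypothetical.

```python
import re

# Assumption: coordinate ranges look like "0000x0011" (optionally in
# brackets), as in the slurmctld log. Only digit coordinates are handled.
COORD_RANGE = re.compile(r"(?<!\d)(\d+)x(\d+)(?!\d)")


def find_bad_coords(text, dims=4):
    """Return (line_no, coord) pairs whose digit count differs from dims."""
    bad = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        line = line.split("#", 1)[0]  # ignore trailing comments
        for match in COORD_RANGE.finditer(line):
            for coord in match.groups():
                if len(coord) != dims:
                    bad.append((line_no, coord))
    return bad


if __name__ == "__main__":
    # Hypothetical sample text, not the contents of the attached files.
    sample = "entry_a [000x001]\nentry_b [0010x0011]\n"
    print(find_bad_coords(sample))  # [(1, '000'), (1, '001')]
```

To run it against a real file, read the file into a string and pass it to `find_bad_coords`, with `dims` set to the number of dimensions the daemon reports. An empty result only says the coordinate lengths are consistent, not that the configuration is otherwise valid.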