Hi Danny,

That did indeed fix things - pointing SLURM to a new location for the state files let everything start up correctly. I really should have thought of that :)
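
For anyone hitting the same error: the fix amounts to changing StateSaveLocation in slurm.conf so that slurmctld no longer picks up the block state saved by the earlier BG/P emulation. A minimal sketch follows; the directory shown is a hypothetical example, not the path actually used here:

```
# slurm.conf (fragment)
# Hypothetical path, for illustration only: any fresh, empty directory
# that SlurmUser can write to should do.
StateSaveLocation=/var/spool/slurm/state-bgq
```

After restarting slurmctld with the new location, the old block_state file is no longer read.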

Thanks!
Mark

On Sat, Oct 1, 2011 at 12:26 AM, Danny Auble <d...@schedmd.com> wrote:
> Mark are you pointing to the old bgl/p state files? Just point the state
> files elsewhere and things should work.
>
> Mark Nelson <mdnels...@gmail.com> wrote:
>>
>> Hi there,
>>
>> I'm trying to get SLURM 2.3 set up emulating a Blue Gene/Q. I tried
>> using both the 2.3.0-2 release and commit
>> 85354f56be838737c8c034a5c65820d757c33612 from git, but I keep getting
>> the following error when trying to start up slurmctld:
>> fatal: bluegene.conf starting coordinate is invalid: 000
>>
>> I created the bluegene.conf file with smap, but perhaps there's
>> something wrong with my slurm.conf file (it was originally from an
>> emulated BG/P, just updated to look like a BG/Q one). Any help would
>> be greatly appreciated. Attached are the slurm.conf and bluegene.conf
>> files in question.
>>
>> And here's the full output of slurmctld (run with slurmctld -D -c -vvvv):
>> slurmctld: pidfile not locked, assuming no running daemon
>> slurmctld: error: Configured MailProg is invalid
>> slurmctld: debug3: Trying to load plugin
>> /usr/local/slurm/2.3.0-2/lib/slurm/accounting_storage_none.so
>> slurmctld: Accounting storage NOT INVOKED plugin loaded
>> slurmctld: debug3: Success.
>> slurmctld: debug3: not enforcing associations and no list was given so
>> we are giving a blank list
>> slurmctld: debug3: Version in assoc_mgr_state header is 1
>> slurmctld: slurmctld version 2.3.0-2 started on cluster tambo
>> slurmctld: debug3: Trying to load plugin
>> /usr/local/slurm/2.3.0-2/lib/slurm/crypto_munge.so
>> slurmctld: Munge cryptographic signature plugin loaded
>> slurmctld: debug3: Success.
>> slurmctld: debug3: Trying to load plugin
>> /usr/local/slurm/2.3.0-2/lib/slurm/select_bluegene.so
>> slurmctld: BlueGene node selection plugin loading...
>> slurmctld: debug: Setting dimensions from slurm.conf file
>> slurmctld: Attempting to contact MMCS
>> slurmctld: BlueGene configured with 1122 midplanes
>> slurmctld: debug: We are using 1122 of the system.
>> slurmctld: BlueGene plugin loaded successfully
>> slurmctld: BlueGene node selection plugin loaded
>> slurmctld: debug3: Success.
>> slurmctld: debug3: Trying to load plugin
>> /usr/local/slurm/2.3.0-2/lib/slurm/preempt_none.so
>> slurmctld: preempt/none loaded
>> slurmctld: debug3: Success.
>> slurmctld: debug3: Trying to load plugin
>> /usr/local/slurm/2.3.0-2/lib/slurm/checkpoint_none.so
>> slurmctld: debug3: Success.
>> slurmctld: Checkpoint plugin loaded: checkpoint/none
>> slurmctld: debug3: Trying to load plugin
>> /usr/local/slurm/2.3.0-2/lib/slurm/jobacct_gather_none.so
>> slurmctld: Job accounting gather NOT_INVOKED plugin loaded
>> slurmctld: debug3: Success.
>> slurmctld: debug: No backup controller to shutdown
>> slurmctld: debug3: Trying to load plugin
>> /usr/local/slurm/2.3.0-2/lib/slurm/switch_none.so
>> slurmctld: switch NONE plugin loaded
>> slurmctld: debug3: Success.
>> slurmctld: debug3: Prefix is bgq bgq[0000x0011] 4
>> slurmctld: debug3: Trying to load plugin
>> /usr/local/slurm/2.3.0-2/lib/slurm/topology_3d_torus.so
>> slurmctld: topology 3d_torus plugin loaded
>> slurmctld: debug3: Success.
>> slurmctld: debug: No DownNodes
>> slurmctld: debug2: partition main does not allow root jobs
>> slurmctld: debug3: Trying to load plugin
>> /usr/local/slurm/2.3.0-2/lib/slurm/jobcomp_script.so
>> slurmctld: jobcomp/script plugin loaded init
>> slurmctld: debug3: Success.
>> slurmctld: debug3: Trying to load plugin
>> /usr/local/slurm/2.3.0-2/lib/slurm/sched_builtin.so
>> slurmctld: sched: Built-in scheduler plugin loaded
>> slurmctld: debug3: Success.
>> slurmctld: debug3: Version string in job_state header is VER010
>> slurmctld: debug: *************************************************
>> slurmctld: debug: Can not recover last job ID, incompatible version
>> slurmctld: debug: *************************************************
>> slurmctld: debug3: adding mps: 0000x0001
>> slurmctld: debug3: slurm.conf: 1122
>> slurmctld: debug3: process_nodes: start is now 0000
>> slurmctld: debug3: process_nodes: start is 0000 0000
>> slurmctld: debug3: process_nodes: 0000 is included in this block
>> slurmctld: debug3: process_nodes: 0001 is included in this block
>> slurmctld: debug3: process_nodes: geo = 1112 mp count is 2 start is 0000
>> slurmctld: debug3: adding mps: 0010x0011
>> slurmctld: debug3: slurm.conf: 1122
>> slurmctld: debug3: process_nodes: start is now 0010
>> slurmctld: debug3: process_nodes: start is 0010 0010
>> slurmctld: debug3: process_nodes: 0010 is included in this block
>> slurmctld: debug3: process_nodes: 0011 is included in this block
>> slurmctld: debug3: process_nodes: geo = 1112 mp count is 2 start is 0010
>> slurmctld: debug: bluegene: select_p_state_restore
>> slurmctld: debug3: Version string in block_state header is VER004
>> slurmctld: fatal: bluegene.conf starting coordinate is invalid: 000
>>
>> Many thanks!
>> Mark
>