Hi Danny,

That did indeed fix things - pointing SLURM to a new location for the
state files let everything start up correctly. I really should have
thought of that :)
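
For the archives, the change amounts to something like the following in
slurm.conf. This is only a sketch: the directory shown is a made-up
example (not the path I actually used), and it should be a fresh, empty
directory that the slurmctld user can write to.

```
# slurm.conf (fragment)
# Hypothetical path: any empty directory writable by SlurmUser will do.
# The old directory still held a block_state file written by the
# emulated BG/P setup, which the BG/Q emulation could not restore.
StateSaveLocation=/var/spool/slurm/state-bgq
```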

Thanks!
Mark

On Sat, Oct 1, 2011 at 12:26 AM, Danny Auble <d...@schedmd.com> wrote:
> Mark, are you pointing to the old bgl/p state files? Just point the state
> files elsewhere and things should work.
>
> Mark Nelson <mdnels...@gmail.com> wrote:
>>
>> Hi there,
>>
>> I'm trying to get SLURM 2.3 set up emulating a Blue Gene/Q. I tried
>> using both the 2.3.0-2 release and commit
>> 85354f56be838737c8c034a5c65820d757c33612 from git, but I keep getting
>> the following error when trying to start up slurmctld:
>> fatal: bluegene.conf starting coordinate is invalid: 000
>>
>> I created the bluegene.conf file with smap, but perhaps there's
>> something wrong with my slurm.conf file (it was originally from an
>> emulated BG/P, just updated to look like a BG/Q one). Any help would
>> be greatly appreciated. Attached are the slurm.conf and bluegene.conf
>> files in question.
>>
>> And here's the full output of slurmctld (run with slurmctld -D -c -vvvv):
>> slurmctld: pidfile not locked, assuming no running daemon
>> slurmctld: error: Configured MailProg is invalid
>> slurmctld: debug3: Trying to load plugin
>> /usr/local/slurm/2.3.0-2/lib/slurm/accounting_storage_none.so
>> slurmctld: Accounting storage NOT INVOKED plugin loaded
>> slurmctld: debug3: Success.
>> slurmctld: debug3: not enforcing associations and no list was given so
>> we are giving a blank list
>> slurmctld: debug3: Version in assoc_mgr_state header is 1
>> slurmctld: slurmctld version 2.3.0-2 started on cluster tambo
>> slurmctld: debug3: Trying to load plugin
>> /usr/local/slurm/2.3.0-2/lib/slurm/crypto_munge.so
>> slurmctld: Munge cryptographic signature plugin loaded
>> slurmctld: debug3: Success.
>> slurmctld: debug3: Trying to load plugin
>> /usr/local/slurm/2.3.0-2/lib/slurm/select_bluegene.so
>> slurmctld: BlueGene node selection plugin loading...
>> slurmctld: debug:  Setting dimensions from slurm.conf file
>> slurmctld: Attempting to contact MMCS
>> slurmctld: BlueGene configured with 1122 midplanes
>> slurmctld: debug:  We are using 1122 of the system.
>> slurmctld: BlueGene plugin loaded successfully
>> slurmctld: BlueGene node selection plugin loaded
>> slurmctld: debug3: Success.
>> slurmctld: debug3: Trying to load plugin
>> /usr/local/slurm/2.3.0-2/lib/slurm/preempt_none.so
>> slurmctld: preempt/none loaded
>> slurmctld: debug3: Success.
>> slurmctld: debug3: Trying to load plugin
>> /usr/local/slurm/2.3.0-2/lib/slurm/checkpoint_none.so
>> slurmctld: debug3: Success.
>> slurmctld: Checkpoint plugin loaded: checkpoint/none
>> slurmctld: debug3: Trying to load plugin
>> /usr/local/slurm/2.3.0-2/lib/slurm/jobacct_gather_none.so
>> slurmctld: Job accounting gather NOT_INVOKED plugin loaded
>> slurmctld: debug3: Success.
>> slurmctld: debug:  No backup controller to shutdown
>> slurmctld: debug3: Trying to load plugin
>> /usr/local/slurm/2.3.0-2/lib/slurm/switch_none.so
>> slurmctld: switch NONE plugin loaded
>> slurmctld: debug3: Success.
>> slurmctld: debug3: Prefix is bgq bgq[0000x0011] 4
>> slurmctld: debug3: Trying to load plugin
>> /usr/local/slurm/2.3.0-2/lib/slurm/topology_3d_torus.so
>> slurmctld: topology 3d_torus plugin loaded
>> slurmctld: debug3: Success.
>> slurmctld: debug:  No DownNodes
>> slurmctld: debug2: partition main does not allow root jobs
>> slurmctld: debug3: Trying to load plugin
>> /usr/local/slurm/2.3.0-2/lib/slurm/jobcomp_script.so
>> slurmctld: jobcomp/script plugin loaded init
>> slurmctld: debug3: Success.
>> slurmctld: debug3: Trying to load plugin
>> /usr/local/slurm/2.3.0-2/lib/slurm/sched_builtin.so
>> slurmctld: sched: Built-in scheduler plugin loaded
>> slurmctld: debug3: Success.
>> slurmctld: debug3: Version string in job_state header is VER010
>> slurmctld: debug:  *************************************************
>> slurmctld: debug:  Can not recover last job ID, incompatible version
>> slurmctld: debug:  *************************************************
>> slurmctld: debug3: adding mps: 0000x0001
>> slurmctld: debug3: slurm.conf:    1122
>> slurmctld: debug3: process_nodes: start is now 0000
>> slurmctld: debug3: process_nodes: start is 0000 0000
>> slurmctld: debug3: process_nodes: 0000 is included in this block
>> slurmctld: debug3: process_nodes: 0001 is included in this block
>> slurmctld: debug3: process_nodes: geo = 1112 mp count is 2 start is 0000
>> slurmctld: debug3: adding mps: 0010x0011
>> slurmctld: debug3: slurm.conf:    1122
>> slurmctld: debug3: process_nodes: start is now 0010
>> slurmctld: debug3: process_nodes: start is 0010 0010
>> slurmctld: debug3: process_nodes: 0010 is included in this block
>> slurmctld: debug3: process_nodes: 0011 is included in this block
>> slurmctld: debug3: process_nodes: geo = 1112 mp count is 2 start is 0010
>> slurmctld: debug:  bluegene: select_p_state_restore
>> slurmctld: debug3: Version string in block_state header is VER004
>> slurmctld: fatal: bluegene.conf starting coordinate is invalid: 000
>>
>> Many thanks!
>> Mark
>
