Hey guys -

We were about to start testing 2.3.0-rc2 on our 1-rack BG/L @ RPI, but slurmctld crashes on startup and we have not been able to get it running.

I've poked around and haven't found an obvious cause yet, although I can see that the block creation code has changed quite a bit compared to 2.2 to make room for BG/Q.

The crash is:

slurmctld: Record: BlockID:RMP30Au165517092 Nodes:bp000[2] Conn:Small
slurmctld: debug2: adding block
slurmctld: debug2: done adding
slurmctld: Record: BlockID:RMP30Au165517102 Nodes:bp000[1] Conn:Small
slurmctld: debug2: adding block
slurmctld: debug2: done adding
slurmctld: Record: BlockID:RMP30Au165517115 Nodes:bp000[0] Conn:Small
slurmctld: error: bridge_set_data(RM_PartitionBlrtsImg): Invalid input
slurmctld: error: bridge_set_data(RM_PartitionLinuxImg): Invalid input
slurmctld: error: bridge_set_data(RM_PartitionRamdiskImg): Invalid input
slurmctld: error: bridge_set_data(RM_PartitionMloaderImg): Invalid input
slurmctld: error: Requesting small block with 0 mps, needs to be 1.
slurmctld: fatal: Error, could not create the static blocks

-CLI INVALID HANDLE-----
  cliRC = -2
  line  = 242
  file  = TxObject.cc
slurmctld: error: bridge_get_block_info(RMP08Fe113120123): Internal error
Segmentation fault
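
For anyone triaging the longer debug output, here is a minimal Python sketch that groups error/fatal lines under the most recent "Record:" line, to show which block was being processed when things went wrong. It is not part of SLURM: the function name errors_by_block and both regexes are made up for illustration, and they assume the log line format shown above. Attribution is by position in the log only, so an error that names a different block ID (like the bridge_get_block_info one) would still be filed under the last Record seen.

```python
import re

# Matches the "Record:" lines slurmctld prints before it adds each block.
# (Assumed format, taken from the log excerpt above.)
RECORD_RE = re.compile(r"slurmctld: Record: BlockID:(\S+) Nodes:(\S+)")
# Matches slurmctld error/fatal lines.
PROBLEM_RE = re.compile(r"slurmctld: (error|fatal): (.*)")


def errors_by_block(log_text):
    """Group error/fatal messages under the most recent Record line."""
    grouped = {}
    current = None  # block ID from the last Record line seen, if any
    for line in log_text.splitlines():
        line = line.strip()
        record = RECORD_RE.match(line)
        if record:
            current = record.group(1)
            continue
        problem = PROBLEM_RE.match(line)
        if problem:
            grouped.setdefault(current, []).append(problem.group(2))
    return grouped


# A few lines copied from the crash output above.
SAMPLE = """\
slurmctld: Record: BlockID:RMP30Au165517102 Nodes:bp000[1] Conn:Small
slurmctld: debug2: adding block
slurmctld: debug2: done adding
slurmctld: Record: BlockID:RMP30Au165517115 Nodes:bp000[0] Conn:Small
slurmctld: error: bridge_set_data(RM_PartitionBlrtsImg): Invalid input
slurmctld: error: Requesting small block with 0 mps, needs to be 1.
slurmctld: fatal: Error, could not create the static blocks
"""

if __name__ == "__main__":
    for block, messages in errors_by_block(SAMPLE).items():
        print(block)
        for message in messages:
            print("   ", message)
```

On the excerpt above, every error lands under RMP30Au165517115 (bp000[0]), the first block that fails after two small blocks on the same midplane were added cleanly.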


The full debug output is at
http://scorec.rpi.edu/~wickbt/slurmctld-crash-2.3.0b2

Our slurm.conf is http://scorec.rpi.edu/~wickbt/slurm.conf
Our bluegene.conf is http://scorec.rpi.edu/~wickbt/bluegene.conf

As an added challenge, it does *not* crash under the BG/L emulation mode. I suspect this narrows it down to the bg_record struct being mishandled somewhere before the call into _pre_allocate()?

Any ideas?

thanks,
- Tim

--
Tim Wickberg
[email protected]
Senior System Administrator
Office of Research / SCOREC, Rensselaer Polytechnic Institute
