Tim, could you raise BridgeAPIVerbose to 3 or so in your bluegene.conf and post the resulting /var/log/slurm/bridgeapi.log? That would be helpful. This does seem rather strange, though.
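For reference, the relevant bluegene.conf lines would look roughly like the fragment below. This is only a sketch: the log path is the one mentioned above, and I am assuming BridgeAPILogFile is already set that way on your system, so adjust it if your file differs.

```
# bluegene.conf (fragment, assumed values)
BridgeAPILogFile=/var/log/slurm/bridgeapi.log   # assumed path; use whatever your config already has
BridgeAPIVerbose=3                              # more detailed Bridge API debug messages
```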
I am guessing it happens every time? For what it is worth, the crash is inside the IBM code, so we can't do much about it. It seems strange that a bunch of these blocks get made and then the last one fails with a bunch of errors about the block. Have you tried doing a clean start? Perhaps there is something wrong with the state load from 2.2 to 2.3?

Danny

On Tuesday August 30 2011 6:53:17 PM you wrote:
> Hey guys -
>
> We were about to start testing out 2.3.0-rc2 on our 1-rack BG/L @ RPI,
> but have not been able to launch slurmctld.
>
> I've poked around and haven't found an obvious cause yet, although I can
> see that the block creation code has been changed a decent amount
> compared to 2.2 to make room for BG/Q.
>
> The crash is:
>
> slurmctld: Record: BlockID:RMP30Au165517092 Nodes:bp000[2] Conn:Small
> slurmctld: debug2: adding block
> slurmctld: debug2: done adding
> slurmctld: Record: BlockID:RMP30Au165517102 Nodes:bp000[1] Conn:Small
> slurmctld: debug2: adding block
> slurmctld: debug2: done adding
> slurmctld: Record: BlockID:RMP30Au165517115 Nodes:bp000[0] Conn:Small
> slurmctld: error: bridge_set_data(RM_PartitionBlrtsImg): Invalid input
> slurmctld: error: bridge_set_data(RM_PartitionLinuxImg): Invalid input
> slurmctld: error: bridge_set_data(RM_PartitionRamdiskImg): Invalid input
> slurmctld: error: bridge_set_data(RM_PartitionMloaderImg): Invalid input
> slurmctld: error: Requesting small block with 0 mps, needs to be 1.
> slurmctld: fatal: Error, could not create the static blocks
>
> -CLI INVALID HANDLE-----
> cliRC = -2
> line = 242
> file = TxObject.cc
> slurmctld: error: bridge_get_block_info(RMP08Fe113120123): Internal error
> Segmentation fault
>
>
> The full debug output is
> http://scorec.rpi.edu/~wickbt/slurmctld-crash-2.3.0b2
>
> Our slurm.conf is http://scorec.rpi.edu/~wickbt/slurm.conf
> Our bluegene.conf is http://scorec.rpi.edu/~wickbt/bluegene.conf
>
> As an added challenge, it does *not* crash under the BG/L emulation
> mode... I suspect this narrows it down to some potential mishandling of
> the bg_record struct before the call in to _pre_allocate() ?
>
> Any ideas?
>
> thanks,
> - Tim
>
> --
> Tim Wickberg
> [email protected]
> Senior System Administrator
> Office of Research / SCOREC, Rensselaer Polytechnic Institute
>
