Tim, could you raise BridgeAPIVerbose to 3 or so in bluegene.conf and post
/var/log/slurm/bridgeapi.log?  That would be helpful; this seems rather
strange.
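
For reference, the relevant bluegene.conf lines would look something like the
fragment below.  This is only a sketch: the log path is the one mentioned
above, and everything else in your file should stay as it is.

```
# bluegene.conf (fragment; keep your existing settings)
BridgeAPILogFile=/var/log/slurm/bridgeapi.log
BridgeAPIVerbose=3
```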

I am guessing it happens every time?  For what it is worth, the crash is in
the IBM code, so we can't do much about it.  It seems strange that a bunch of
these blocks get created and then the last one fails with a bunch of errors
about the block.

Have you tried doing a clean start (slurmctld -c)?  Perhaps there is something
wrong with the state load from 2.2 to 2.3?

Danny

On Tuesday August 30 2011 6:53:17 PM you wrote:
> Hey guys -
> 
> We were about to start testing out 2.3.0-rc2 on our 1-rack BG/L @ RPI, 
> but have not been able to launch slurmctld.
> 
> I've poked around and haven't found an obvious cause yet, although I can 
> see that the block creation code has been changed a decent amount 
> compared to 2.2 to make room for BG/Q.
> 
> The crash is:
> 
> slurmctld: Record: BlockID:RMP30Au165517092 Nodes:bp000[2] Conn:Small
> slurmctld: debug2: adding block
> slurmctld: debug2: done adding
> slurmctld: Record: BlockID:RMP30Au165517102 Nodes:bp000[1] Conn:Small
> slurmctld: debug2: adding block
> slurmctld: debug2: done adding
> slurmctld: Record: BlockID:RMP30Au165517115 Nodes:bp000[0] Conn:Small
> slurmctld: error: bridge_set_data(RM_PartitionBlrtsImg): Invalid input
> slurmctld: error: bridge_set_data(RM_PartitionLinuxImg): Invalid input
> slurmctld: error: bridge_set_data(RM_PartitionRamdiskImg): Invalid input
> slurmctld: error: bridge_set_data(RM_PartitionMloaderImg): Invalid input
> slurmctld: error: Requesting small block with 0 mps, needs to be 1.
> slurmctld: fatal: Error, could not create the static blocks
> 
> -CLI INVALID HANDLE-----
>    cliRC = -2
>    line  = 242
>    file  = TxObject.cc
> slurmctld: error: bridge_get_block_info(RMP08Fe113120123): Internal error
> Segmentation fault
> 
> 
> The full debug output is
> http://scorec.rpi.edu/~wickbt/slurmctld-crash-2.3.0b2
> 
> Our slurm.conf is http://scorec.rpi.edu/~wickbt/slurm.conf
> Our bluegene.conf is http://scorec.rpi.edu/~wickbt/bluegene.conf
> 
> As an added challenge, it does *not* crash under the BG/L emulation 
> mode... I suspect this narrows it down to some potential mishandling of 
> the bg_record struct before the call in to _pre_allocate() ?
> 
> Any ideas?
> 
> thanks,
> - Tim
> 
> --
> Tim Wickberg
> [email protected]
> Senior System Administrator
> Office of Research / SCOREC, Rensselaer Polytechnic Institute
> 
