According to the documentation I am supposed to use 'smap' to create the bluegene.conf file with all the block mappings. However, the interface of the smap command is not very user friendly. With the documentation and some experimenting I did get it to generate a bluegene.conf file. Thanks to a comment from someone on this list last week about changing MloaderImage=/bgsys/drivers/ppcfloor/boot/uloader to MloaderImage=/bgsys/drivers/ppcfloor/boot/firmware, the bluegene.conf file works.

I was also able to get things working with "LayoutMode=overlap", but I had to hand edit the bluegene.conf file, something the documentation says never to do. In 'smap', once I specify that the layout is overlap, it puts all the blocks on midplane 0000. I'm sure there is some way to do this in smap, but it was easier to just duplicate the lines and change "MPs=0000" to "MPs=0001".

The part I can't get working is specifying 32 node blocks. I have tried it in 'smap' and by editing the bluegene.conf file. It appears to work when you start slurm, but then some of the blocks go into an error mode. So far I have not been able to clear this error with anything short of a clean start of slurm after deleting all the blocks. Even then, the blocks go right back to the error state the first time slurm tries to use them.

I believe the problem has to do with the IO node mapping and maybe the 'Numpsets' parameter. The 'smap' program always sets 'Numpsets' to 4, but slurmctld won't allocate a 64 node block unless 'Numpsets' is 8 and won't allocate a 32 node block unless it is 16. I was worried that I might be breaking something by changing it, and in the case of 32 node blocks that appears to be true. For 64 node blocks it appears to be working.

We have a single rack BlueGeneQ with two IO servers on top. Each IO server has 8 node cards in it. We were told this system would support configurations with block sizes down to a single node, but there doesn't appear to be any way to set this up in slurm. The documentation for 'smap' talks about the L and P but hasn't been updated for the Q yet.

So is there someone out there who has successfully configured slurm to handle small block sizes? Can you post a sample bluegene.conf file to show how these are defined?

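
To make the 'Numpsets' observation concrete, here is a minimal Python sketch. It assumes a BlueGeneQ midplane holds 512 compute nodes and that slurmctld wants one pset per block, so that Numpsets = 512 / block size. That rule is only inferred from the two values reported above (8 for 64 node blocks, 16 for 32 node blocks); it is not taken from the slurm documentation. The function name implied_numpsets is made up for illustration.

```python
# Assumption: one BlueGeneQ midplane holds 512 compute nodes.
NODES_PER_MIDPLANE = 512


def implied_numpsets(block_size: int) -> int:
    """Return the Numpsets value implied by the assumed rule
    'one pset per block', i.e. 512 / block_size.

    This rule is a guess fitted to the two observed values in the
    post; it is not documented slurm behaviour.
    """
    if block_size <= 0 or NODES_PER_MIDPLANE % block_size != 0:
        raise ValueError("block size must evenly divide a 512 node midplane")
    return NODES_PER_MIDPLANE // block_size


# Values reported in the post: block size -> Numpsets that slurmctld accepted.
observed = {64: 8, 32: 16}

for size, numpsets in observed.items():
    matches = implied_numpsets(size) == numpsets
    print(f"{size} node blocks: observed Numpsets={numpsets}, rule agrees: {matches}")
```

Under the same assumption, the value of 4 that 'smap' writes would correspond to 128 node blocks. If the 16 IO nodes described above are split evenly between the two midplanes, each midplane has 8, which would fit with 64 node blocks working and 32 node blocks going into error, but that is a guess and not something confirmed here.
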
I realize most people with a BlueGeneQ have multiple racks and probably don't really want to support jobs that use less than a full midplane. With a single rack, though, we want to support a debug or interactive partition with only a few nodes, to let people get their programs working before turning them loose on bigger block partitions.

Thanks,
Carl

--
Carl Schmidtmann
Center for Integrated Research Computing
University of Rochester
