Hi Carl,

I honestly don't use smap to create bluegene.conf files for the Blue Genes I've set up. I find it easier to just work with the skeleton provided and add to it as needed, so I can't really comment on that (I also regularly update patches by modifying the diff hunks by hand, so it's most likely some sort of personality flaw ;) ).

But with regard to block sizes and the value of Numpsets (for which SLURM 2.4 lets you use IONodesPerMP instead, as per the bluegene.conf manpage):
Unless you enable sub-block allocations (section 6.3 in the Blue Gene/Q System Administration redbook) by adding AllowSubBlockAllocations=Yes to your bluegene.conf file, your block size will be limited by the ratio of compute nodes to IO nodes in a rack (or wired to a rack, now that IO nodes can be in their own separate IO racks on BG/Q). This isn't a SLURM limitation but rather a by-product of the choices IBM made in creating the Blue Gene architecture (affecting L, P and Q).

So, you have 16 IO nodes per rack, which is 8 per midplane of 512 compute nodes. This means the smallest block size your system can support is 512 / 8 = 64 compute nodes. And this is where things would have stopped if you were using a Blue Gene L or P. But on Q systems the underlying Blue Gene system software and SLURM support sub-block jobs, where multiple jobs can share IO nodes. Using this feature you can ask for an allocation down to one node and be given one node out of a block to use.

As for the layout mode in the bluegene.conf file, I believe the documentation is overly cautious. We used dynamic mode on our Blue Gene/P for the last two years (it was just decommissioned) and we've been using dynamic mode on our Blue Gene/Q for the last month and a half. When you enable dynamic mode you do not need to define any blocks in the Block Layout section, as blocks are created on-the-fly as needed.

For the four-rack Blue Gene/Q that I look after (we only have one IO drawer on each rack, so we're limited to block sizes of 128 nodes) we are using dynamic layout mode together with sub-block allocations, with good results. In this mode the smallest block SLURM creates is a midplane, and it uses sub-block jobs (within a midplane block) for any job requesting fewer nodes than a midplane.
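To make the IO-node arithmetic above concrete, here is a minimal Python sketch. It is not anything SLURM itself runs, and the function name smallest_block is made up for illustration. It assumes a 512-node midplane and that, without sub-block allocations, every block needs at least one IO node of its own:

```python
# Illustrative only: relates IO nodes per midplane (IONodesPerMP / Numpsets)
# to the smallest block size available without sub-block allocations.
MIDPLANE_NODES = 512  # compute nodes per midplane (MidplaneNodeCnt)


def smallest_block(io_nodes_per_mp: int, midplane_nodes: int = MIDPLANE_NODES) -> int:
    """Return the smallest block size in compute nodes.

    Assumes each block needs at least one dedicated IO node, so the
    minimum block is the compute-node to IO-node ratio of a midplane.
    """
    if io_nodes_per_mp <= 0:
        raise ValueError("io_nodes_per_mp must be positive")
    if midplane_nodes % io_nodes_per_mp != 0:
        raise ValueError("IO nodes must divide the midplane evenly")
    return midplane_nodes // io_nodes_per_mp


if __name__ == "__main__":
    # 4 per midplane: one IO drawer per rack; 8 per midplane: two IO drawers.
    for io_nodes in (4, 8):
        size = smallest_block(io_nodes)
        print(f"{io_nodes} IO nodes per midplane: smallest block is {size} nodes")
```

With 8 IO nodes per midplane this gives the 64-node limit described above, and with 4 it gives the 128-node limit on our system. Anything smaller than that has to come from sub-block jobs rather than from smaller blocks.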
Below is our bluegene.conf file:

MloaderImage=/bgsys/drivers/ppcfloor/boot/firmware
BridgeAPILogFile=/var/log/slurm/bridgeapi.log
BridgeAPIVerbose=2
# We have 4 IO nodes per midplane, 32 for the four-rack system
IONodesPerMP=4
# Once any nodes in a block are in the error state stop running
# jobs in that block
MaxBlockInError=0
MidplaneNodeCnt=512
NodeCardNodeCnt=32
AllowSubBlockAllocations=Yes
LayoutMode=DYNAMIC

Because IONodesPerMP (the new Numpsets) is a system-specific setting, smap just defaults to a value of 4 to get you going. It then relies on you editing the bluegene.conf afterwards and updating it to match your configuration. Admittedly, it would be nice if it asked, but no one has gotten around to adding something like that.

So, I guess the first thing to try is enabling sub-block allocations in your existing bluegene.conf file to allow the smaller allocations you desire. You could also try using dynamic layout mode if you felt inclined.

Let us know how you go.

Thanks!
Mark

On 09/08/12 05:08, Carl Schmidtmann wrote:
>
> According to the documentation I am supposed to use 'smap' to create the
> bluegene.conf file with all the block mappings. However the user interface
> for the smap command is not very user friendly. But with the documentation
> and some experimenting I did get it to generate a bluegene.conf file. Thanks
> to a comment from someone on this list last week about changing
>
> MloaderImage=/bgsys/drivers/ppcfloor/boot/uloader
>
> to
>
> MloaderImage=/bgsys/drivers/ppcfloor/boot/firmware
>
> the bluegene.conf file works. I was also able to get things working with
> "LayoutMode=overlap" but I had to hand edit the bluegene.conf file -
> something the documentation says to never do. In 'smap' once I specify that
> the layout is overlap it makes all the blocks on midplane 000. I'm sure there
> is some way to do this in smap but it was easier to just duplicate the lines
> and change "MPs=0000" to "MPs=0001".
>
> The part I can't get working is to specify 32 node blocks. I have tried it in
> 'smap' and by editing the bluegene.conf file. It appears to work when you
> start slurm but then some of the blocks go into an error mode. So far I have
> not been able to clear this error with anything short of a clean start of
> slurm after deleting all the blocks. And even then the blocks will go right
> back to the error state the first time slurm tries to use them.
>
> I believe the problem has to do with the IO node mapping and maybe the
> 'Numpsets' parameter. The 'smap' program always sets 'Numpsets' to 4 but
> slurmctld won't allocate a 64 node block unless 'Numpsets' is 8 and won't
> allocate a 32 node block unless it is 16. I was worried that I might be
> breaking something by changing it and in the case of 32 node blocks that
> appears to be true. For 64 node blocks it appears to be working.
>
> We have a single rack BlueGeneQ with two IO servers on top. Each IO server
> has 8 node cards in it. We were told this system would support configurations
> with block sizes down to a single node but there doesn't appear to be any way
> to set this up in slurm. The documentation for 'smap' talks about the L and P
> but hasn't been updated for the Q yet.
>
> So is there someone out there that has successfully configured slurm to
> handle small block sizes? Can you post a sample bluegene.conf file to show
> how these are defined?
>
> I realize most people with a BlueGeneQ have multiple racks and probably don't
> really want to support jobs that use less than a full midplane but with a
> single rack we want to support a debug or interactive partition with only a
> few nodes to let people get their programs working before turning them loose
> on bigger block partitions.
>
> Thanks,
> Carl
>
