According to the documentation I am supposed to use 'smap' to create the 
bluegene.conf file with all the block mappings. However, the user interface of 
the smap command is not very user-friendly. Still, with the documentation and 
some experimenting I did get it to generate a bluegene.conf file. Thanks to a 
comment from someone on this list last week about changing

MloaderImage=/bgsys/drivers/ppcfloor/boot/uloader

to

MloaderImage=/bgsys/drivers/ppcfloor/boot/firmware

the bluegene.conf file works. I was also able to get things working with 
"LayoutMode=overlap", but I had to hand-edit the bluegene.conf file, something 
the documentation says never to do. In 'smap', once I specify that the layout 
is overlap, it puts all the blocks on midplane 000. I'm sure there is some way 
to do this in smap, but it was easier to just duplicate the lines and change 
"MPs=0000" to "MPs=0001".
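For anyone wanting to repeat that hand edit, here is a rough sketch of the "duplicate and change" step in Python. It is purely textual: the function name `duplicate_midplane_lines` is hypothetical, and it assumes the midplane appears on each block line exactly as the string "MPs=0000". It is not a Slurm tool and does not validate the resulting configuration.

```python
def duplicate_midplane_lines(conf_text: str, src: str = "0000", dst: str = "0001") -> str:
    """Hypothetical helper: after every line containing MPs=<src>, emit a copy
    of that line with MPs=<src> replaced by MPs=<dst>. All other lines pass
    through unchanged. A trailing newline in the input is not preserved."""
    out = []
    for line in conf_text.splitlines():
        out.append(line)
        if f"MPs={src}" in line:
            # Same block definition, moved to the second midplane.
            out.append(line.replace(f"MPs={src}", f"MPs={dst}"))
    return "\n".join(out)


if __name__ == "__main__":
    sample = "LayoutMode=overlap\nMPs=0000"
    print(duplicate_midplane_lines(sample))
```

Keeping it line-based means any other fields on a block line are copied verbatim, which mirrors doing the edit by hand.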

The part I can't get working is specifying 32-node blocks. I have tried it in 
'smap' and by editing the bluegene.conf file. It appears to work when you start 
slurm, but then some of the blocks go into an error state. So far I have not been 
able to clear this error with anything short of a clean start of slurm after 
deleting all the blocks. And even then the blocks will go right back to the 
error state the first time slurm tries to use them.

I believe the problem has to do with the I/O node mapping and maybe the 'Numpsets' 
parameter. The 'smap' program always sets 'Numpsets' to 4, but slurmctld won't 
allocate a 64-node block unless 'Numpsets' is 8 and won't allocate a 32-node 
block unless it is 16. I was worried that I might be breaking something by 
changing it, and in the case of 32-node blocks that appears to be true. For 
64-node blocks it appears to be working.
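The Numpsets values observed above fit a simple pattern if one assumes the smallest allocatable block is a midplane divided by Numpsets. A BlueGene/Q midplane has 512 compute nodes (16 node boards of 32 nodes), so the sketch below just does that division. The relationship itself is inferred from the behavior described here, not taken from the Slurm documentation, and `smallest_block` is a hypothetical name.

```python
NODES_PER_MIDPLANE = 512  # BG/Q midplane: 16 node boards x 32 compute nodes


def smallest_block(numpsets: int) -> int:
    """Assumed (not documented) rule: the smallest block slurmctld will
    allocate is one midplane split evenly across Numpsets."""
    if numpsets <= 0 or NODES_PER_MIDPLANE % numpsets:
        raise ValueError("Numpsets must evenly divide 512")
    return NODES_PER_MIDPLANE // numpsets


if __name__ == "__main__":
    for n in (4, 8, 16):
        print(f"Numpsets={n}: smallest block = {smallest_block(n)} nodes")
```

Under that assumption, smap's default of 4 would give 128-node blocks, 8 gives 64, and 16 gives 32, matching what slurmctld accepted.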

We have a single rack BlueGeneQ with two IO servers on top. Each IO server has 
8 node cards in it. We were told this system would support configurations with 
block sizes down to a single node but there doesn't appear to be any way to set 
this up in slurm. The documentation for 'smap' talks about the L and P but 
hasn't been updated for the Q yet.

So is there someone out there that has successfully configured slurm to handle 
small block sizes? Can you post a sample bluegene.conf file to show how these 
are defined?

I realize most people with a BlueGeneQ have multiple racks and probably don't 
really want to support jobs that use less than a full midplane, but with a 
single rack we want to support a debug or interactive partition with only a few 
nodes to let people get their programs working before turning them loose on 
bigger block partitions. 

Thanks,
Carl

-- 
Carl Schmidtmann 
Center for Integrated Research Computing 
University of Rochester 
