[slurm-dev] Re: SLURM on Blue Gene/Q: sub-node jobs?

Mark Nelson Tue, 03 Jul 2012 01:24:07 -0700

Hi Danny,

Thanks for the quick reply!


On 03/07/12 03:44, Danny Auble wrote:
> Mark, the quick answer is no. We only allow 1 job per c-node. From my
> understanding the standard IBM driver does not allow this.

The driver we're using at the moment, V1R1M1, seems to support
sub-node jobs via the --corner option (I believe it was also available
in the V1R1M0 we had). It seems you can specify which core a job
should run on with a "core corner" location such as R00-M1-N12-J05-C00.
From the runjob manpage:

    --corner
        Location string of a single node *or* single core within the block to
        represent the corner of a sub-block job. If the system administrator
        has configured a job scheduler, this parameter may be changed before
        the job starts.

        Example:

        - R00-M0-N00-J00 (node J00 on board N00 in midplane R00-M0)

        - R00-M1-N12-J05-C00 (core 0 on node J05 on board N12 in midplane
          R00-M1)

        Core corner locations cannot use a shape or alternate ranks per node,
        therefore --shape and --ranks-per-node are both ignored when specified
        with a core corner location. Node corner locations require a --shape
        argument. Multiple sub-node jobs on a single node using core corner
        locations are limited to a single user.
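
The manpage rules above can be sketched as a small check. This is only an illustration, not part of runjob or SLURM: the regular expression is an assumption inferred from the two example locations in the excerpt, and the function names are hypothetical.

```python
import re

# Assumed pattern, inferred only from the two manpage examples:
# Rxx-Mx-Nxx-Jxx names a node, and an optional -Cxx suffix names a core.
LOCATION = re.compile(r"^R[0-9A-Z]{2}-M[01]-N\d{2}-J\d{2}(-C\d{2})?$")


def corner_kind(location):
    """Return 'core' or 'node' for a corner location string."""
    match = LOCATION.match(location)
    if match is None:
        raise ValueError(f"unrecognised location: {location!r}")
    return "core" if match.group(1) else "node"


def check_corner_options(location, shape=None, ranks_per_node=None):
    """Return a list of notes describing how the manpage rules apply."""
    notes = []
    if corner_kind(location) == "core":
        # Core corners ignore both --shape and --ranks-per-node.
        if shape is not None:
            notes.append("--shape is ignored with a core corner location")
        if ranks_per_node is not None:
            notes.append("--ranks-per-node is ignored with a core corner location")
    elif shape is None:
        # Node corners need an explicit --shape.
        notes.append("--shape is required with a node corner location")
    return notes
```

For example, `check_corner_options("R00-M0-N00-J00")` reports the missing --shape, while a core corner given both options reports that both are ignored.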


I believe you should be able to have sixteen single-core single-threaded 
jobs running on the same node under one allocation (all by the same user 
obviously).
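
As a rough sketch of what those sixteen jobs would target, the core corner locations for one node could be generated as below. The contiguous numbering C00 to C15 is an assumption (the manpage only shows C00), and `core_corners` is a hypothetical helper.

```python
def core_corners(node_location, cores=16):
    """List the core corner locations for one node.

    Assumes cores are numbered contiguously from C00, which the
    manpage excerpt does not confirm beyond C00 itself.
    """
    return [f"{node_location}-C{core:02d}" for core in range(cores)]


corners = core_corners("R00-M1-N12-J05")
```

Each entry in `corners` would then be the --corner value for one single-core job.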

>
> But while I am unaware of an IBM driver that allows this, perhaps
> you have a special driver that makes it possible, hence the question. So
> a more drawn out explanation is... When we designed the interface it was
> not possible, and if it is possible I don't think it is in our road map.
> Quite a bit of change would need to happen to support such a thing. My
> current understanding of the IBM interface leads me to think it
> impossible to gather the information needed to lay out jobs in this
> fashion. But as you are aware things change. If there is a way to tell
> runjob how to do this perhaps that information could be gathered in the
> srun runjob-opts variable and overcommit the resources. But this is just
> a guess.

I'll give it a go passing runjob-opts="--corner=<some core within the 
node I get allocated>" to srun and see what I get.
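
A minimal sketch of assembling that srun invocation follows. It assumes the option is spelled `--runjob-opts` on the srun command line and that runjob accepts `--corner=<location>` unchanged when passed through; neither has been tested here. `./my_app` and `srun_command` are hypothetical names.

```python
import shlex


def srun_command(corner, executable):
    """Build an srun argument list that forwards --corner to runjob.

    Assumes srun accepts --runjob-opts and passes its value to runjob
    verbatim; this is untested.
    """
    runjob_opts = f"--corner={corner}"
    return ["srun", f"--runjob-opts={runjob_opts}", executable]


# Hypothetical core location and executable name, for illustration only.
command = srun_command("R00-M1-N12-J05-C00", "./my_app")
print(shlex.join(command))
```

The command is only built and printed, not run, so the line can be inspected before trying it inside a real allocation.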

Thanks!
Mark
