While the original idea was to use some workflow tool for low-core-count jobs in
a SLURM cluster, it ended up as the setup of a Virtual Cluster inside (possibly)
any queuing system. Whether such a setup is allowed may depend on the site
policy, but it is at least a working scenario and may add features to an actual
installation which are not available there or not set up. In addition, it
provides a kind of micro-scheduling inside the given allocation which is not
available otherwise.

We got access to a SLURM-equipped cluster where one always gets complete nodes
and is asked to avoid single serial jobs, or to pack them by scripting to fill
the nodes. With the additional need for a workflow application (something like
DRMAA) and array job dependencies, I got the idea to run a GridEngine instance
as a Virtual Cluster inside the SLURM cluster to solve this.

Essentially it's quite easy, as GridEngine offers:

- one can start SGE as a normal user (for a single-user setup per Virtual
Cluster this is exactly appropriate)
- SGE supports independent configurations, i.e. each Virtual Cluster is its own
SGE_CELL
- configuration files can be plain text files ("classic" spooling), and hence
are easily adjustable

After untarring SGE somewhere like
/YOUR/PATH/HERE/VC-common-installation/opt/sge (there is no need to install
anything here), we need a planchet (i.e. a template) of a "classic"
configuration placed there, named "__SGE_PLANCHET__". Like the /tmp directory,
everyone should be able to write at this level next to the "__SGE_PLANCHET__"
(`chmod go=rwx,+t /YOUR/PATH/HERE/VC-common-installation/opt/sge`). To the
planchet you can add items as needed, e.g. more PEs, complexes, queues, …
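
For illustration only, the one-time preparation might then look roughly like
this (the archive names are just placeholders for whatever SGE distribution you
use):

BASE=/YOUR/PATH/HERE/VC-common-installation/opt/sge

# Unpack the SGE common and architecture-specific archives here;
# nothing gets installed, the files are merely extracted.
mkdir -p "$BASE"
tar xzf sge-common.tar.gz -C "$BASE"
tar xzf sge-bin.tar.gz    -C "$BASE"

# Place the prepared planchet next to the binaries.
tar xzf __SGE_PLANCHET__.tgz -C "$BASE"

# Like /tmp: sticky and world-writable, so every user can later create the
# per-job SGE_<SLURM_JOB_ID> cells at this level.
chmod go=rwx,+t "$BASE"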

The enclosed script `multi-spawn.sh` gives an idea of what has to be done to
start a Virtual Cluster, even several per user, i.e.:

$ sbatch multi-spawn.sh

Regarding DRMAA: one doesn't need to run this on the login node or in a
dedicated job; instead the workflow application is already part of the (SLURM)
job itself (to be put in the application section of `multi-spawn.sh`).
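
Just to give an idea of the structure (the real script is attached and does
more checking), a stripped-down sketch of `multi-spawn.sh` could look like the
following; the placeholder names besides __SGE_INSTALLATION__ and __SGE_CELL__
as well as the application part are only examples:

#!/bin/bash
#SBATCH --nodes=4

# Sketch: set up a private SGE instance inside this SLURM allocation.
export SGE_ROOT=/YOUR/PATH/HERE/VC-common-installation/opt/sge
export SGE_CELL=SGE_${SLURM_JOB_ID}

# Create a fresh cell from the planchet and fill in the placeholders.
cp -r "$SGE_ROOT/__SGE_PLANCHET__" "$SGE_ROOT/$SGE_CELL"
find "$SGE_ROOT/$SGE_CELL" -type f -exec sed -i \
    -e "s|__SGE_INSTALLATION__|$SGE_ROOT|g" \
    -e "s|__SGE_CELL__|$SGE_CELL|g" {} +
hostname > "$SGE_ROOT/$SGE_CELL/common/act_qmaster"

# Start the qmaster here and one execd per node of the allocation.
"$SGE_ROOT/$SGE_CELL/common/sgemaster" start
srun --ntasks-per-node=1 "$SGE_ROOT/$SGE_CELL/common/sgeexecd" start

# Application section: the workflow lives inside the SLURM job itself
# and submits into the Virtual Cluster.
. "$SGE_ROOT/$SGE_CELL/common/settings.sh"
qsub -b y -t 1-100 /bin/sleep 60
# ... wait here until the workflow has finished ...

# Tear the Virtual Cluster down again.
srun --ntasks-per-node=1 "$SGE_ROOT/$SGE_CELL/common/sgeexecd" stop
"$SGE_ROOT/$SGE_CELL/common/sgemaster" stop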

===

While the planchet was created with 6.2u5, only a few steps are necessary to
create one for your version of SGE:

Run the install_* script once each for qmaster and execd. Essentially this will
only create a configuration; choose "classic" as the spooling method (there is
no need to add any exechost when you are asked for one; in fact, remove the one
which was added afterwards, from the @allhosts hostgroup too). Then rename the
created "default" configuration to "__SGE_PLANCHET__" and look in my planchet
with `grep` for entries like __FOO__ (i.e. strings enclosed by double
underscores). These have to be replaced there accordingly. `multi-spawn.sh` will
then change these in a copy of the planchet to the names and location of the
actual SGE instance; i.e. each SGE_CELL also has its own spool directory.

Notably, in sgemaster and sgeexecd this means changing:

SGE_ROOT=/usr/sge; export SGE_ROOT
SGE_CELL=default; export SGE_CELL

to:

SGE_ROOT=__SGE_INSTALLATION__; export SGE_ROOT
SGE_CELL=__SGE_CELL__; export SGE_CELL
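
To see which placeholders your own planchet contains and hence what has to be
provided, something like the following should be enough:

$ grep -rho '__[A-Z_]*__' /YOUR/PATH/HERE/VC-common-installation/opt/sge/__SGE_PLANCHET__ | sort -u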

===

You might need passphraseless `ssh` between the nodes, unless you start the
remote daemons by `srun`. If neither works, a pseudo-MPI application whose only
duty is to start sgeexecd on each involved node should do.
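
Such a starter could be as trivial as the following sketch, launched once per
node through your MPI's startup mechanism (the exact one-process-per-node
option differs between MPI implementations):

#!/bin/sh
# start-execd.sh - "pseudo MPI" starter (sketch): its only duty is to bring
# up the local execution daemon of the Virtual Cluster on this node.
. "$SGE_ROOT/$SGE_CELL/common/settings.sh"
exec "$SGE_ROOT/$SGE_CELL/common/sgeexecd" start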

===

In case you want to log in interactively to one of the nodes which were granted
to your Virtual Cluster, you need to run:

$ source
/YOUR/PATH/HERE/VC-common-installation/opt/sge/SGE_<SLURM_JOB_ID>/common/settings.sh

there to gain access to the SGE commands in the interactive shell for this
particular Virtual Cluster. To ease this, two mini functions `sge-set
<SLURM_JOB_ID>` and `sge-done` are included.
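
A minimal version of these two helpers could look like this (the attached ones
may differ slightly):

sge-set() {
    # Attach the current shell to the Virtual Cluster of the given SLURM job.
    . /YOUR/PATH/HERE/VC-common-installation/opt/sge/SGE_${1}/common/settings.sh
}

sge-done() {
    # Detach again; a complete version would also undo the PATH additions
    # made by settings.sh.
    unset SGE_ROOT SGE_CELL
}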

While this works on the nodes instantly, it is necessary to add the head
node(s) of the SLURM cluster to the planchet beforehand as submit and/or admin
hosts.

===

In case one wants to send emails, note that GridEngine's default is the account
on the node used as the login node of the Virtual Cluster, which in this case
is a compute node of the SLURM cluster. Either a special setup there is
necessary to receive email on such a node, or always provide a fully qualified
email address with the "-M" option to GridEngine.
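
E.g. (the address is of course just a placeholder):

$ qsub -M someone@example.org -m bea myjob.sh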

===

As every VC starts with job id 1, it might be helpful to create scratch
directories (in a global prolog/epilog) named
"${SLURM_JOB_ID}_$(basename ${TMPDIR})". If you always get full nodes, you
won't have this problem for a node-local scratch directory in $TMPDIR, though.
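
A sketch of such a prolog (the base directory of the shared scratch space is an
assumption; a matching epilog would remove the directory again):

#!/bin/sh
# Global SGE prolog (sketch): create a per-job scratch directory whose name
# also contains the SLURM job id of the surrounding allocation, so that
# "job 1" of two different Virtual Clusters can never collide.
SCRATCH=/shared/scratch/${SLURM_JOB_ID}_$(basename ${TMPDIR})
mkdir -p "$SCRATCH"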

===

BTW, did I mention it? No need to be root anywhere.

-- Reuti

Attachment: multi-spawn.sh
Description: Binary data

Attachment: __SGE_PLANCHET__.tgz
Description: Binary data

Attachment: cluster.tgz
Description: Binary data

