If you don't need to run multiple jobs on a given node (not doing so saves
a lot of headaches in general), then you could also run stateless Linux
images that get rebooted after each job terminates. Since the image
typically runs off an NFS mount, boot times are usually very quick. AFAIK
this is what SGI does on their Altix ICE platform. You could use something
like xCAT to make this a little easier.
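
As a rough sketch of the reboot-after-each-job idea (not something I have
running in production), an EpilogSlurmctld script could ask Slurm to reboot
the nodes the job just used. This assumes a Slurm version that provides
"scontrol reboot" and has RebootProgram set in slurm.conf, and that whole
nodes are allocated to a single job. The function names are made up for
illustration; SLURM_JOB_NODELIST is the variable slurmctld sets for the
epilog.

```python
#!/usr/bin/env python3
"""Hypothetical EpilogSlurmctld sketch: reboot a job's nodes when it ends."""
import os
import subprocess


def build_reboot_command(env):
    """Return the scontrol command for the finished job's nodes, or None.

    Assumes SLURM_JOB_NODELIST holds the job's node list; returns None
    when the variable is missing or empty so nothing is rebooted by mistake.
    """
    nodelist = env.get("SLURM_JOB_NODELIST", "").strip()
    if not nodelist:
        return None
    # Assumption: "scontrol reboot" is available and RebootProgram is
    # configured; the stateless image then comes back up clean.
    return ["scontrol", "reboot", nodelist]


def main(env=os.environ, run=subprocess.call):
    """Build the reboot command and run it; return the exit status."""
    cmd = build_reboot_command(env)
    if cmd is None:
        return 0  # Not called from slurmctld, nothing to do.
    return run(cmd)


if __name__ == "__main__":
    main()
```

You would point EpilogSlurmctld in slurm.conf at this script. With xCAT
you could instead have the script call whatever reboot mechanism xCAT
gives you.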

2011/6/15 Bjørn-Helge Mevik <[email protected]>

> Dear all,
>
> We are in the process of designing a cluster to be used for
> calculations on sensitive data (DNA from patients, etc.).  We would like
> to be able to run jobs from different projects at the same time, and
> naturally, the jobs should be shielded from each other.
>
> One idea we are investigating is to use virtual machines running on the
> cluster, and then roll back/restart the VMs between each job.  In
> particular, we consider setting up a fixed number of VMs, matching the
> physical hardware of the cluster, and tell slurm to use those VMs as its
> compute nodes.  We will only allocate one job to each VM, and have an
> EpilogSlurmctld script that rolls back the VM after the job finishes.
> (We might also have a PrologSlurmctld script that rolls back the VM
> before a job starts, for extra security.  Alternatively, the epilog will
> shut down the VM and the prolog will boot it from scratch.)
>
> Does this sound like a good way to isolate jobs from each other?
>
> Has anyone here done anything like this, or have ideas/thoughts about how
> best to isolate jobs from each other?
>
>
> --
> Regards,
> Bjørn-Helge Mevik, dr. scient,
> Research Computing Services, University of Oslo
>
>
