Hi Jharrod,

Thanks very much for the heads-up about StarCluster, and for your work on
the SLURM fork.  It'll be a while before I get a chance to try it out, but
it's definitely of interest.

All the best,
Lyn

On Mon, May 14, 2012 at 11:10 AM, Jharrod W. LaFon
<[email protected]>wrote:

> Hello,
>
> You may or may not be aware of a free utility called StarCluster (
> http://web.mit.edu/star/cluster/docs/latest/index.html), which automatically
> allocates and configures clusters on Amazon's EC2.  It also provides the
> ability to grow or shrink the cluster using a load balancer.
>
> The default scheduler installed by StarCluster is SGE.  I have nothing
> against SGE, but I used SLURM when I worked at LANL and was very pleased
> with it, so I decided to add SLURM support to StarCluster.
>
> The SLURM-enabled fork of StarCluster is at
> https://github.com/jlafon/StarCluster, with a short set of instructions at
> https://github.com/jlafon/StarCluster/wiki/Getting-started-with-SLURM-on-Amazon's-EC2.
> This allows a fully configured SLURM cluster to be up and running in
> minutes.
>
> I do have some questions:
>
> Are there plans to add XML output to the SLURM utilities?  Right now I am
> parsing command output.  It would be much easier to implement a load
> balancer if this feature were available.
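Until XML output exists, one way to keep the parsing manageable is to ask squeue for a delimited, headerless format and split on the delimiter. A minimal sketch, not taken from the fork; the field choice and function name are my own:

```python
# Sketch: parse delimited squeue output such as that produced by
#   squeue --noheader --format="%i|%j|%T"
# The fields chosen here (job id, name, state) are just an example.

def parse_squeue(output):
    """Turn pipe-delimited squeue lines into a list of job dicts."""
    jobs = []
    for line in output.strip().splitlines():
        job_id, name, state = line.split("|")
        jobs.append({"id": job_id, "name": name, "state": state})
    return jobs

# Sample lines as squeue might print them with the format above.
sample = "101|render|PENDING\n102|analyze|RUNNING\n"
for job in parse_squeue(sample):
    print(job["id"], job["state"])
```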
>
> I have enabled slurmdbd.  Should I query the database directly for running
> and completed job information, or only use sreport, sacct, etc.?
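For what it's worth, sacct can already emit machine-friendly delimited output (e.g. `sacct --noheader --parsable2 --format=JobID,State,Elapsed`), which avoids depending on the database schema directly. A hypothetical parser for that shape (field names are an example only):

```python
# Sketch: parse sacct --parsable2 output, where fields are separated
# by "|" with no trailing delimiter.  Field list is illustrative.

FIELDS = ["JobID", "State", "Elapsed"]

def parse_sacct(output, fields=FIELDS):
    """Turn pipe-delimited sacct lines into a list of record dicts."""
    return [dict(zip(fields, line.split("|")))
            for line in output.strip().splitlines()]

# Sample lines as sacct might print them for the fields above.
sample = "103|COMPLETED|00:05:12\n104|RUNNING|00:01:03\n"
for rec in parse_sacct(sample):
    print(rec["JobID"], rec["State"])
```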
>
> How can I queue jobs that I submit (rather than having them rejected) when
> the configured resources are not sufficient to fulfill the job requirements?
> When using a load balancer, it is desirable to run only the SLURM controller
> node (with no compute nodes) if there are no jobs running, and let the load
> balancer expand the cluster by adding compute nodes as jobs are submitted.
> I have this feature working, but through a workaround: I configure a hidden
> partition of nodes with dummy entries in /etc/hosts, and update /etc/hosts
> with correct entries when compute nodes are added.  This allows a job to be
> queued rather than rejected, and allows slurmctld to start with fake nodes
> in a hidden partition (I noticed that slurmctld won't start at all if it
> can't resolve node hostnames to IP addresses).
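The workaround above might look roughly like the following slurm.conf and /etc/hosts fragments; the node names, counts, and addresses are illustrative, not taken from the fork:

```
# slurm.conf: a hidden partition of placeholder compute nodes, so
# slurmctld starts and jobs queue even before real nodes exist.
NodeName=cloud[001-010] CPUs=1 State=UNKNOWN
PartitionName=cloud Nodes=cloud[001-010] Hidden=YES Default=YES

# /etc/hosts: dummy entries so the placeholder hostnames resolve;
# rewritten with real addresses when the load balancer adds nodes.
127.0.1.1   cloud001
127.0.1.2   cloud002
```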
>
> Thanks!
>
> --
> Jharrod LaFon
>
>
>
