Hi Jharrod,

Thanks very much for the heads-up about StarCluster, and for your work on the SLURM fork. It'll be a while before I get a chance to try it out, but it's definitely of interest.
All the best,
Lyn

On Mon, May 14, 2012 at 11:10 AM, Jharrod W. LaFon <[email protected]> wrote:

> Hello,
>
> You may or may not be aware of a free utility called StarCluster
> (http://web.mit.edu/star/cluster/docs/latest/index.html), which completely
> allocates and configures clusters on Amazon's EC2. It also provides the
> ability to grow or shrink the cluster using a load balancer.
>
> The default scheduler installed by StarCluster is SGE. I have nothing
> against SGE, but I used SLURM when I worked at LANL and was very pleased
> with it, so I decided to add it to StarCluster.
>
> The SLURM-enabled fork of StarCluster is at
> https://github.com/jlafon/StarCluster, with a short set of instructions at
> https://github.com/jlafon/StarCluster/wiki/Getting-started-with-SLURM-on-Amazon's-EC2.
> This allows a fully configured SLURM cluster to be up and running in
> minutes.
>
> I do have some questions:
>
> Are there plans to add XML output to the SLURM utilities? Right now I am
> parsing command output. It would be much easier to implement a load
> balancer if this feature were available.
>
> I have enabled slurmdbd. Should I query the database directly for running
> and completed job information, or only use sreport, sacct, etc.?
>
> How can I queue jobs that I submit (rather than having them rejected) when
> the configured resources are not enough to fulfill the job requirements?
> When using a load balancer, it is desirable to run only the SLURM
> controller node (with no compute nodes) if there are no jobs running, and
> let the load balancer expand the cluster by adding compute nodes as jobs
> get submitted. I have this feature working, but through a workaround: I
> configure a hidden partition of nodes with dummy entries in /etc/hosts,
> and update /etc/hosts with correct entries when compute nodes are added.
> This allows a job to be queued rather than rejected, and allows slurmctld
> to start with fake nodes in a hidden partition (I noticed that slurmctld
> won't start at all if it can't resolve node hostnames to IP addresses).
>
> Thanks!
>
> --
> Jharrod LaFon
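The hidden-partition workaround can be expressed as a slurm.conf fragment along these lines (node names, counts, and CPU figures are placeholders; `Hidden=YES` keeps the partition out of default sinfo/squeue listings):

```
# Placeholder compute nodes that exist only as dummy /etc/hosts entries
# until the load balancer brings real instances up.
NodeName=dummy[1-16] CPUs=2 State=UNKNOWN
# Hidden partition holding the placeholders so slurmctld starts cleanly
# and submitted jobs queue instead of being rejected.
PartitionName=hidden Nodes=dummy[1-16] Hidden=YES State=UP
```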
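Until structured (XML) output lands in the SLURM utilities, the command-output parsing Jharrod describes usually means pinning down an explicit output format rather than scraping the default columns. A minimal sketch in Python (the helper names are illustrative, not part of StarCluster; `squeue -h -o "%i %T"` prints just the job id and state with no header):

```python
import subprocess

def parse_squeue(output):
    """Parse 'squeue -h -o "%i %T"' output into (job_id, state) pairs."""
    jobs = []
    for line in output.splitlines():
        parts = line.split()
        if len(parts) == 2:
            jobs.append((parts[0], parts[1]))
    return jobs

def get_job_states():
    # -h suppresses the header; -o "%i %T" emits only job id and state,
    # which is far less fragile than parsing the default column layout.
    out = subprocess.run(
        ["squeue", "-h", "-o", "%i %T"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_squeue(out)
```

A load balancer could then, for example, count the `PENDING` entries returned by `get_job_states()` to decide when to add nodes.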
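On the slurmdbd question: going through sacct rather than the database directly keeps the code independent of the accounting schema, and sacct's parsable mode is already reasonably machine-friendly. A hedged sketch (the field list is chosen for illustration; `-P` requests pipe-delimited output with no trailing delimiter):

```python
import subprocess

FIELDS = ("JobID", "State", "Elapsed")

def parse_sacct(output, delimiter="|"):
    """Parse 'sacct -n -P --format=JobID,State,Elapsed' output into dicts."""
    records = []
    for line in output.splitlines():
        values = line.split(delimiter)
        if len(values) == len(FIELDS):
            records.append(dict(zip(FIELDS, values)))
    return records

def job_records():
    # -n drops the header line and -P emits delimiter-separated records
    # with no column padding, so widths never break the parse.
    out = subprocess.run(
        ["sacct", "-n", "-P", "--format=" + ",".join(FIELDS)],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_sacct(out)
```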
