Re: [gridengine users] How to manage grid nodes

Gavin W. Burris Thu, 03 Oct 2013 08:21:39 -0700

On Thu, Oct 03, 2013 at 10:13:34AM -0400, Prentice Bisbal wrote:
> On 10/02/2013 07:00 PM, Dave Love wrote:
> >Lionel SPINELLI writes:
> >
> >>Hello all,
> >>
> >>I have a question that is not directly about SGE but is related to
> >>it. Which tools do administrators use to install, manage, and
> >>configure a lot of grid nodes and keep them consistent? I mean,
> >>if I have 10 nodes in my grid and need to be sure that all of them
> >>have the right software/configuration, I don't want to manually
> >>configure each machine.
> >It seems to be a religious topic...  My requirements for managing node
> >images are:
> >
> >1. free software
> >2. stateless image (NFS root + local /tmp; modify the shared root
> >    more-or-less directly)
> >3. support for heterogeneous systems with different images for multiple
> >    OSes and customizing for different node groups with a single image
> >4. decoupled from the OS (not living somewhat in its own world, like
> >    Rocks) so you do normal package management
> >
> >When I had to pick one swiftly, the only one it was clear would do
> >3. properly was oneSIS <http://www.onesis.org>, though probably others
> >can.  I've run a 250-node horrible mess of hardware as a shared
> >everything cluster with oneSIS off a single NFS server.  I recently
> >replaced a vendor's useless imaging scheme with it for the second time.
> >
> >>Do you know a simple tool that could do the job? My research led me
> >>to "Puppet Master" but I would like to get advice from experts...
> >I'm not convinced that's appropriate for an HPC cluster, but people with
> >more HPC experience disagree.
> >
> >You need tools apart from image management, of course.
> >
> 
> Dave's right, this is a religious topic. I see Dave's point of using
> something not coupled to the OS. I, however, have always used RHEL
> derivatives, so I've just used the combination of Kickstart with
> DHCP and PXE booting, and it has served me well. With kickstart,
> it's not too hard with some basic pre- and post-scripting to come up
> with some different configuration options if you need different
> images installed on different machines.
> 
> For configuration management of a cluster, something like puppet can
> be overkill. In the past, I kept all my config files on a webserver
> only accessible to the cluster nodes, and then used a post-install
> script to wget all the config files needed. This was only about 10 to
> 20 files, so a simple for-loop kept it manageable. If I ever needed
> to update config files, I used a parallel front end to ssh (there
> are several good ones out there) to execute wget across all nodes
> and restart any services as necessary. I've seen others accomplish
> the same thing with rdist.
> 
> Some of you might be rolling your eyes thinking this is a lot of work,
> but it really isn't, and my clusters have been pretty static once
> they're up and running, so it's not often I need to make any
> configuration changes.
> 
> I've used puppet in many other situations, and I'm going to start
> using it on my clusters, too. This makes the kickstart post-install
> script one line: a single call to puppet.
> 
> If you decide to use puppet this way on your cluster, you don't want
> to have the daemon running all the time. If you do, the daemon will
> check in every 30 minutes, and slow down your jobs. It's best to run
> puppet as a cron job with the --onetime flag (run once and exit)
> only once a day or so, or use a parallel front end to ssh to run
> puppet with --onetime only as needed.
> 
> That's how I do it.
> 
> Prentice


Hi, Prentice.

I'm totally with you on this approach.  I have defaulted to DIY
configuration management with ssh keys, rsync, a directory of config
files, a text file of hostnames, and a for loop.  Initial install is
done with a RHEL/CentOS/SL kickstart file via DHCP, PXE and NFS.
Software is installed via an RPM repository or a self-contained
directory under /opt.
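
A minimal sketch of that kind of loop, written in Python here rather
than shell. The file names (hosts.txt, configs/), the root@ login, and
the function names are hypothetical, chosen only for illustration. It
assumes passwordless ssh keys are already in place and that configs/
mirrors the target filesystem layout (configs/etc/... and so on).

```python
import subprocess
from pathlib import Path


def read_hosts(path):
    """Return hostnames from a text file, skipping blanks and # comments."""
    hosts = []
    for line in Path(path).read_text().splitlines():
        line = line.split("#", 1)[0].strip()
        if line:
            hosts.append(line)
    return hosts


def rsync_command(host, config_dir, dest="/"):
    """Build the rsync argv that copies the contents of config_dir to host:dest."""
    # Trailing slash on the source: copy the directory's contents, not the
    # directory itself.
    src = str(config_dir).rstrip("/") + "/"
    return ["rsync", "-a", "-e", "ssh", src, f"root@{host}:{dest}"]


def push_configs(hosts_file, config_dir, dry_run=True):
    """Loop over the hosts; print the commands on a dry run, else run them.

    Returns the list of hosts where rsync exited non-zero.
    """
    failed = []
    for host in read_hosts(hosts_file):
        cmd = rsync_command(host, config_dir)
        if dry_run:
            print(" ".join(cmd))
            continue
        if subprocess.run(cmd).returncode != 0:
            failed.append(host)
    return failed
```

Calling push_configs("hosts.txt", "configs") only prints the rsync
commands; pass dry_run=False to actually run them. The dry run is the
default so a typo in the host list is visible before anything is
copied.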

Putting a bunch of command lines in the kickstart %post section can be
fragile with successive updates, though.  I keep looking at chef/puppet/salt
as a way to get down to as few lines as possible in the kickstart.  Let
us know how your puppet cluster install goes.
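
As a rough sketch of what "as few lines as possible" could look like,
the Python snippet below renders a kickstart %post section that does
nothing but call puppet once. The server name puppet.example.org and
the log path are placeholders, and it assumes the puppet package is
already pulled in by %packages; this is an illustration, not a tested
install recipe.

```python
def render_post_section(server, log_path="/root/ks-post.log"):
    """Return a kickstart %post section containing a single puppet run.

    server and log_path are caller-supplied; nothing here is specific
    to any particular site.
    """
    lines = [
        f"%post --log={log_path}",
        # Run the agent once in the foreground instead of starting the daemon.
        f"puppet agent --onetime --no-daemonize --server {server}",
        "%end",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_post_section("puppet.example.org"), end="")
```

Generating the section from one function means every kickstart file
gets the same %post, and all node-specific logic moves out of the
kickstart and into the configuration management tool.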

Cheers.
-- 
Gavin W. Burris
Senior IT Project Leader
Research Computing
Wharton Computing
University of Pennsylvania