On Thu, Oct 03, 2013 at 10:13:34AM -0400, Prentice Bisbal wrote:
> On 10/02/2013 07:00 PM, Dave Love wrote:
> >Lionel SPINELLI <[email protected]> writes:
> >
> >>Hello all,
> >>
> >>I have a question that is not directly linked to SGE but relates to
> >>it. Which tool do administrators use when they have to install,
> >>manage, configure and ensure coherence between a lot of grid nodes?
> >>I mean, if I have 10 nodes in my grid and need to be sure that all
> >>of them have the right software/configuration, I don't want to
> >>manually configure each machine.
> >It seems to be a religious topic... My requirements for managing node
> >images are:
> >
> >1. free software
> >2. stateless image (NFS root + local /tmp; modify the shared root
> >   more-or-less directly)
> >3. support for heterogeneous systems with different images for multiple
> >   OSes and customizing for different node groups with a single image
> >4. decoupled from the OS (not living somewhat in its own world, like
> >   Rocks) so you do normal package management
> >
> >When I had to pick one swiftly, the only one it was clear would do
> >3. properly was oneSIS <http://www.onesis.org>, though probably others
> >can. I've run a 250-node horrible mess of hardware as a
> >shared-everything cluster with oneSIS off a single NFS server. I recently
> >replaced a vendor's useless imaging scheme with it for the second time.
> >
> >>Do you know a simple tool that could do the job? My research led me
> >>to "Puppet Master" but I would like to get advice from experts...
> >I'm not convinced that's appropriate for an HPC cluster, but people with
> >more HPC experience disagree.
> >
> >You need tools apart from image management, of course.
> >
>
> Dave's right, this is a religious topic. I see Dave's point of using
> something not coupled to the OS. I, however, have always used RHEL
> derivatives, so I've just used the combination of Kickstart with
> DHCP and PXE booting, and it has served me well. With kickstart,
> it's not too hard with some basic pre- and post-scripting to come up
> with some different configuration options if you need different
> images installed on different machines.
>
> For configuration management of a cluster, something like puppet can
> be overkill. In the past, I kept all my config files on a webserver
> only accessible to the cluster nodes, and then used a post-install
> script to wget all the config files needed. This was only about 10 -
> 20 files, so a simple for-loop kept it manageable. If I ever needed
> to update config files, I used a parallel front end to ssh (there
> are several good ones out there) to execute wget across all nodes
> and restart any services as necessary. I've seen others accomplish
> the same thing with rdist.
>
> Some of you might be rolling your eyes thinking this is a lot of work,
> but it really isn't, and my clusters have been pretty static once
> they're up and running, so it's not often I need to make any
> configuration changes.
>
> I've used puppet in many other situations, and I'm going to start
> using it on my clusters, too. This makes the kickstart post-install
> script one line - a single call to puppet.
>
> If you decide to use puppet this way on your cluster, you don't want
> to have the daemon running all the time. If you do, the daemon will
> check in every 30 minutes, and slow down your jobs. It's best to run
> puppet as a cron job with the --onetime and --no-daemonize flags
> only once a day or so, or use a parallel front end to ssh to run
> puppet with --onetime only as needed.
>
> That's how I do it.
>
> Prentice
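
The post-install loop that pulls config files with wget could look roughly like the sketch below. It is only an illustration: the web server URL and the file list are hypothetical (neither is given in the thread), and the script defaults to a dry run that prints the wget commands instead of running them.

```python
import subprocess

# Hypothetical values for illustration; the thread names no server or files.
CONFIG_BASE_URL = "http://config.cluster.example/configs"
CONFIG_FILES = ["/etc/ntp.conf", "/etc/resolv.conf"]


def wget_command(base_url, dest_path):
    """Build the wget call that saves base_url + dest_path to dest_path."""
    url = base_url.rstrip("/") + dest_path
    # -q: quiet output, -O: write the download to the given file
    return ["wget", "-q", "-O", dest_path, url]


def fetch_all(base_url, paths, dry_run=True):
    """Fetch every config file in turn; with dry_run, only print the commands."""
    commands = [wget_command(base_url, path) for path in paths]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True stops the loop on the first failed download
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    fetch_all(CONFIG_BASE_URL, CONFIG_FILES)
```

The same script, run on every node through a parallel ssh front end, would cover the later config updates described above; restarting services is left out of the sketch.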
Hi, Prentice.

I'm totally with you on this approach. I have defaulted to DIY configuration management with ssh keys, rsync, a directory of config files, a text file of hostnames, and a for loop. Initial install is done with a RHEL/CentOS/SL kickstart file via DHCP, PXE and NFS. Software is installed via an RPM repository or a self-contained directory in an /opt directory.

Putting a bunch of command lines in the kickstart %post section can be fragile with successive updates, though. I keep looking at chef/puppet/salt as a way to get down to as few lines as possible in the kickstart. Let us know how your puppet cluster install goes.

Cheers.

-- 
Gavin W. Burris
Senior IT Project Leader
Research Computing
Wharton Computing
University of Pennsylvania
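
For readers who want to try the DIY approach (ssh keys, rsync, a directory of config files, a text file of hostnames, and a for loop), here is a minimal sketch. The directory name, the sample hostnames, and the root login are assumptions made for the example and do not come from the thread. It prints the rsync commands by default rather than running them.

```python
import subprocess

# Hypothetical names for illustration; none of them appear in the thread.
CONFIG_DIR = "cluster-configs"   # staging tree that mirrors the node filesystem
SAMPLE_HOSTS = "node01\nnode02  # spare\n\n# retired\n"  # stands in for the hosts file


def parse_hosts(text):
    """Return hostnames from the file contents, skipping blanks and # comments."""
    hosts = []
    for line in text.splitlines():
        name = line.split("#", 1)[0].strip()
        if name:
            hosts.append(name)
    return hosts


def rsync_command(config_dir, host, dest="/"):
    """Build an rsync-over-ssh call copying the contents of config_dir to host:dest."""
    # Trailing slash on the source: copy the directory's contents, not the directory.
    source = config_dir.rstrip("/") + "/"
    # Assumes passwordless ssh keys for root are already in place.
    return ["rsync", "-a", "-e", "ssh", source, f"root@{host}:{dest}"]


def push_configs(config_dir, hosts, dry_run=True):
    """Loop over the hosts; with dry_run, only print what would be executed."""
    commands = [rsync_command(config_dir, host) for host in hosts]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    push_configs(CONFIG_DIR, parse_hosts(SAMPLE_HOSTS))
```

Note that rsync -a also carries over ownership and permissions, so the staging tree would need to hold the ones intended for the nodes.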
