Perceus and Warewulf fit Dave's list very well:

1. Both are free.
2. Both use stateless images by default, but instead of an NFS root they
   use a RAM image. You can "hybridize" the image so that only part of it
   resides in RAM and the rest comes via NFS.
3. You can create different images for different OSes, define groups of
   nodes, and apply images to specific nodes or groups.
4. Package management is handled within the VNFS image using chroot (or
   you can use yum --installroot); see the sketch below.

Another nice feature: after applying updates or modifying config files, both systems let you push the changes out to a live node without needing to reboot, excepting of course kernel updates. Even running applications will be restarted.
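As a rough illustration of item 4, the image-update workflow might look like the following (a minimal sketch: the chroot path, package name, and the Warewulf 3 wwvnfs rebuild step are assumptions to check against your version's documentation):

    # On the master, install a package into the node image's chroot
    # without booting a node (illustrative path and package):
    yum --installroot=/var/chroots/compute -y install htop

    # Or work inside the image directly:
    chroot /var/chroots/compute yum -y update

    # Warewulf 3 (assumed): rebuild the VNFS from the chroot so the
    # provisioner serves the updated image:
    wwvnfs --chroot /var/chroots/compute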
Ian

On Thu, Oct 3, 2013 at 8:52 AM, Tina Friedrich <[email protected]> wrote:
> On 03/10/13 16:09, Gavin W. Burris wrote:
>> On Thu, Oct 03, 2013 at 10:13:34AM -0400, Prentice Bisbal wrote:
>>> On 10/02/2013 07:00 PM, Dave Love wrote:
>>>> Lionel SPINELLI <[email protected]> writes:
>>>>> Hello all,
>>>>>
>>>>> I have a question that is not directly linked to SGE but relates to
>>>>> the same. Which tool do administrators who have to install, manage,
>>>>> configure, and ensure coherence between lots of grid nodes use? I
>>>>> mean, if I have 10 nodes in my grid and need to be sure that all of
>>>>> them have the right software/configuration, I don't want to manually
>>>>> configure each machine.
>>>>
>>>> It seems to be a religious topic... My requirements for managing node
>>>> images are:
>>>>
>>>> 1. free software
>>>> 2. stateless image (NFS root + local /tmp; modify the shared root
>>>>    more-or-less directly)
>>>> 3. support for heterogeneous systems with different images for
>>>>    multiple OSes and customizing for different node groups with a
>>>>    single image
>>>> 4. decoupled from the OS (not living somewhat in its own world, like
>>>>    Rocks) so you do normal package management
>>>>
>>>> When I had to pick one swiftly, the only one it was clear would do
>>>> 3. properly was oneSIS <http://www.onesis.org>, though probably
>>>> others can. I've run a 250-node horrible mess of hardware as a
>>>> shared-everything cluster with oneSIS off a single NFS server. I
>>>> recently replaced a vendor's useless imaging scheme with it for the
>>>> second time.
>>>>
>>>>> Do you know a simple tool that could do the job? My research led me
>>>>> to "Puppet Master", but I would like to get advice from experts...
>>>>
>>>> I'm not convinced that's appropriate for an HPC cluster, but people
>>>> with more HPC experience disagree.
>>>>
>>>> You need tools apart from image management, of course.
>>>
>>> Dave's right, this is a religious topic. I see Dave's point about
>>> using something not coupled to the OS. I, however, have always used
>>> RHEL derivatives, so I've just used the combination of Kickstart with
>>> DHCP and PXE booting, and it has served me well. With kickstart, it's
>>> not too hard, with some basic pre- and post-scripting, to come up with
>>> different configuration options if you need different images installed
>>> on different machines.
>>>
>>> For configuration management of a cluster, something like puppet can
>>> be overkill. In the past, I kept all my config files on a webserver
>>> only accessible to the cluster nodes, and then used a post-install
>>> script to wget all the config files needed. This was only about 10-20
>>> files, so a simple for loop kept it manageable. If I ever needed to
>>> update config files, I used a parallel front-end to ssh (there are
>>> several good ones out there) to execute wget across all nodes and
>>> restart any services as necessary. I've seen others accomplish the
>>> same thing with rdist.
>>>
>>> Some of you might be rolling your eyes thinking this is a lot of work,
>>> but it really isn't, and my clusters have been pretty static once
>>> they're up and running, so it's not often I need to make any
>>> configuration changes.
>>>
>>> I've used puppet in many other situations, and I'm going to start
>>> using it on my clusters, too. That makes the kickstart post-install
>>> script one line: a single call to puppet.
>>>
>>> If you decide to use puppet this way on your cluster, you don't want
>>> to have the daemon running all the time. If you do, the daemon will
>>> check in every 30 minutes and slow down your jobs. It's best to run
>>> puppet as a cron job with the --onetime flag (do not daemonize) only
>>> once a day or so, or use a parallel front-end to ssh to run puppet
>>> with --onetime only as needed.
>>>
>>> That's how I do it.
>>>
>>> Prentice
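A minimal sketch of the cron approach Prentice describes (the schedule, cron file path, and pdsh host list are illustrative):

    # /etc/cron.d/puppet-onetime (illustrative): one catalog run per
    # day, with no resident daemon stealing cycles from jobs.
    15 3 * * * root /usr/bin/puppet agent --onetime --no-daemonize

    # Ad-hoc run across all nodes with a parallel ssh front-end
    # such as pdsh:
    pdsh -w node[01-10] 'puppet agent --onetime --no-daemonize'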
>> Hi, Prentice.
>>
>> I'm totally with you on this approach. I have defaulted to DIY
>> configuration management with ssh keys, rsync, a directory of config
>> files, a text file of hostnames, and a for loop. The initial install
>> is done with a RHEL/CentOS/SL kickstart file via DHCP, PXE, and NFS.
>> Software is installed via an RPM repository or a self-contained
>> directory under /opt.
>>
>> Putting a bunch of command lines in the kickstart %post section can
>> be fragile across successive updates, though. I keep looking at
>> chef/puppet/salt as a way to get down to as few lines as possible in
>> the kickstart. Let us know how your puppet cluster install goes.
>>
>> Cheers.
>
> Oops. I didn't even consider PXE/kickstart 'OS dependent'. I would
> consider a combination of PXE, kickstart (or whatever installation
> scripting system you are using) and Puppet/Chef/CFEngine/... to
> satisfy my 'OS independence' requirement, really.
>
> I was more thinking of things like Rocks.
>
> Tina
>
> --
> Tina Friedrich, Computer Systems Administrator, Diamond Light Source Ltd
> Diamond House, Harwell Science and Innovation Campus - 01235 77 8442

--
Ian Kaufman
Research Systems Administrator
UC San Diego, Jacobs School of Engineering
ikaufman AT ucsd DOT edu
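A minimal sketch of the DIY config push that Prentice and Gavin describe (the webserver URL, file list, hostnames file, and service names are illustrative):

    #!/bin/sh
    # push-configs.sh: fetch config files from an internal webserver
    # and restart affected services; run this on each node.
    BASE=http://mgmt.cluster.local/configs      # illustrative URL
    for f in hosts ntp.conf nsswitch.conf; do   # illustrative file list
        wget -q -O "/etc/$f" "$BASE/$f"
    done
    service ntpd restart                        # restart services as needed

    # From the head node, apply it everywhere with a plain for loop
    # (or a parallel ssh front-end):
    while read h; do ssh "$h" 'sh /root/push-configs.sh'; done < hostnames.txt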
