On 03/10/13 16:09, Gavin W. Burris wrote:
On Thu, Oct 03, 2013 at 10:13:34AM -0400, Prentice Bisbal wrote:
On 10/02/2013 07:00 PM, Dave Love wrote:
Lionel SPINELLI <[email protected]> writes:

Hello all,

I have a question that is not directly about SGE but is related to it.
Which tool do administrators use when they have to install, manage,
configure and keep a lot of grid nodes consistent? I mean,
if I have 10 nodes in my grid and need to be sure that all of them
have the right software/configuration, I don't want to manually
configure each machine.

It seems to be a religious topic...  My requirements for managing node
images are:

1. free software
2. stateless image (NFS root + local /tmp; modify the shared root
    more-or-less directly)
3. support for heterogeneous systems with different images for multiple
    OSes and customizing for different node groups with a single image
4. decoupled from the OS (not living somewhat in its own world, like
    Rocks) so you do normal package management

When I had to pick one swiftly, the only one it was clear would do
3. properly was oneSIS <http://www.onesis.org>, though probably others
can.  I've run a 250-node horrible mess of hardware as a shared
everything cluster with oneSIS off a single NFS server.  I recently
replaced a vendor's useless imaging scheme with it for the second time.

Do you know a simple tool that could do the job? My research led me
to "Puppet Master", but I would like to get advice from experts...

I'm not convinced that's appropriate for an HPC cluster, but people with
more HPC experience disagree.

You need tools apart from image management, of course.


Dave's right, this is a religious topic. I see Dave's point of using
something not coupled to the OS. I, however, have always used RHEL
derivatives, so I've just used the combination of Kickstart with
DHCP and PXE booting, and it has served me well. With kickstart,
it's not too hard with some basic pre- and post-scripting to come up
with some different configuration options if you need different
images installed on different machines.

For configuration management of a cluster, something like puppet can
be overkill. In the past, I kept all my config files on a webserver
only accessible to the cluster nodes, and then used a post-install
script to wget all the config files needed. This was only about 10 to
20 files, so a simple for-loop kept it manageable. If I ever needed
to update config files, I used a parallel front-end to ssh (there
are several good ones out there) to execute wget across all nodes
and restart any services as necessary. I've seen others accomplish
the same thing with rdist.
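
As a rough illustration of the post-install fetch loop described above,
here is a minimal Python 3 sketch. It is not Prentice's actual script
(he used wget in a shell for-loop); the server URL and the file list are
made-up placeholders, and the helper names are hypothetical.

```python
import os
import urllib.request

# Hypothetical values: the config server URL and the file list are assumptions.
BASE_URL = "http://config.cluster.example/node"
CONFIG_FILES = ["/etc/ntp.conf", "/etc/resolv.conf", "/etc/fstab"]


def plan_downloads(base_url, paths):
    """Pair each absolute destination path with the URL it is fetched from."""
    return [(base_url.rstrip("/") + path, path) for path in paths]


def fetch_all(base_url, paths, root="/"):
    """Download every file, writing it under root (use a scratch dir to test)."""
    for url, dest in plan_downloads(base_url, paths):
        target = os.path.join(root, dest.lstrip("/"))
        os.makedirs(os.path.dirname(target), exist_ok=True)
        # Download to a temporary name, then rename, so a failed transfer
        # never leaves a half-written config file in place.
        tmp = target + ".new"
        urllib.request.urlretrieve(url, tmp)
        os.replace(tmp, target)


if __name__ == "__main__":
    # Dry run only: show what would be fetched. Call fetch_all() to do it.
    for url, dest in plan_downloads(BASE_URL, CONFIG_FILES):
        print(url, "->", dest)
```

The same function can be re-run later over a parallel ssh front-end to
refresh the files; restarting the affected services is left out here.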

Some of you might be rolling your eyes thinking this is a lot of work,
but it really isn't, and my clusters have been pretty static once
they're up and running, so it's not often I need to make any
configuration changes.

I've used puppet in many other situations, and I'm going to start
using it on my clusters, too. This makes the kickstart post-install
script one line: a single call to puppet.

If you decide to use puppet this way on your cluster, you don't want
to have the daemon running all the time. If you do, the daemon will
check in every 30 minutes, and slow down your jobs. It's best to run
puppet as a cron job with the --onetime and --no-daemonize flags
only once a day or so, or use a parallel front-end to ssh to run
puppet with --onetime only as needed.
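
A small Python sketch of what such a daily cron entry might look like.
The agent flags are spelled `--onetime` and `--no-daemonize` in Puppet;
the /usr/bin/puppet path, the default hour, and the idea of deriving the
minute from the hostname (so nodes don't all hit the master at once) are
my assumptions, not something from the thread.

```python
import zlib


def cron_line(hostname, hour=3):
    """Return an /etc/cron.d style entry that runs one Puppet pass per day."""
    # Derive a stable minute from the hostname so that nodes do not all
    # contact the Puppet master at the same moment (an added assumption).
    minute = zlib.crc32(hostname.encode()) % 60
    # /usr/bin/puppet is an assumed install path; adjust for your system.
    return (f"{minute} {hour} * * * root "
            "/usr/bin/puppet agent --onetime --no-daemonize")


if __name__ == "__main__":
    for node in ["node01", "node02", "node03"]:
        print(node, "=>", cron_line(node))
```

Each node would get its own line written to a file under /etc/cron.d,
for example from the kickstart %post section.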

That's how I do it.

Prentice

Hi, Prentice.

I'm totally with you on this approach.  I have defaulted to DIY
configuration management with ssh keys, rsync, a directory of config
files, a text file of hostnames, and a for loop.  Initial install is
done with a RHEL/CentOS/SL kickstart file via DHCP, PXE and NFS.
Software is installed via an RPM repository or a self-contained
directory in an /opt directory.
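
For anyone who wants to try the DIY route, a minimal Python sketch of
the "text file of hostnames plus a for loop" idea follows. It is only an
assumption of how such a loop might look (Gavin did not post his): the
configs/ directory name, the root login, and the function names are all
hypothetical, and it expects ssh keys to be in place already.

```python
import subprocess


def read_hosts(text):
    """Parse a hosts file: one hostname per line, '#' comments and blanks ignored."""
    hosts = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if line:
            hosts.append(line)
    return hosts


def rsync_command(host, src_dir="configs/", dest="/"):
    """Build the rsync invocation that copies src_dir onto one host over ssh."""
    # src_dir is assumed to mirror the target layout (configs/etc/ntp.conf ...).
    # Note that -a also copies ownership and permissions, directories included,
    # so the source tree must carry the modes you want on the nodes.
    return ["rsync", "-a", "-e", "ssh", src_dir, f"root@{host}:{dest}"]


def push_all(hosts, dry_run=True):
    """Run (or, by default, just print) the rsync command for every host."""
    failed = []
    for host in hosts:
        cmd = rsync_command(host)
        if dry_run:
            print(" ".join(cmd))
            continue
        if subprocess.run(cmd).returncode != 0:
            failed.append(host)
    return failed


if __name__ == "__main__":
    sample = "node01\nnode02  # rack 1\n\n# node03 is down\n"
    push_all(read_hosts(sample))
```

With dry_run left at its default this only prints the commands, which is
a cheap way to check the host list before pushing anything.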

Putting a bunch of command lines in the kickstart %post section can be
fragile with successive updates, though.  I keep looking at chef/puppet/salt
as a way to get down to as few lines as possible in the kickstart.  Let
us know how your puppet cluster install goes.

Cheers.


Oops. I didn't even consider PXE/kickstart 'OS dependent'. I would
consider a combination of PXE, kickstart (or whatever installation
scripting system you are using) and Puppet/Chef/CFEngine/... to satisfy
my 'OS independence' requirement, really.

I was more thinking of things like Rocks.

Tina

--
Tina Friedrich, Computer Systems Administrator, Diamond Light Source Ltd
Diamond House, Harwell Science and Innovation Campus - 01235 77 8442





_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
