Re: [gridengine users] How to manage grid nodes

Prentice Bisbal Thu, 03 Oct 2013 07:21:17 -0700

On 10/02/2013 07:00 PM, Dave Love wrote:

Lionel SPINELLI <[email protected]> writes:

Hello all,

I have a question that is not directly about SGE but is related to
it. Which tools do administrators use to install, manage, configure,
and keep a lot of grid nodes consistent? I mean, if I have 10 nodes in
my grid and need to be sure that all of them have the right
software/configuration, I don't want to manually configure each
machine.

It seems to be a religious topic...  My requirements for managing node
images are:

1. free software
2. stateless image (NFS root + local /tmp; modify the shared root
    more-or-less directly)
3. support for heterogeneous systems with different images for multiple
    OSes and customizing for different node groups with a single image
4. decoupled from the OS (not living somewhat in its own world, like
    Rocks) so you do normal package management

When I had to pick one swiftly, the only one it was clear would do
3. properly was oneSIS <http://www.onesis.org>, though probably others
can.  I've run a 250-node horrible mess of hardware as a shared
everything cluster with oneSIS off a single NFS server.  I recently
replaced a vendor's useless imaging scheme with it for the second time.

Do you know a simple tool that could do the job? My research led me
to "Puppet Master" but I would like to get advice from experts...

I'm not convinced that's appropriate for an HPC cluster, but people with
more HPC experience disagree.

You need tools apart from image management, of course.

Dave's right, this is a religious topic. I see Dave's point about using
something not coupled to the OS. I, however, have always used RHEL
derivatives, so I've just used the combination of Kickstart with DHCP
and PXE booting, and it has served me well. With Kickstart, it's not too
hard with some basic pre- and post-scripting to come up with some
different configuration options if you need different images installed
on different machines.

For configuration management of a cluster, something like puppet can be
overkill. In the past, I kept all my config files on a webserver only
accessible to the cluster nodes, and then used a post-install script to
wget all the config files needed. This was only about 10 - 20 files, so
a simple for-loop kept it manageable. If I ever needed to update config
files, I used a parallel front-end to ssh (there are several good ones
out there) to execute wget across all nodes and restart any services as
necessary. I've seen others accomplish the same thing with rdist.
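
To make the wget for-loop concrete, here is a minimal sketch of the same
idea in Python. It is not the script described above: the server URL, the
file list, and the function names are all made up for illustration, and it
assumes the web server mirrors each file under its absolute path.

```python
import os
import urllib.request

# Hypothetical values: the post names neither the server nor the files.
CONFIG_BASE_URL = "http://config.cluster.example/node-configs"
CONFIG_FILES = ["/etc/ntp.conf", "/etc/resolv.conf", "/etc/ssh/sshd_config"]


def config_url(base_url, dest_path):
    """Map a destination path such as /etc/ntp.conf to its URL on the server."""
    return base_url.rstrip("/") + "/" + dest_path.lstrip("/")


def fetch_configs(base_url, dest_paths, root="/"):
    """Download each file and install it under root, like a wget for-loop."""
    installed = []
    for dest_path in dest_paths:
        target = os.path.join(root, dest_path.lstrip("/"))
        os.makedirs(os.path.dirname(target), exist_ok=True)
        tmp = target + ".new"
        url = config_url(base_url, dest_path)
        with urllib.request.urlopen(url) as resp, open(tmp, "wb") as out:
            out.write(resp.read())
        # Rename only after a complete download, so the live file is
        # never left half-written.
        os.replace(tmp, target)
        installed.append(target)
    return installed
```

A post-install script would call fetch_configs(CONFIG_BASE_URL,
CONFIG_FILES) once; for later updates the same call could be pushed to all
nodes through a parallel ssh front-end, followed by whatever service
restarts are needed.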

Some of you might be rolling your eyes thinking this is a lot of work,
but it really isn't, and my clusters have been pretty static once they're
up and running, so it's not often I need to make any configuration
changes.

I've used puppet in many other situations, and I'm going to start using
it on my clusters, too. This makes the kickstart post-install script
one line - a single call to puppet.

If you decide to use puppet this way on your cluster, you don't want to
have the daemon running all the time. If you do, the daemon will check
in every 30 minutes by default, and slow down your jobs. It's best to
run puppet as a cron job with the --onetime flag (plus --no-daemonize)
only once a day or so, or use a parallel front-end to ssh to run puppet
with --onetime only as needed.
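
As a rough sketch of the two options above, the Python below builds a
once-a-day crontab line and the per-node ssh command for an on-demand run.
It assumes the one-time flag corresponds to `puppet agent --onetime
--no-daemonize`. The helper names, the default hour, and the per-host
minute offset (so the nodes don't all contact the master at the same
moment) are my own illustrative choices, not something from the setup
described here.

```python
import shlex
import zlib

# Flags of the `puppet agent` command: run once, stay in the foreground.
PUPPET_ONESHOT = ["puppet", "agent", "--onetime", "--no-daemonize"]


def cron_line(hostname, hour=3):
    """Return a line for root's crontab that runs puppet once a day.

    The minute is derived from the hostname (an assumption added for
    illustration) so it is stable per node but differs between nodes.
    """
    minute = zlib.crc32(hostname.encode()) % 60
    return f"{minute} {hour} * * * {' '.join(PUPPET_ONESHOT)}"


def ssh_command(hostname):
    """Return the argv for an on-demand puppet run on one node over ssh."""
    remote = " ".join(shlex.quote(arg) for arg in PUPPET_ONESHOT)
    return ["ssh", hostname, remote]
```

Each list from ssh_command() can be handed to subprocess.run(), or the
same remote command string can be given to whichever parallel ssh
front-end you already use.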


That's how I do it.

Prentice

_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
