We have been running our cluster using stateless images for over 6 years
now. For the most part, things are running great. We originally chose to
run stateless for two reasons:
1. our compute nodes originally did not have local hard drives
2. we envisioned a dynamic environment in which we would boot nodes
frequently with different images to satisfy different research needs

Today neither of those points applies. All of our compute nodes come with
hard drives, and we have never really booted the cluster with any image
other than our "production" image. In addition, downtimes are really hard
to come by in our environment, and we treat our cluster as a production
system.

So, my question is: does it make sense to continue with stateless images,
or would we be better served by stateful images (installed on local disk)?

I question our current approach because:
1. stateless images are not trivial to build and update using genimage,
especially when adding Mellanox drivers, GPFS, etc. We don't do it often
enough, so every time we have to do it we are reinventing the wheel.
2. stateless images take up a portion of compute node memory

Are there any downsides to running a 700+ node cluster using stateful
images? Like I said, we don't reboot the cluster for many months at a
time (we get a single downtime during the year), and most of the packages
outside of the normal RH installation are installed using postscripts.

Let me know your thoughts.

xCAT-user mailing list