I think stateless makes a little less sense over time.
1) Local boot storage is cheaper and more durable than it used to be, and
that trend is only going to continue.
2) Dynamism is probably better and more easily served by something like
Singularity, which lets users do their own thing without the
administrators having to accommodate them.
3) Mitigating drift can be done in other ways. Stateless has
traditionally had the side effect of limiting the ‘drift’ that accumulates as
people make ad-hoc changes to OS images, because it punishes those practices.
Strictly speaking, the same discipline can be self-imposed without downside; it
just takes some willpower.
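
One way to make that self-imposed discipline checkable is to compare each node's installed packages against a reference node. The following is a minimal sketch, not an xCAT feature: the function names are hypothetical, and it assumes each package list was collected as "name version" lines (for example with rpm -qa and a matching --qf format).

```python
# Hypothetical helpers for spotting package drift between a reference node
# and another node. Input is assumed to be text with one "name version"
# pair per line, e.g. collected with:
#   rpm -qa --qf '%{NAME} %{VERSION}-%{RELEASE}\n'


def parse_rpm_list(text):
    """Map package name -> set of installed versions.

    A set is used because some packages (such as kernel) can have
    several versions installed side by side.
    """
    packages = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            name, version = parts
            packages.setdefault(name, set()).add(version)
    return packages


def package_drift(reference, node):
    """Report how a node's packages differ from the reference node's."""
    added = {n: v for n, v in node.items() if n not in reference}
    removed = {n: v for n, v in reference.items() if n not in node}
    changed = {
        n: (reference[n], v)
        for n, v in node.items()
        if n in reference and reference[n] != v
    }
    return {"added": added, "removed": removed, "changed": changed}
```

An empty result in all three categories means the node still matches the reference; anything else is drift to either revert or fold back into the image definition.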
From: Damir Krstic [mailto:damir.krs...@gmail.com]
Sent: Friday, January 13, 2017 9:20 AM
To: xCAT Users Mailing list
Subject: [xcat-user] statefull vs. stateless images
We have been running our cluster using stateless images for over 6 years now.
For the most part, things are running great. There are two reasons for our
decision to run stateless:
1. our compute nodes originally did not have local hard drives
2. we envisioned a dynamic environment in which we would boot nodes frequently
with different images to satisfy different research needs
Today both of those points are invalid / do not apply. All of our compute nodes
come with hard drives, and we have never really booted the cluster with any
images other than our "production" image. In addition, downtimes are really
hard to come by in our environment, and we treat our cluster as a production
system.
So, my question is: does it make sense to continue with stateless images, or
would we be better served by stateful (installed on local disk) images?
I question our current method because:
1. stateless images are not trivial to build and update using genimage, adding
Mellanox drivers, GPFS, etc. We don't do it often enough, so every time we have
to do it, we are reinventing the wheel.
2. stateless images take up a portion of compute node memory
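
Writing the rebuild down as a script is one way to avoid reinventing the wheel at each downtime. The sketch below only assembles and prints the usual xCAT command sequence (genimage, packimage, nodeset, rpower) without running anything. The osimage name and noderange are hypothetical placeholders, and site-specific steps such as the Mellanox driver and GPFS installation are not shown.

```python
import shlex


def stateless_rebuild_commands(osimage, noderange):
    """Return the command lines for a typical stateless image refresh.

    Nothing is executed here; the caller decides whether to print the
    commands for review or hand them to a runner.
    """
    return [
        ["genimage", osimage],                        # build the image root
        ["packimage", osimage],                       # pack it for network boot
        ["nodeset", noderange, f"osimage={osimage}"],  # assign image to nodes
        ["rpower", noderange, "boot"],                # reboot nodes into it
    ]


if __name__ == "__main__":
    # Hypothetical osimage name and noderange, for illustration only.
    commands = stateless_rebuild_commands(
        "rhels7-x86_64-netboot-compute", "compute"
    )
    for argv in commands:
        print(" ".join(shlex.quote(arg) for arg in argv))
```

Keeping the sequence as data makes it easy to review before a downtime and to extend with the site's own driver and filesystem steps.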
Are there any downsides to running a 700+ node cluster using stateful images?
Like I said, we don't boot the cluster at all for many months at a time (we
get a single downtime during the year), and most of the packages outside of the
normal RH installation are installed using postscripts.
Let me know your thoughts.
Thanks,
Damir