I think stateless makes a little less sense over time.

1) Local boot storage is cheaper and more durable than it used to be, and 
that trend is only going to continue.

2) Dynamism is probably better and more easily served by something like 
Singularity, which lets users do their thing without the administrators 
having to accommodate them.

3) Mitigating drift can be done in other ways.  Stateless has traditionally 
had the side effect of limiting the ‘drift’ that accumulates as people make 
ad hoc changes to OS images, by punishing those practices.  Strictly speaking, 
the same discipline can be self-imposed without downside; it just takes some 
willpower.
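
As a rough illustration of what self-imposed drift checking on stateful nodes 
could look like, here is a minimal Python sketch. It assumes you have already 
collected each node's installed package list (for example the output of 
`rpm -qa`, gathered however you prefer) into a dictionary. The function name 
`find_drift`, the node names, and the package names are hypothetical sample 
values, not anything provided by xCAT.

```python
from typing import Dict, Set


def find_drift(
    baseline: Set[str], node_packages: Dict[str, Set[str]]
) -> Dict[str, Dict[str, Set[str]]]:
    """Return, for each node that differs from the baseline, the packages
    it has in addition to the baseline and the ones it is missing."""
    drift = {}
    for node, packages in node_packages.items():
        extra = packages - baseline      # installed ad hoc on this node
        missing = baseline - packages    # expected but not present
        if extra or missing:
            drift[node] = {"extra": extra, "missing": missing}
    return drift


if __name__ == "__main__":
    # Made-up sample data; in practice this would come from the nodes.
    baseline = {"pkg-a-1.0", "pkg-b-2.0"}
    nodes = {
        "node001": {"pkg-a-1.0", "pkg-b-2.0"},
        "node002": {"pkg-a-1.0", "pkg-b-2.0", "pkg-c-0.1"},
    }
    for node, diff in sorted(find_drift(baseline, nodes).items()):
        print(node, "extra:", sorted(diff["extra"]),
              "missing:", sorted(diff["missing"]))
```

Run periodically, a check like this flags ad hoc changes early instead of 
relying on a reboot into a fresh image to erase them.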

From: Damir Krstic [mailto:damir.krs...@gmail.com]
Sent: Friday, January 13, 2017 9:20 AM
To: xCAT Users Mailing list
Subject: [xcat-user] statefull vs. stateless images

We have been running our cluster using stateless images for over 6 years now. 
For the most part, things are running great. There are two reasons for our 
decision to run stateless:
1. our compute nodes originally did not have local hard drives
2. we envisioned a dynamic environment in which we would boot nodes frequently 
with different images to satisfy different research needs

Today both of those points are invalid / do not apply. All of our compute nodes 
come with hard drives, and we have never really booted the cluster with any 
image other than our "production" image. In addition, downtimes are really hard 
to come by in our environment, and we treat our cluster as a production system.

So, my question is: does it make sense to continue with stateless images, or 
would we be better served by stateful (installed on local disk) images?

I question our current method because:
1. stateless images are not trivial to build and update using genimage, 
including adding Mellanox drivers, GPFS, etc. We don't do it often enough, so 
every time we have to do it, we are reinventing the wheel.
2. stateless images take up a portion of compute node memory

Are there any downsides to running a 700+ node cluster using stateful images? 
Like I said, we don't boot the cluster at all for many months at a time (we 
get a single downtime during the year), and most of the packages outside of 
the normal RH installation are installed using postscripts.

Let me know your thoughts.

Thanks,
Damir