Thank you for your thoughts, Jan! I agree with the importance of the topics you raised, and I'd like to comment on them in light of our project (and of the configuration management approach in general).

On 06/06/2018 08:26 PM, Jan Pokorný wrote:
> On 07/06/18 02:19 +0200, Jan Pokorný wrote:
> While I see why Ansible is compelling, I feel it's important to
> challenge this trend of trying to bend/rebrand a _machine-local
> configuration management tool_ as a _distributed system management tool_
> (Pacemaker is a distributed application/framework of sorts), which Ansible
> alone is _not_, as far as I know, hence the effort doesn't seem to be
> 100% sound (which really matters if reliability is the goal).

> Once more, this has nothing to do with the announced project, it's
> just the trending fuss on this topic that indicates to me that people
> independently, as they keenly reinvent the wheel (here: Ansible
> roles), become blind to the fallacy that everything must work nicely in
> multi-machine shared-state scenarios, just as they are used to with
> single-host bootstrapping, without any shortcomings.

I can't entirely agree on this. The solution we're suggesting is built specifically to address this concern. In the taxonomy you linked, it would probably be type 2B, and here's why.

> But there are, and precisely because the optimal tool for the
> task does not get selected!  Just imagine what would happen if a single
> machine got configured independently by multiple Ansible actors
> (there may be mechanisms -- relatively easy within the same host --
> that would prevent such interference, but assume for now they are not
> strong enough).  What will happen?  Likely some mess-ups will occur, as
> glorified idempotence is hard to achieve atomically.  Voila, the inflicted
> race conditions, one by one, get exercised, until there's enough
> bad luck that the rule of idempotence gets broken, just because of
> these processes emulating a schizophrenic (and at the same time
> multitasking) admin.  Ouch!

This situation actually changes the rules of the game while we're playing it. Configuration management is a technical solution; it was never meant to solve administrative (i.e. human-centered) problems. No amount of atomicity will safeguard us from another admin deciding to reboot the hypervisor that runs my host. Idempotence is a relative concept: it is relative to one person/entity. If I run the same playbook again, at any time and any number of times, the result will be the same.

However, if another actor is involved, unsurprisingly, their mileage will vary, and so will mine. What happens if two admins add the same host to two Kubernetes/Heat/Ansible environments? That's the same situation. And I'm not even trying to solve this type of situation.

> Now, reflect this onto the situation with possibly concurrent
> cluster configuration.  One cannot really expect the cluster
> stack to be bullet-proof against these sorts of mishandling.
> A single cluster administrator operating at a time?  Ideal!
> A few administrators, presumably with separate areas of
> configuration interest?  Pacemaker is quite ready.
> Cluster configuration randomly touched from a random node
> at a random time (the equivalent of said schizophrenic multitasking
> administrator with a single host)?  Chances are off over a
> sufficiently long period when this happens.

> The solution here is to break that randomness; configuration
> is modified either:
> 1. from a single node at a time in the cluster (plus preferably
>     batching all required changes into a single request)
> 2. with mutual time-critical exclusion of triggering the changes
>     across the nodes
> 3. with mutual locality-critical exclusion in the subject of the
>     changes initiated from particular nodes

> Putting 1. and 3. aside as not very interesting (1. means
> a degenerate case with a single point of failure, and 3. kills
> the universality), what we get is really a dependency on some
> kind of distributed lock and/or transactional system.
> Well, we have just discovered that what we need in order to automate
> our predestined configuration in the cluster reliably and without
> hurting universality (like "breaking the node symmetry") is
> said distributed system management ("orchestration") tool.
> Does Ansible have these capabilities?

Correct: all these capabilities are already there; let me explain.

Firstly, as you point out in #1, the CIB configuration step runs on a single node; Ansible's `run_once` makes sure of that. Additionally, all required changes *are* in fact batched into a single request: as I mentioned, the changes are made to an XML dump, which gets verified and then pushed to the cluster using the vendor-approved method (cibadmin --replace).
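For illustration, here is a rough sketch of that flow as Ansible tasks (the task names and the /tmp/cib.xml working path are invented for this example, not taken verbatim from our role):

    # Every task below runs on exactly one cluster node, thanks to run_once.
    - name: Dump the live CIB into a working copy
      shell: cibadmin --query > /tmp/cib.xml
      run_once: true

    # ... the role applies all of its changes to /tmp/cib.xml here ...

    - name: Verify the edited CIB before touching the cluster
      command: crm_verify --xml-file /tmp/cib.xml
      run_once: true
      changed_when: false

    - name: Push the whole configuration in a single request
      command: cibadmin --replace --xml-file /tmp/cib.xml
      run_once: true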

Secondly, as you suggest in #2, the CIB schema has this feature built in: its `admin_epoch` property. The cluster will reject XML older than the version it is currently running, and our role makes sure the epoch gets incremented whenever changes are made. Therefore, if other (valid) changes have been made in the meantime, the playbook will fail until you rerun it without conflicts, pretty much like Git requiring you to rebase/merge before you push.
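Concretely, the bump happens on that same working copy before the push. A minimal sketch, assuming xmllint and Ansible's xml module are available (again, the file path and task layout are illustrative, not a verbatim excerpt from the role):

    - name: Read the admin_epoch of the working copy
      command: xmllint --xpath 'string(/cib/@admin_epoch)' /tmp/cib.xml
      register: cib_epoch
      run_once: true
      changed_when: false

    - name: Bump admin_epoch so the cluster accepts the replacement as newer
      xml:
        path: /tmp/cib.xml
        xpath: /cib
        attribute: admin_epoch
        value: "{{ (cib_epoch.stdout | int) + 1 }}"
      run_once: true

If another actor has bumped the live admin_epoch in the meantime, our computed value is no longer newer than what the cluster is running, so the subsequent replace is rejected and the play fails, which is exactly the "rebase before you push" behaviour described above.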

> Now, one idea there might be to make tools like pcs compensate
> for these shortcomings of the machine-local configuration management ones.
> Sounds good, right?  Absolutely not, more like a bad joke!
> Because what else can it be, the development of orchestration-like
> features (with all the complexities solved once in corosync/DLM
> already; relaxing the non-dependency on the very subject of management
> may not be wise) on top of a regular high-level cluster management tool,
> only[*] to bridge the gap in something that is simply a subpar fit
> for distributed environments to begin with?

In my understanding, pcs was designed to make manual configuration more user-friendly, not to be an orchestration tool.

Anyhow, I do appreciate your opinion and agree with the overall take on the orchestration/configuration problem. Thank you for the insights.
--
Best regards,
Styopa Semenukha,
Senior IT Analyst at Development Gateway.