[OPSAWG] review of draft-ietf-opsawg-service-assurance-architecture-08

Michael Richardson Wed, 14 Sep 2022 08:03:15 -0700

I have read draft-ietf-opsawg-service-assurance-architecture at the request
of a few people.  This is not part of any directorate review (that I
remember, or that shows up in my review list).  If it's useful for me to plug
this in somewhere, let me know.


I find the document well written, and to me rather ambitious.
That might be because my level of understanding of modern network management
is poor.

I found section 3.1.1. Circular Dependencies to be interesting, and I think
telling.   As soon as I saw "DAG" in the previous section, I was all, "yeah, 
but..."
I'm not convinced that the process described in 3.1.1 is something that a
computer program can do, as opposed to the service (and the components that
build it) having to be designed to be cycle-free from the beginning.
It seems to me that this document either has to constrain what services can
be built by deciding upon a canonical way to describe many things, or that
different vendors will create interoperable models only by chance.
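
To make the distinction concrete: finding a cycle in a dependency graph is
mechanical, while deciding which dependency to drop or remodel is a design
question. A minimal Python sketch of the mechanical half follows. The
adjacency-dict encoding, the function name find_cycle and the subservice
names are hypothetical illustrations, not anything taken from the draft.

```python
from typing import Dict, List, Optional

def find_cycle(deps: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one dependency cycle as a node list, or None if deps is a DAG."""
    WHITE, GREY, BLACK = 0, 1, 2          # unvisited / on current path / done
    colour: Dict[str, int] = {}
    path: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        colour[node] = GREY
        path.append(node)
        for child in deps.get(node, []):
            state = colour.get(child, WHITE)
            if state == GREY:             # back edge: child is on the current path
                return path[path.index(child):] + [child]
            if state == WHITE:
                found = visit(child)
                if found:
                    return found
        path.pop()
        colour[node] = BLACK
        return None

    for node in deps:
        if colour.get(node, WHITE) == WHITE:
            found = visit(node)
            if found:
                return found
    return None

# Hypothetical subservice names, for illustration only.
acyclic = {"tunnel": ["ifA", "ifB"], "ifA": ["device1"], "ifB": ["device2"],
           "device1": [], "device2": []}
cyclic = dict(acyclic, device1=["tunnel"])

print(find_cycle(acyclic))
print(find_cycle(cyclic))
```

The sketch only reports that a cycle exists and where. It says nothing about
which edge is safe to remove, which is the part that seems to need either
human judgment or a canonical modelling convention.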

section 3.6. Handling Maintenance Windows
seems a bit light to me.
I think that there are three aspects which need to be emphasized:
  a) maintenance windows where components are marked in maintenance, but
     that the service itself should continue to operate (with a lower score),
     because some redundancy takes over.
     A key issue here is that this sometimes results in a "boy-who-cried-wolf"
     situation, where the lower score and lack of resiliency is then
     overlooked later on.  The broken thing never gets repaired, and then
     some other fault or maintenance causes an actual failure.

  b) components are marked for maintenance, which has service-impacting
     effects, but during which other components fail.  To make an analogy,
     you don't care so much if your car steering system does not operate
     while the starter motor is not operational.  But, as soon as you fix the
     starter motor (taking hours to days), you find that you still cannot
     go.   You could have fixed both systems in parallel/concurrently, if only
     you'd known.

  c) as in the example given about an update to a device OS.  This sometimes
     comes with unintended (or poorly documented) side effects which cause
     other failures, or knock-on updates.  For instance, you upgrade the
     OS and then TLS 1.1 is disabled in favour of TLS 1.2 and TLS 1.3, but
     other components are in critical use, have not yet been updated,
     and support only TLS 1.1.
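
Aspect (b) can be illustrated with a small Python sketch that keeps failures
outside the maintenance window visible instead of letting the already-down
service hide them. The Component record, the triage function and the
component names are hypothetical, made up for this illustration; the draft
does not define them.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Component:
    name: str
    healthy: bool
    in_maintenance: bool

def triage(components: List[Component]) -> Dict[str, List[str]]:
    """Split unhealthy components into expected (under maintenance) and unexpected."""
    report: Dict[str, List[str]] = {"expected": [], "unexpected": []}
    for c in components:
        if c.healthy:
            continue
        key = "expected" if c.in_maintenance else "unexpected"
        report[key].append(c.name)
    return report

# The car analogy: the starter motor is being repaired, the steering broke meanwhile.
car = [
    Component("starter_motor", healthy=False, in_maintenance=True),
    Component("steering", healthy=False, in_maintenance=False),
    Component("brakes", healthy=True, in_maintenance=False),
]
report = triage(car)
print(report)
```

Anything listed under "unexpected" is a fault that could be worked on in
parallel with the planned maintenance.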

(c) is in many ways a case where the DAG *itself* might need to be updated.
How do you transition from one dependency DAG to another dependency DAG?
I guess that section 3.9 gets into this, but it seems rather weak.
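
One naive way to look at such a transition is as a set difference over
dependency edges. The Python sketch below assumes the same kind of
hypothetical adjacency-dict encoding and uses invented node names; it is not
the mechanism of section 3.9, only a way to show what changes between two
graphs.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]  # (dependent, dependency)

def edges(dag: Dict[str, List[str]]) -> Set[Edge]:
    """Flatten an adjacency dict into a set of (parent, child) edges."""
    return {(parent, child) for parent, children in dag.items() for child in children}

def diff_dags(old: Dict[str, List[str]],
              new: Dict[str, List[str]]) -> Tuple[List[Edge], List[Edge]]:
    """Return (added, removed) edges when moving from old to new."""
    old_e, new_e = edges(old), edges(new)
    return sorted(new_e - old_e), sorted(old_e - new_e)

# Hypothetical graphs before and after an OS upgrade.
old = {"service": ["router"], "router": ["os-old"]}
new = {"service": ["router"], "router": ["os-new"], "os-new": ["tls-1.2-peer"]}

added, removed = diff_dags(old, new)
print("added:", added)
print("removed:", removed)
```

The diff shows which dependencies appear and disappear, but not how to keep
scores meaningful while both graphs are partially in effect, which is the
open question.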

section 3.8. Timing
starts by talking about NTP and synchronization, then goes into garbage
collection.  I think that the transition between the two in the text could
be better presented.


I feel that this SAIN architecture is quite ambitious, and I'm not sure that
there is enough here to actually create interoperable implementations.


-- 
Michael Richardson <[email protected]>, Sandelman Software Works
 -= IPv6 IoT consulting =-


