Hi Michael,
See inline.
On 9/22/2022 2:10 PM, Michael Richardson wrote:
Benoit Claise<[email protected]> wrote:
> Thanks for your review.
> And sorry for the delay: I was not too sure how to react to this
> review. Another review after WGLC, to be integrated in IETF LC?
> Document
meh, sorry.
> On 9/13/2022 12:45 AM, Michael Richardson wrote:
>> I have read draft-ietf-opsawg-service-assurance-architecture at the request
>> of a few people. This is not part of any directorate review (that I
>> remember, or that shows up in my review list). If it's useful for me to plug
>> this in somewhere, let me know.
>>
>> I find the document well written, and to me rather ambitious.
>> That might be because my level of understanding of modern network management
>> is poor.
>>
>> I found section 3.1.1. Circular Dependencies to be interesting, and I think
>> telling. As soon as I saw "DAG" in the previous section, I was all, "yeah, but..."
>> I'm not convinced that the process described in 3.1.1 is something that a
>> computer program can do, versus that it (the service and the components that
>> build the service) has to be designed to be cycle-free from the beginning.
>> It seems to me that this document either has to constrain what services can
>> be built by deciding upon a canonical way to describe many things, or that
>> different vendors will create interoperable models only by chance.
> Typically, it's only when assurance graphs are combined that we might have
> circular dependencies. So in practice, we don't believe we are going to see
> many instances of those.
okay, that's reasonable. It seems like a lot of text to deal with a problem
that won't occur very often.
I don't disagree but that specific point was provided as feedback.
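For what it's worth, here is a minimal sketch (not from the draft, and with made-up
subservice names) of the kind of cycle check a SAIN orchestrator could run when
merging assurance graphs, before trying to break any loop as described in 3.1.1:

    # Hypothetical sketch: detect a cycle introduced when assurance graphs are
    # merged. "graph" maps a subservice identifier to the identifiers it
    # depends on.
    def find_cycle(graph):
        """Return a list of subservices forming a cycle, or None."""
        WHITE, GREY, BLACK = 0, 1, 2
        color = {node: WHITE for node in graph}
        stack = []

        def visit(node):
            color[node] = GREY
            stack.append(node)
            for dep in graph.get(node, ()):
                if color.get(dep, WHITE) == GREY:   # back edge: cycle found
                    return stack[stack.index(dep):] + [dep]
                if color.get(dep, WHITE) == WHITE:
                    found = visit(dep)
                    if found:
                        return found
            stack.pop()
            color[node] = BLACK
            return None

        for node in list(graph):
            if color[node] == WHITE:
                found = visit(node)
                if found:
                    return found
        return None

    # Two graphs that are individually acyclic can become cyclic once merged.
    merged = {
        "service-l3vpn": ["subservice-tunnel"],
        "subservice-tunnel": ["subservice-interface"],
        "subservice-interface": ["subservice-tunnel"],  # edge introduced by the merge
    }
    print(find_cycle(merged))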
>> overlooked later on. The broken thing never gets repaired, and then
>> some other fault or maintenance causes an actual failure.
> Actually, it depends on the intent.
> If the intent is to have a backup link all the time, then yes, the service
> continues to operate with a lower score.
got it.
>> b) components are marked for maintenance, which have service impacting
>> effects, but during which, other components fail. To make an analogy,
>> you don't care so much if your car steering system does not operate
>> while the starter motor is not operational. But, as soon as you fix the
>> starter motor (taking hours to days), you find that you still can not
>> go. You could have fixed both systems in parallel/concurrently, if only
>> you'd known.
> There are two cases here.
> 1. you knew (from the assurance graph) that the car steering system did not
> operate when going for maintenance for the starter motor.
> In such a case, you could be solving both in parallel during maintenance.
> 2. you don't know, and you will learn about the broken down car steering
> system when back from the starter motor maintenance
> ... at the time of recomputing the assurance graph and looking at the
> health of each subservice.
Yes... so I guess I wonder how to always be in case 1.
>> (c) is in many ways that the DAG *itself* might need to be updated.
>> How do you transition from one dependency DAG to another dependency DAG?
>> I guess that section 3.9 gets into this, but it seems rather weak.
> Proposal:
> 1. we need to add the concept that a service depending on the under-maintenance
> subservices will receive the "under maintenance" symptom and has to take it into
> account in its health computation. How? We don't want to go into the specifics of
> health aggregation in this specification.
okay. Where would that occur?
In the SAIN collector (see figure 1), whose scope is not covered by this
spec.
Or is it really vendor dependent?
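That part is indeed left to the collector implementation. Purely as an illustration
(made-up names and aggregation rule, nothing normative), one way a collector could
fold the "under maintenance" symptom into its health computation:

    # Illustrative only: the draft deliberately does not define health
    # aggregation, so this rule is invented for the example.
    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class SubserviceHealth:
        score: int                           # 0 (broken) .. 100 (healthy)
        symptoms: List[str] = field(default_factory=list)

    def aggregate(dependencies: List[SubserviceHealth]) -> SubserviceHealth:
        """Toy rule: take the worst score among dependencies that are NOT under
        maintenance, but keep propagating the maintenance symptom upwards so the
        operator sees "under maintenance" rather than "broken"."""
        considered = [d for d in dependencies if "under maintenance" not in d.symptoms]
        score = min((d.score for d in considered), default=100)
        symptoms = sorted({s for d in dependencies for s in d.symptoms})
        return SubserviceHealth(score=score, symptoms=symptoms)

    # A tunnel under maintenance does not drag the service score to 0, but the
    # symptom is still surfaced.
    service = aggregate([
        SubserviceHealth(score=0, symptoms=["under maintenance"]),
        SubserviceHealth(score=90),
    ])
    print(service)   # SubserviceHealth(score=90, symptoms=['under maintenance'])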
> 2. add some text that the DAG might have to be recomputed after a subservice
> comes out of maintenance.
Doesn't that go without saying?
>> 3.8. Timing
>> Starts talking about NTP, and synchronization.
>> Then goes into garbage collection, and I think that maybe this transition in
>> the text could be better presented.
> You are right.
> We propose to move the following text (which is not substantial enough to
> deserve its own section) just before 3.1:
> The SAIN architecture requires time synchronization, with Network
> Time Protocol (NTP) [RFC5905] <https://datatracker.ietf.org/doc/html/rfc5905>
> as a candidate, between all elements:
> monitored entities, SAIN agents, Service orchestrator, the SAIN
> collector, as well as the SAIN orchestrator. This guarantees the
> correlation of all symptoms in the system with the right
> assurance graph version.
good.
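To make the correlation requirement concrete, here is a hypothetical (non-normative,
field names invented for the example) symptom record a SAIN agent could emit, so that
the collector can match it against the assurance graph version in effect at that time:

    # Why time synchronization matters: a symptom is only meaningful relative
    # to the assurance graph version that was active when it was observed.
    import json
    from datetime import datetime, timezone

    symptom = {
        "subservice-id": "interface/PE1/GigabitEthernet0/0/1",
        "symptom": "packet-loss-above-threshold",
        "observed-at": datetime.now(timezone.utc).isoformat(),  # NTP-synchronized clock
        "assurance-graph-version": 42,                          # version in effect at that time
    }

    # Agents and the collector can correlate such records only because all
    # clocks are synchronized, e.g. via NTP (RFC 5905).
    print(json.dumps(symptom, indent=2))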
> And rename section 3.8 "Timing" to "Garbage Collection"
>>
>>
>> I feel that this SAIN architecture is quite ambitious, and I'm not sure that
>> there is enough here to actually create interoperable implementations.
> My group created a prototype. I know of another one.
> And there is an open-source implementation (presented by Prof. Benoit Donnet in
> the past).
> The interop part will be with linking YANG modules, which we addressed with
> the circular dependencies.
Cool.... I suggest an implementation experience section for the IESG review.
If you speak about RFC 7942, it mentions:
We recommend that the Implementation Status section should be removed
from Internet-Drafts before they are published as RFCs.
So isn't it sufficient to have this information in the write-up?
You can write down: "Huawei has a prototype implementation of this
architecture and specifically of the YANG module"
Regards, Benoit
But, are these implementations involving multi-vendor systems under management?
--
Michael Richardson<[email protected]>, Sandelman Software Works
-= IPv6 IoT consulting =-