Benoit Claise <[email protected]> wrote:
    > Thanks for your review.
    > And sorry for the delay: I was not too sure how to react to this
    > review. Another review after WGLC, to be integrated in IETF LC?
    > Document

meh, sorry.

    > On 9/13/2022 12:45 AM, Michael Richardson wrote:
    >> I have read draft-ietf-opsawg-service-assurance-architecture at the
    >> request of a few people.  This is not part of any directorate review
    >> (that I remember, or that shows up in my review list).  If it's useful
    >> for me to plug this in somewhere, let me know.
    >>
    >> I find the document well written, and to me rather ambitious.
    >> That might be because my level of understanding of modern network
    >> management is poor.
    >>
    >> I found section 3.1.1. Circular Dependencies to be interesting, and I
    >> think telling.  As soon as I saw "DAG" in the previous section, I was
    >> all, "yeah, but..."
    >> I'm not convinced that the process described in 3.1.1 is something that
    >> a computer program can do, versus that it (the service and the
    >> components that build the service) has to be designed to be cycle-free
    >> from the beginning.
    >> It seems to me that this document either has to constrain what services
    >> can be built by deciding upon a canonical way to describe many things,
    >> or that different vendors will create interoperable models only by
    >> chance.

    > Typically, it's only when assurance graphs are combined that we might
    > have circular dependencies. So in practice, we don't believe we are
    > going to see many instances of those.

okay, that's reasonable.  It seems like a lot of text to deal with a problem
that won't occur very often.
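
(For what it's worth, checking whether gluing two assurance graphs together
has introduced a cycle is at least mechanical.  A minimal sketch in Python,
purely my own illustration and not anything from the draft; the subservice
names and the dict-of-edge-lists representation are hypothetical:)

    # Detect whether merging two dependency graphs produces a cycle.
    # Graphs are dicts mapping a subservice to the subservices it depends on.
    def merge(g1, g2):
        merged = {}
        for g in (g1, g2):
            for node, deps in g.items():
                merged.setdefault(node, set()).update(deps)
        return merged

    def find_cycle(graph):
        """Return one dependency cycle as a list of nodes, or None."""
        WHITE, GREY, BLACK = 0, 1, 2
        colour = {}
        stack = []

        def visit(node):
            colour[node] = GREY
            stack.append(node)
            for dep in graph.get(node, ()):
                if colour.get(dep, WHITE) == GREY:
                    return stack[stack.index(dep):] + [dep]
                if colour.get(dep, WHITE) == WHITE:
                    cycle = visit(dep)
                    if cycle:
                        return cycle
            stack.pop()
            colour[node] = BLACK
            return None

        for node in list(graph):
            if colour.get(node, WHITE) == WHITE:
                cycle = visit(node)
                if cycle:
                    return cycle
        return None

    # Two per-service graphs that are individually acyclic but loop once combined.
    g1 = {"svc-A": ["tunnel-1"], "tunnel-1": ["intf-1"]}
    g2 = {"intf-1": ["svc-A"]}
    print(find_cycle(merge(g1, g2)))  # ['svc-A', 'tunnel-1', 'intf-1', 'svc-A']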

    >> overlooked later on.  The broken thing never gets repaired, and then
    >> some other fault or maintenance causes an actual failure.

    > Actually, it depends on the intent.
    > If the intent is to have a backup link all the time, then yes, the
    > service continues to operate with a lower score.

got it.

    >> b) components are marked for maintenance, which have service impacting
    >> effects, but during which, other components fail.  To make analogy,
    >> you don't care so much if your car steering system does not operate
    >> while the starter motor is not operational.  But, as soon as you fix the
    >> starter motor (taking hours to days), you find that you still cannot
    >> go.   You could have fixed both systems in parallel/concurrently, if only
    >> you'd known.

    > There are two cases here.
    > 1. you knew (from the assurance graph) that the car steering system did
    > not operate when going for maintenance for the starter motor.
    >     In such a case, you could be solving both in parallel during
    > maintenance.

    > 2. you don't know, and you will learn about the broken down car steering
    > system when back from the starter motor maintenance
    >     ... at the time of recomputing the assurance graph and looking at the
    > health of each subservice

Yes... so I guess I wonder how to always be in case 1.
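
(To make my question concrete: what I had in mind is a pre-maintenance check
against the current assurance graph, something like the sketch below.  The
field names and the 0..100 score scale are my own illustration, not from the
draft.)

    # Before taking a subservice down for maintenance, read the current
    # health of every other subservice so that anything already broken can
    # be repaired in the same window ("case 1" above).
    def pre_maintenance_report(health_scores, going_down):
        """Return the other subservices whose health is already degraded."""
        return {sub: score for sub, score in health_scores.items()
                if sub != going_down and score < 100}

    # Hypothetical scores pulled from the SAIN collector.
    health_scores = {"starter-motor": 0, "steering": 40, "brakes": 100}
    print(pre_maintenance_report(health_scores, "starter-motor"))
    # {'steering': 40} -> worth fixing while the car is in the shop anyway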

    >> (c) is in many ways that the DAG *itself* might need to be updated.
    >> How do you transition from one dependency DAG to another dependency DAG?
    >> I guess that section 3.9 gets into this, but it seems rather weak.

    > Proposal:
    > 1. we need to add the concept that a service depending on
    > under-maintenance subservices will receive the "under maintenance"
    > symptom and has to take it into account in its health computation. How?
    > We don't want to go into the specifics of health aggregation in this
    > specification.

okay.  Where would that occur?  Or is it really vendor dependent?
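
(For instance, I could picture the aggregation treating the symptom roughly
like this, though I take the point that the exact rule stays
implementation-specific; the minimum-of-dependencies rule and the field names
below are purely illustrative, not something the draft specifies.)

    # Illustrative aggregation only: a service's score is the minimum of its
    # dependencies' scores, but a dependency carrying the "under maintenance"
    # symptom is excluded from the score and the symptom is propagated instead.
    def aggregate(dep_reports):
        """dep_reports: list of dicts {"score": int, "symptoms": [str, ...]}."""
        scores, symptoms = [], []
        for rep in dep_reports:
            if "under maintenance" in rep["symptoms"]:
                symptoms.append("under maintenance")  # propagate, don't penalise
            else:
                scores.append(rep["score"])
                symptoms.extend(rep["symptoms"])
        return {"score": min(scores) if scores else 100, "symptoms": symptoms}

    deps = [
        {"score": 100, "symptoms": []},
        {"score": 0, "symptoms": ["under maintenance"]},
        {"score": 70, "symptoms": ["high interface error rate"]},
    ]
    print(aggregate(deps))
    # {'score': 70, 'symptoms': ['under maintenance', 'high interface error rate']}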

    > 2. add some text that the DAG might have to be recomputed after a
    > subservice comes out of maintenance.

Doesn't that go without saying?

    >> 3.8. Timing
    >> Starts talking about NTP, and synchronization.
    >> Then goes into garbage collection, and I think that maybe this
    >> transition in the text could be better presented.

    > You are right.
    > We propose to move the following text (which is not substantial enough
    > to deserve its own section) just before 3.1:

    > The SAIN architecture requires time synchronization, with Network
    > Time Protocol (NTP) [RFC5905] <https://datatracker.ietf.org/doc/html/rfc5905>
    > as a candidate, between all elements: monitored entities, SAIN agents,
    > Service orchestrator, the SAIN collector, as well as the SAIN
    > orchestrator.  This guarantees that all symptoms in the system are
    > correlated with the right assurance graph version.

good.
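
(As a reader I pictured the correlation as a simple timestamp lookup, along
the lines of the sketch below, with hypothetical record layouts; the point
being that without synchronized clocks the symptom timestamp and the
graph-version activation times are not comparable and the lookup picks the
wrong version.)

    from bisect import bisect_right

    # (version, valid_from) pairs sorted by activation time, seconds since epoch.
    graph_versions = [("v1", 1000), ("v2", 2000), ("v3", 3000)]

    def graph_version_for(raised_at):
        """Return the assurance graph version in effect when a symptom was raised."""
        starts = [t for _, t in graph_versions]
        idx = bisect_right(starts, raised_at) - 1
        return graph_versions[idx][0] if idx >= 0 else None

    symptom = {"subservice": "intf-1", "label": "link down", "raised_at": 2500}
    print(graph_version_for(symptom["raised_at"]))  # v2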

    > And rename section 3.8 "Timing" to "Garbage Collection"
    >>
    >>
    >> I feel that this SAIN architecture is quite ambitious, and I'm not sure
    >> that there is enough here to actually create interoperable
    >> implementations.

    > My group created a prototype. I know of another one.
    > And there is an open-source implementation (presented by Prof Benoit
    > Donnet in the past).
    > The interop part will be in linking the YANG modules, which we addressed
    > with the circular dependencies.

Cool.... I suggest an implementation experience section for the IESG review.
But, are these implementations involving multi-vendor systems under management?

--
Michael Richardson <[email protected]>, Sandelman Software Works
 -= IPv6 IoT consulting =-



