Benoit Claise <[email protected]> wrote:
> Thanks for your review.
> And sorry for the delay: I was not too sure how to react to this
> review. Another review after WGLC, to be integrated in IETF LC?
> Document meh, sorry.
> On 9/13/2022 12:45 AM, Michael Richardson wrote:
>> I have read draft-ietf-opsawg-service-assurance-architecture at the
>> request of a few people. This is not part of any directorate review
>> (that I remember, or that shows up in my review list). If it's useful
>> for me to plug this in somewhere, let me know.
>>
>> I find the document well written, and to me rather ambitious.
>> That might be because my level of understanding of modern network
>> management is poor.
>>
>> I found section 3.1.1. Circular Dependencies to be interesting, and I
>> think telling. As soon as I saw "DAG" in the previous section, I was
>> all, "yeah, but..."
>> I'm not convinced that the process described in 3.1.1 is something that
>> a computer program can do, versus that it (the service and the
>> components that build the service) has to be designed to be cycle-free
>> from the beginning.
>> It seems to me that this document either has to constrain what services
>> can be built by deciding upon a canonical way to describe many things,
>> or that different vendors will create interoperable models only by
>> chance.
> Typically, it's only when assurance graphs are combined that we might
> have circular dependencies. So in practice, we don't believe we are
> going to see many instances of those.
okay, that's reasonable. It seems like a lot of text to deal with a problem
that won't occur very often.
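To make the concern concrete: a combined-graph cycle check is cheap to implement. The sketch below is purely illustrative (the subservice names and the adjacency-dict encoding are my own assumptions, not from the draft) and shows two graphs that are each acyclic but form a loop once merged.

```python
# Hypothetical sketch: detecting a circular dependency introduced when two
# assurance graphs are combined. Subservice names and the graph encoding
# ("subservice -> set of dependencies") are illustrative, not from the draft.

def has_cycle(graph):
    """Return True if the dependency graph contains a cycle (DFS, 3 colors)."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {node: WHITE for node in graph}

    def visit(node):
        color[node] = GRAY
        for dep in graph.get(node, ()):
            if color.get(dep, WHITE) == GRAY:   # back edge: cycle found
                return True
            if color.get(dep, WHITE) == WHITE and visit(dep):
                return True
        color[node] = BLACK
        return False

    return any(visit(n) for n in graph if color[n] == WHITE)

def combine(g1, g2):
    """Merge two assurance graphs, unioning the dependency sets."""
    merged = {}
    for g in (g1, g2):
        for node, deps in g.items():
            merged.setdefault(node, set()).update(deps)
    return merged

# Each graph is acyclic on its own...
service_a = {"svc-a": {"tunnel"}, "tunnel": {"interface"}, "interface": set()}
service_b = {"interface": {"svc-a"}}  # ...but combining them closes a loop.

assert not has_cycle(service_a)
assert has_cycle(combine(service_a, service_b))
```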
>> overlooked later on. The broken thing never gets repaired, and then
>> some other fault or maintenance causes an actual failure.
> Actually, it depends on the intent.
> If the intent is to have a backup link all the time, then yes, the
> service continues to operate with a lower score.
got it.
>> b) components are marked for maintenance, which have service impacting
>> effects, but during which, other components fail. To make analogy,
>> you don't care so much if your car steering system does not operate
>> while the starter motor is not operational. But, as soon as you fix the
>> starter motor (taking hours to day), you find that you still can not
>> go. You could have fixed both systems in parallel/concurrently, if only
>> you'd known.
> There are two cases here.
> 1. you knew (from the assurance graph) that the car steering system did
> not operate when going for maintenance for the starter motor.
> In such a case, you could be solving both in parallel during
> maintenance.
> 2. you don't know, and you will learn about the broken-down car steering
> system when back from the starter motor maintenance
> ... at the time of recomputing the assurance graph and looking at the
> health of each subservice.
Yes... so I guess I wonder how to always be in case 1.
>> (c) is in many ways that the DAG *itself* might need to be updated.
>> How do you transition from one dependency DAG to another dependency DAG?
>> I guess that section 3.9 gets into this, but it seems rather weak.
> Proposal:
> 1. we need to add the concept that a service depending on
> under-maintenance subservices will receive the "under maintenance"
> symptom and has to take it into account in its health computation.
> How? We don't want to get into the specifics of health aggregation in
> this specification.
okay. Where would that occur? Or is it really vendor dependent?
> 2. add some text that the DAG might have to be recomputed after a
> subservice comes out of maintenance.
Doesn't that go without saying?
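For what it's worth, the symptom propagation in point 1 is easy to picture as a fixed-point walk over the dependency graph. This is only a sketch of the idea (the node names and the notion that every dependent inherits the symptom are my assumptions; the draft deliberately leaves the health computation vendor-specific):

```python
# Hypothetical sketch: a subservice marked "under maintenance" exports that
# symptom to every service that (transitively) depends on it. The node names
# and the inherit-the-symptom rule are assumptions for illustration; how the
# symptom affects the health score is left vendor-specific by the draft.

def propagate_maintenance(deps, under_maintenance):
    """Return the set of nodes carrying the 'under maintenance' symptom.

    deps maps each service/subservice to the subservices it depends on.
    """
    flagged = set(under_maintenance)
    changed = True
    while changed:                      # fixed point over the dependency graph
        changed = False
        for node, node_deps in deps.items():
            if node not in flagged and flagged & set(node_deps):
                flagged.add(node)
                changed = True
    return flagged

deps = {
    "vpn-service": ["tunnel", "peering"],
    "tunnel": ["interface"],
    "peering": [],
    "interface": [],
}
# "interface" goes into maintenance: "tunnel" and "vpn-service" inherit the
# symptom, while the unrelated "peering" subservice is unaffected.
flagged = propagate_maintenance(deps, {"interface"})
assert flagged == {"interface", "tunnel", "vpn-service"}
```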
>> 3.8. Timing
>> Starts talking about NTP, and synchronization.
>> Then goes into garbage collection, and I think that maybe this
>> transition in the text could be better presented.
> You are right.
> We propose to move the following text (which is not substantial enough
> to deserve its own section) just before 3.1:
> The SAIN architecture requires time synchronization, with Network
> Time Protocol (NTP) [RFC5905
> <https://datatracker.ietf.org/doc/html/rfc5905>] as a candidate,
> between all elements: monitored entities, SAIN agents, Service
> orchestrator, the SAIN collector, as well as the SAIN orchestrator.
> This guarantees that all symptoms in the system are correlated with
> the right assurance graph version.
good.
> And rename section 3.8 "Timing" to "Garbage Collection"
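As an aside, the reason synchronized clocks matter here can be sketched in a few lines: each symptom carries a timestamp, and the collector maps it to whichever assurance graph version was active at that instant. The version table, field names, and lookup rule below are my own illustrative assumptions, not text from the draft:

```python
# Hypothetical sketch: correlating a timestamped symptom with the assurance
# graph version active when it was observed. The version table and the
# lookup rule are illustrative assumptions, not from the draft.
import bisect

# (activation_time, graph_version), sorted by activation time.
graph_versions = [(0, "v1"), (100, "v2"), (250, "v3")]
times = [t for t, _ in graph_versions]

def graph_version_at(timestamp):
    """Return the graph version active when the symptom was observed."""
    i = bisect.bisect_right(times, timestamp) - 1
    return graph_versions[i][1]

assert graph_version_at(42) == "v1"
assert graph_version_at(100) == "v2"   # the new version takes effect at t=100
assert graph_version_at(300) == "v3"
```

If the clocks of the agents and the collector drift, a symptom near a version boundary gets attributed to the wrong graph, which is exactly what the NTP requirement prevents.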
>>
>>
>> I feel that this SAIN architecture is quite ambitious, and I'm not sure
>> that there is enough here to actually create interoperable
>> implementations.
> My group created a prototype. I know of another one.
> And there is an open-source implementation (presented by Prof. Benoit
> Donnet in the past).
> The interop part will be with linking YANG modules, which we addressed
> with the circular dependencies.
Cool... I suggest an implementation experience section for the IESG review.
But, are these implementations involving multi-vendor systems under management?
--
Michael Richardson <[email protected]>, Sandelman Software Works
-= IPv6 IoT consulting =-
_______________________________________________
OPSAWG mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/opsawg
