Hi Michael,
See inline.
On 9/22/2022 2:10 PM, Michael Richardson wrote:
Benoit Claise<[email protected]> wrote:
> Thanks for your review.
> And sorry for the delay: I was not too sure how to react to this
> review. Another review after WGLC, to be integrated in IETF LC?
> Document
meh, sorry.
> On 9/13/2022 12:45 AM, Michael Richardson wrote:
>> I have read draft-ietf-opsawg-service-assurance-architecture at the request
>> of a few people. This is not part of any directorate review (that I
>> remember, or that shows up in my review list). If it's useful for me to plug
>> this in somewhere, let me know.
>>
>> I find the document well written, and to me rather ambitious.
>> That might be because my level of understanding of modern network management
>> is poor.
>>
>> I found section 3.1.1. Circular Dependencies to be interesting, and I think
>> telling. As soon as I saw "DAG" in the previous section, I was all, "yeah, but..."
>> I'm not convinced that the process described in 3.1.1 is something that a
>> computer program can do, versus that it (the service and the components that
>> build the service) has to be designed to be cycle-free from the beginning.
>> It seems to me that this document either has to constrain what services can
>> be built by deciding upon a canonical way to describe many things, or that
>> different vendors will create interoperable models only by chance.
> Typically, it's only when assurance graphs are combined that we might have
> circular dependencies. So in practice, we don't believe we are going to see
> many instances of those.
okay, that's reasonable. It seems like a lot of text to deal with a problem
that won't occur very often.
I don't disagree but that specific point was provided as feedback.
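For what it's worth, here is a minimal sketch (not from the draft, and with made-up
subservice names) of the kind of cycle check a SAIN orchestrator could run when
merging assurance graphs, before trying to break any loop as described in 3.1.1:

    # Hypothetical sketch: detect a cycle introduced when assurance graphs are
    # merged. "graph" maps a subservice identifier to the identifiers it
    # depends on.
    def find_cycle(graph):
        """Return a list of subservices forming a cycle, or None."""
        WHITE, GREY, BLACK = 0, 1, 2
        color = {node: WHITE for node in graph}
        stack = []

        def visit(node):
            color[node] = GREY
            stack.append(node)
            for dep in graph.get(node, ()):
                if color.get(dep, WHITE) == GREY:   # back edge: cycle found
                    return stack[stack.index(dep):] + [dep]
                if color.get(dep, WHITE) == WHITE:
                    found = visit(dep)
                    if found:
                        return found
            stack.pop()
            color[node] = BLACK
            return None

        for node in list(graph):
            if color[node] == WHITE:
                found = visit(node)
                if found:
                    return found
        return None

    # Two graphs that are individually acyclic can become cyclic once merged.
    merged = {
        "service-l3vpn": ["subservice-tunnel"],
        "subservice-tunnel": ["subservice-interface"],
        "subservice-interface": ["subservice-tunnel"],  # edge introduced by the merge
    }
    print(find_cycle(merged))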
>> overlooked later on. The broken thing never gets repaired, and then
>> some other fault or maintenance causes an actual failure.
> Actually, it depends on the intent.
> If the intent is to have a backup link all the time, then yes, the service
> continues to operate with a lower score.
got it.
>> b) components are marked for maintenance, which have service impacting
>> effects, but during which, other components fail. To make an analogy,
>> you don't care so much if your car steering system does not operate
>> while the starter motor is not operational. But, as soon as you fix the
>> starter motor (taking hours to days), you find that you still can not
>> go. You could have fixed both systems in parallel/concurrently, if only
>> you'd known.
> There are two cases here.
> 1. you knew (from the assurance graph) that the car steering system did not
> operate when going for maintenance for the starter motor.
> In such a case, you could be solving both in parallel during maintenance.
> 2. you don't know, and you will learn about the broken down car steering
> system when back from the starter motor maintenance
> ... at the time of recomputing the assurance graph and looking at the
> health of each subservice.
Yes... so I guess I wonder how to always be in case 1.
>> (c) is in many ways that the DAG *itself* might need to be updated.
>> How do you transition from one dependency DAG to another dependency DAG?
>> I guess that section 3.9 gets into this, but it seems rather weak.
> Proposal:
> 1. we need to add the concept that a service depending on the under-maintenance
> subservices will receive the "under maintenance" symptom and has to take it into
> account in its health computation. How? We don't want to go into the specifics of
> health aggregation in this specification.
okay. Where would that occur?
In the SAIN collector (see figure 1), whose scope is not covered by this
spec.
Or is it really vendor dependent?
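That part is indeed left to the collector implementation. Purely as an illustration
(made-up names and aggregation rule, nothing normative), one way a collector could
fold the "under maintenance" symptom into its health computation:

    # Illustrative only: the draft deliberately does not define health
    # aggregation, so this rule is invented for the example.
    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class SubserviceHealth:
        score: int                           # 0 (broken) .. 100 (healthy)
        symptoms: List[str] = field(default_factory=list)

    def aggregate(dependencies: List[SubserviceHealth]) -> SubserviceHealth:
        """Toy rule: take the worst score among dependencies that are NOT under
        maintenance, but keep propagating the maintenance symptom upwards so the
        operator sees "under maintenance" rather than "broken"."""
        considered = [d for d in dependencies if "under maintenance" not in d.symptoms]
        score = min((d.score for d in considered), default=100)
        symptoms = sorted({s for d in dependencies for s in d.symptoms})
        return SubserviceHealth(score=score, symptoms=symptoms)

    # A tunnel under maintenance does not drag the service score to 0, but the
    # symptom is still surfaced.
    service = aggregate([
        SubserviceHealth(score=0, symptoms=["under maintenance"]),
        SubserviceHealth(score=90),
    ])
    print(service)   # SubserviceHealth(score=90, symptoms=['under maintenance'])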
> 2. add some text that the DAG might have to be recomputed after a subservice
> comes out of maintenance.
Doesn't that go without saying?
>> 3.8. Timing
>> Starts talking about NTP, and synchronization.
>> Then goes into garbage collection, and I think that maybe this transition in
>> the text could be better presented.
> You are right.
> We propose to move the following text (which is not substantial enough to
> deserve its own section) just before 3.1:
> The SAIN architecture requires time synchronization, with Network
> Time Protocol (NTP) [RFC5905] <https://datatracker.ietf.org/doc/html/rfc5905>
> as a candidate, between all elements:
> monitored entities, SAIN agents, Service orchestrator, the SAIN
> collector, as well as the SAIN orchestrator. This guarantees the
> correlation of all symptoms in the system with the right
> assurance graph version.
good.
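To make the correlation requirement concrete, here is a hypothetical (non-normative,
field names invented for the example) symptom record a SAIN agent could emit, so that
the collector can match it against the assurance graph version in effect at that time:

    # Why time synchronization matters: a symptom is only meaningful relative
    # to the assurance graph version that was active when it was observed.
    import json
    from datetime import datetime, timezone

    symptom = {
        "subservice-id": "interface/PE1/GigabitEthernet0/0/1",
        "symptom": "packet-loss-above-threshold",
        "observed-at": datetime.now(timezone.utc).isoformat(),  # NTP-synchronized clock
        "assurance-graph-version": 42,                          # version in effect at that time
    }

    # Agents and the collector can correlate such records only because all
    # clocks are synchronized, e.g. via NTP (RFC 5905).
    print(json.dumps(symptom, indent=2))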
> And rename section 3.8 "Timing" to "Garbage Collection"
>>
>>
>> I feel that this SAIN architecture is quite ambitious, and I'm not sure that
>> there is enough here to actually create interoperable implementations.
> My group created a prototype. I know of another one.
> And there is an open-source implementation (presented by Prof. Benoit Donnet in
> the past).
> The interop part will be with linking YANG modules, which we addressed with
> the circular dependencies.
Cool.... I suggest an implementation experience section for the IESG review.
If you speak about RFC 7942, it mentions:
We recommend that the Implementation Status section should be removed
from Internet-Drafts before they are published as RFCs.
So isn't it sufficient to have this information in the write-up?
You can write down: "Huawei has a prototype implementation of this
architecture and specifically of the YANG module"
Regards, Benoit
But, are these implementations involving multi-vendor systems under management?
--
Michael Richardson<[email protected]>, Sandelman Software Works
-= IPv6 IoT consulting =-