Thanks Alex,
We'll make sure to introduce the required text in the next draft versions.
Regards, Benoit
Hi Benoit,
thanks for the response. By and large we are on the same page, and I
support this work. As you know, I am clearly of the school that
believes in exception-driven management and in providing actionable
information, not raw data.
Anyway, as mentioned, there should perhaps be greater emphasis on the
value of maintaining a dependency graph in general, and on explaining
how it can complement and aid operational tasks from troubleshooting
to impact analysis. It would be good to add some bits on how and where
to instrument this effectively (not necessarily all pushed onto device
agents; there will also be a role for controllers etc. in this). I
remain sceptical regarding the specific use case of continuously
maintaining a synthetically derived health score, but I am looking
forward to this work progressing in further iterations of the drafts.
--- Alex
*From:* Benoit Claise <[email protected]>
*Sent:* Friday, July 31, 2020 3:42 AM
*To:* Alexander Clemm <[email protected]>;
[email protected]
*Cc:* [email protected]; [email protected]
*Subject:* Re: Comments on Service Assurance for Intent-Based
Networking Architecture (e.g.
draft-claise-opsawg-service-assurance-architecture)
Hi Alex,
Thanks for engaging.
Hi Benoit,
I have seen your presentations on Service Assurance for
Intent-Based Networking Architecture and read your drafts with
interest (draft-claise-opsawg-service-assurance-yang-05 and
draft-claise-opsawg-service-assurance-architecture-03).
Interesting stuff on which I do have a couple of comments.
The basis for the drafts is in essence a proposal for Model-Based
Reasoning (MBR), in which you capture dependencies between objects
and make inferences by traversing the corresponding graph. MBR based
on dependency graphs makes it possible to reason about the impact
and propagation of the status or health of one object on the status
or health of dependent objects “downstream” from it. Likewise,
traversing the same graph in the opposite direction (from the
“downstream” or dependent objects) makes it possible to identify
potential root causes for symptoms observed by those objects,
although this seems not to be your main focus.
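[The two traversal directions described here can be sketched with a hypothetical toy model; the graph structure, object names, and the traversal logic below are illustrative assumptions, not taken from the drafts.]

```python
from collections import defaultdict

# Toy assurance dependency graph: add_dependency(B, A) records that B
# depends on A, so A's degradation can impact B.
class DependencyGraph:
    def __init__(self):
        self.impacts = defaultdict(set)  # object -> objects it impacts
        self.depends = defaultdict(set)  # object -> objects it depends on

    def add_dependency(self, obj, dependency):
        self.depends[obj].add(dependency)
        self.impacts[dependency].add(obj)

    def impacted_by(self, obj):
        """Walk 'downstream': everything whose health may suffer if obj degrades."""
        seen, stack = set(), [obj]
        while stack:
            for nxt in self.impacts[stack.pop()]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    def candidate_root_causes(self, obj):
        """Walk 'upstream': everything obj depends on, directly or transitively."""
        seen, stack = set(), [obj]
        while stack:
            for nxt in self.depends[stack.pop()]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

g = DependencyGraph()
g.add_dependency("vpn-service", "tunnel")
g.add_dependency("tunnel", "interface-eth0")
g.add_dependency("tunnel", "interface-eth1")

print(sorted(g.impacted_by("interface-eth0")))         # ['tunnel', 'vpn-service']
print(sorted(g.candidate_root_causes("vpn-service")))  # ['interface-eth0', 'interface-eth1', 'tunnel']
```

The same graph serves both impact analysis (downstream) and root-cause localization (upstream); only the edge direction followed differs.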
While MBR as a concept makes sense and has a long tradition in
network management, there are also a number of considerable issues
with it, and I was wondering about your perspective and mitigation
strategies for these. For one, the effectiveness of such a model
depends on it being “complete”. In most cases, there are myriads of
interdependencies which are difficult to capture comprehensively.
The model is still useful for many applications as a starting
point, but it rarely captures the full reality. As long as users are
clear about that, this is not an issue.
Point taken about the myriads of interdependencies and graph completeness.
As you observe, even if the graph is not complete, it is still useful,
especially when we can assure the (networking) components within the
assurance graph.
That way, the graph will tell us where the problem is not, which is
just as important as telling us where the problem is or might be ...
assuming, obviously, that we have complete heuristics for that
component assurance, which implies that the heuristics need to improve
over time.
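[The "where the problem is not" idea can be sketched as exoneration: components whose own assurance reports them healthy are pruned from the candidate root causes. The names and health labels below are illustrative assumptions.]

```python
def prune_exonerated(candidates, health):
    """Keep only candidates that are degraded or whose health is unknown;
    components positively asserted healthy are exonerated."""
    return {c for c in candidates if health.get(c, "unknown") != "healthy"}

# Suppose upstream traversal produced these candidate root causes,
# and component-level assurance reports the following states:
candidates = {"tunnel", "interface-eth0", "interface-eth1"}
health = {"interface-eth0": "healthy", "tunnel": "degraded"}

print(sorted(prune_exonerated(candidates, health)))  # ['interface-eth1', 'tunnel']
```

Note that the pruning is only as trustworthy as the per-component heuristics, which is exactly the caveat above.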
However, the one thing about which I have a bit of concern in your
model is that you use it to draw conclusions about the health of the
dependent objects (for example, your end-to-end service). It
seems that a derived health score is no substitute for
monitoring the actual health, and should not lull users into a
false sense of security that, as long as they monitor the components
of a system or service, they need not be concerned with
monitoring the system or service as a whole. In reality, I believe
the value (although there still is value) is more limited than
that. I believe that this should be clearly acknowledged and
discussed in the drafts.
This is the exact reason why I wrote in the slides: "This complements
the end-to-end synthetic testing"
Indeed, the way service assurance is usually done is with end-to-end
probing: OWAMP/TWAMP/IP SLA with thresholds on delay, packet loss,
jitter, etc. When the SLA degrades, the end-to-end probing
can't really tell which components in the network degraded (granted,
there are exceptions). The network is viewed as a black box. Combining
the inferred health score from the assurance graph with the end-to-end
probing provides the correlation required for more of a crystal-clear
view of the network.
Point very well taken: the "This complements the end-to-end synthetic
testing" concept is not mentioned in the draft. I will add it. Thanks.
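[That correlation step could be sketched as follows; the function, path, and score values are illustrative assumptions, not from the drafts.]

```python
def localize(probe_degraded, path_components, health_scores):
    """When an end-to-end probe reports SLA degradation, rank the components
    on the service path by their inferred health score (lowest = most suspect).
    The probe says *that* something degraded; the graph suggests *where*."""
    if not probe_degraded:
        return []
    return sorted(path_components, key=lambda c: health_scores.get(c, 0))

# Inferred per-component health scores (0..100) from the assurance graph:
scores = {"pe1": 95, "p1": 40, "pe2": 90}

print(localize(True, ["pe1", "p1", "pe2"], scores))  # ['p1', 'pe2', 'pe1']
```

Unknown components default to score 0 here, so they surface first for investigation rather than being silently trusted.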
A second set of issues concerns the cost of maintaining the
graph and of continuously updating the dependencies. In a
realistic system you will have many objects, with even more
interdependencies. Maintaining derived health state can become
computationally very expensive, which suggests a number of
mitigation strategies: for one, don't continuously maintain this
but compute it only “on demand”.
Yes, that's one way.
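[On-demand derivation could look like the minimal sketch below; the aggregation rule (worst dependency dominates), the leaf data, and the names are assumptions for illustration only.]

```python
# Hypothetical leaf health readings (in practice these would come from telemetry).
leaf_health = {"interface-eth0": 100, "interface-eth1": 60}
depends = {"tunnel": ["interface-eth0", "interface-eth1"],
           "vpn-service": ["tunnel"]}

def derived_health(obj, cache=None):
    """Compute a derived health score only when queried, memoizing within
    one query, instead of continuously maintaining it for every object."""
    if cache is None:
        cache = {}
    if obj in cache:
        return cache[obj]
    if obj in leaf_health:
        score = leaf_health[obj]
    else:
        # Assumed aggregation rule: the worst dependency dominates.
        score = min(derived_health(d, cache) for d in depends[obj])
    cache[obj] = score
    return score

print(derived_health("vpn-service"))  # 60
```

The memoization keeps a single on-demand query linear in the number of reachable objects, rather than re-walking shared sub-graphs.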
Second, perhaps don’t maintain this on the server at all, at least
to the extent that you expect the server to be a networking
device. It seems much more feasible to perform these type of
Model-Based Reasoning computations in an Operations Support System
or application outside the network, not within the network.
However, it is not clear that YANG models and Netconf/Restconf
would be applied there. It seems to me the drafts should add
clarification on where those models would be expected to be
deployed and how/would keep them updated. As an OSS tool, your
proposal makes sense, but trying to process this on networking
devices strikes me as very heavy, in particular given the
limitations as per the earlier point. So, IMHO I think you may
want to consider adding an according section that discusses these
aspects in the draft, specifically the architecture draft.
The architecture, with the YANG module, is actually designed to cover
distributed graphs.
We can stream all metrics (whether YANG leaves, MIB variables, CLI,
syslog, what have you) to an OSS, sure.
However, I believe in data aggregation, as we know that we're going
to quickly reach the limits of streaming capabilities.
And I also believe in each component being responsible for its own
assurance, to the best of its knowledge.
Hence the proposal to go via a SAIN agent, inside or outside a router,
to send the inferred health score and symptoms to the OSS.
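[A SAIN-agent update of that kind might be sketched as below; the message shape, field names, and values are assumptions for illustration, not from the drafts.]

```python
import json

def agent_update(subservice_id, score, symptoms):
    """A hypothetical SAIN-agent report: instead of streaming raw metrics,
    the agent (on or off the router) emits the inferred health score
    plus the symptoms that explain it -- information, not raw data."""
    return json.dumps({
        "subservice": subservice_id,
        "health-score": score,   # 0..100, 100 = perfectly healthy
        "symptoms": symptoms,    # actionable reasons behind the score
    })

msg = agent_update("interface/eth0", 60, ["high input error rate"])
print(msg)
```

One small message per subservice replaces a continuous stream of counters, which is where the aggregation saving comes from.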
In the end, what do operational teams care about?
1. knowing that an interface, a router, or part of the network works
fine ... until it tells them otherwise
2. collecting all the metrics in a big data lake to draw the same
or better conclusions
Ideally we need both, but we face two schools here. I'm more of the
school of providing information, as opposed to just providing lots of
data. This would reduce the cost of managing networks.
Regards, Benoit
_______________________________________________
OPSAWG mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/opsawg