Hi Alex,
Thanks for engaging.
Hi Benoit,
I have seen your presentations on Service Assurance for Intent-Based
Networking Architecture and read your drafts with interest
(draft-claise-opsawg-service-assurance-yang-05 and
draft-claise-opsawg-service-assurance-architecture-03). Interesting
stuff on which I do have a couple of comments.
The basis for the drafts is in essence a proposal for Model-Based
Reasoning (MBR), in which you capture dependencies between objects and
make inferences by traversing the corresponding graph. MBR based on
dependency graphs makes it possible to reason about how the status or
health of one object impacts, and propagates to, the status or health
of dependent objects “downstream” from it. Likewise, traversing the
same graph in the opposite direction (from the “downstream” or
dependent objects) allows you to identify potential root causes for
symptoms observed by those objects, although this seems to be less
your focus.
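For concreteness, the two traversal directions described above might be
sketched roughly as follows. This is only an illustration of the idea,
not of the drafts' actual mechanism: the component names, the 0..100
health scale, and the min() combination rule are all made-up
assumptions.

```python
# Hypothetical dependency graph: each object lists the objects it
# depends on. Leaf objects carry directly measured health (0..100).
DEPS = {
    "e2e-service": ["tunnel"],
    "tunnel": ["if-A", "if-B"],
    "if-A": [],
    "if-B": [],
}
MEASURED = {"if-A": 100, "if-B": 40}

def derived_health(node):
    """Propagate health 'downstream': here a node is only as healthy
    as its least healthy dependency (one possible rule among many)."""
    if not DEPS[node]:
        return MEASURED[node]
    return min(derived_health(d) for d in DEPS[node])

def root_causes(node):
    """Walk the same graph in the opposite direction to collect the
    unhealthy leaf components that explain a degraded node."""
    if not DEPS[node]:
        return [node] if MEASURED[node] < 100 else []
    causes = []
    for d in DEPS[node]:
        causes.extend(root_causes(d))
    return causes

print(derived_health("e2e-service"))  # 40
print(root_causes("e2e-service"))     # ['if-B']
```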
While MBR as a concept makes sense and has a long tradition in network
management, it also comes with a number of considerable issues, and I
was wondering about your perspective and mitigation strategies for
these. For one, its effectiveness depends on the model being
“complete”. In most cases, there is a myriad of interdependencies
which are difficult to capture comprehensively. The model is still
useful for many applications as a starting point, but rarely captures
the full reality. As long as users are clear about that, this is not
an issue.
Point taken about the myriad of interdependencies and graph
completeness. As you observe, the graph is useful even if it is not
complete, especially when we can assure the (networking) components
within the assurance graph. That way, the graph will tell us where the
problem is not, which is equally important as telling us where the
problem is or might be ... assuming, obviously, that we have complete
heuristics for assuring that component ... which implies that the
heuristics need to improve over time.
However, the one thing about your model that concerns me a bit is that
you use it to draw conclusions about the health of the dependent
objects (for example, your end-to-end service). A derived health score
is no substitute for monitoring the actual health, and should not lull
users into a false sense of security that, as long as they monitor the
components of a system or service, they don’t need to be concerned
with monitoring the system or service as a whole. In reality I believe
the value (although there still is value) is more limited than that.
I believe this should be clearly acknowledged and discussed in the
drafts.
This is the exact reason why I wrote in the slides: "This complements
the end-to-end synthetic testing"
Indeed, the way service assurance is usually done is with end-to-end
probing: OWAMP/TWAMP/IP SLA with thresholds on delay, packet loss,
jitter, etc. When the SLA degrades, the end-to-end probing can't
really tell which component in the network degraded (granted, there
are exceptions). The network is viewed as a black box. Combining the
inferred health scores from the assurance graph with the end-to-end
probing provides the correlation required to turn that black box into
more of a crystal ball.
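As a toy illustration of that correlation, one could join a degraded
end-to-end probe result with the per-component health scores from the
graph. The component names and the 80-point threshold below are
invented for the example; nothing here comes from the drafts or from
any TWAMP implementation.

```python
# Hypothetical inferred health scores for components on the probed path.
COMPONENT_HEALTH = {"PE1": 100, "P-core": 55, "PE2": 100}

def localize(e2e_ok, health, threshold=80):
    """Correlate an end-to-end probe verdict with component health:
    if the SLA degrades, the graph points at the suspect components;
    if the SLA is fine, degraded components are merely a watch-list."""
    suspects = [c for c, h in health.items() if h < threshold]
    if not e2e_ok:
        return ("SLA degraded; suspects:", suspects)
    return ("SLA fine; watch-list:", suspects)

print(localize(False, COMPONENT_HEALTH))
```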
Point very well taken, "This complements the end-to-end synthetic
testing" concept is not mentioned in the draft. I will add it. Thanks.
A second set of issues concerns the effort of maintaining the graph
and of continuously updating the dependencies. In a realistic system
you will have many objects with even more interdependencies.
Maintaining derived health state can become computationally very
expensive, which suggests a number of mitigation strategies: for one,
don’t maintain this continuously but compute it only “on demand”.
Yes, that's one way.
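One simple reading of the "on demand" option is lazy, memoized
evaluation: nothing is derived until a client asks, and shared
sub-graphs are evaluated once per request. A minimal sketch, with
made-up node names and, again, min() as an illustrative combination
rule:

```python
from functools import lru_cache

# Hypothetical graph and leaf measurements.
DEPS = {"service": ("link1", "link2"), "link1": (), "link2": ()}
MEASURED = {"link1": 90, "link2": 70}

@lru_cache(maxsize=None)
def health(node):
    """Derived health, computed only when queried and memoized so a
    node shared by several services is evaluated once."""
    deps = DEPS[node]
    if not deps:
        return MEASURED[node]
    return min(health(d) for d in deps)

# No derived state is maintained until a query arrives:
print(health("service"))  # 70
# On a metric update, the cache would simply be invalidated:
health.cache_clear()
```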
Second, perhaps don’t maintain this on the server at all, at least to
the extent that you expect the server to be a networking device. It
seems much more feasible to perform this type of Model-Based Reasoning
computation in an Operations Support System or application outside the
network, not within the network. However, it is not clear that YANG
models and NETCONF/RESTCONF would be applied there. It seems to me the
drafts should clarify where those models would be expected to be
deployed and how they would be kept updated. As an OSS tool, your
proposal makes sense, but trying to process this on networking devices
strikes me as very heavy, in particular given the limitations per the
earlier point. So, IMHO you may want to consider adding a section that
discusses these aspects, specifically in the architecture draft.
The architecture, with the YANG module, is actually designed to cover
distributed graphs.
We can stream all metrics (whether YANG leaves, MIB variables, CLI
output, syslog, what have you) to an OSS, sure.
However, I believe in data aggregation, as we know that we're quickly
going to reach the limits of the streaming capabilities.
And I also believe in each component being responsible for its own
assurance, to the best of its knowledge.
Hence the proposal to go via a SAIN agent, inside or outside a router,
to send the inferred health score and symptoms to the OSS.
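To make the aggregation point tangible: instead of streaming every raw
metric, the agent pushes one small, already-interpreted update per
subservice. The field names below are purely illustrative and are not
the actual leaves of the SAIN YANG module.

```python
import json

# Hypothetical shape of an agent's northbound update: the inferred
# health score plus the symptoms that explain it, rather than the
# raw counters those symptoms were derived from.
update = {
    "subservice": "interface:PE1/GigabitEthernet0/0/1",  # made-up id
    "health-score": 60,
    "symptoms": [
        {"id": "input-errors-rising", "weight": 40},  # made-up symptom
    ],
}

# One compact JSON object replaces a continuous stream of metrics.
print(json.dumps(update))
```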
In the end, what do operational teams care about?
1. knowing that an interface, a router, or part of the network works
fine ... until they are told otherwise
2. collecting all the metrics in a big data lake to draw the same
or better conclusions
Ideally we need both, but we face two schools of thought here. I'm
more in the school of providing information, as opposed to providing
that much data. This would reduce the cost of managing networks.
Regards, Benoit
_______________________________________________
OPSAWG mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/opsawg