Re: [uknof] NETCONF & YANG and device health stats

James Bensley Fri, 25 Sep 2015 01:49:15 -0700

On 23 September 2015 at 03:07, Rob Shakir <[email protected]> wrote:
> Hi James,
>
> First off, it’s really important to remember that NETCONF and YANG are just 
> tools - they don’t make up a whole NMS, which is what I think you are trying 
> to describe. YANG is a data modelling language that defines the schema for 
> data of a device’s management plane; and NETCONF is one of the protocols that 
> defines a set of RPCs to be able to interact with a device.
>
> NETCONF defines a set of RPCs such as ‘get’, ‘get-config’, ‘edit-config’... 
> These allow parts of the schema to be retrieved or edited. There are various 
> efforts to define YANG models - some which are vendor-specific, and some 
> which are intended to be vendor neutral.



Yeah, I'm reasonably aware of what NETCONF and YANG are and are not. I
wasn't intending to use them as an entire NMS; since I've been playing
with them I just haven't found a way to make use of them yet. The
problem I communicated poorly in previous emails is that pushing and
pulling config is just one step in making a change to the network, and
each change alters multiple pieces of network state. Bringing up a new
peer, for example, changes [the number of peers in the AS, the number
of peers on that PE, the number of routes on that PE, potentially the
number of routing tables on that PE, free memory on that PE, time to
converge for that PE].
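
To make the list above concrete, here is a rough sketch of capturing
that state before and after a change and diffing the two snapshots.
The PeState class, its field names and the numbers are all made up
for illustration; none of them come from a real YANG model or device.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class PeState:
    """Hypothetical snapshot of the state a new peer can change on one PE."""
    as_peer_count: int
    pe_peer_count: int
    pe_route_count: int
    pe_routing_tables: int
    pe_free_memory_mb: int


def diff_state(before: PeState, after: PeState) -> dict:
    """Return {field: (old, new)} for every field whose value changed."""
    old, new = asdict(before), asdict(after)
    return {name: (old[name], new[name]) for name in old if old[name] != new[name]}


# Invented example values: one new peer, a few hundred new routes.
before = PeState(120, 14, 550_000, 3, 2048)
after = PeState(121, 15, 550_420, 3, 2031)
changes = diff_state(before, after)
```

Recording a diff like this alongside the config change is one way the
state changes could be made traceable.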

I'm looking to build a tool that is vendor agnostic, and can perform
all the processes required without using many different technologies
at once (SNMP, expect, NETCONF & YANG, and bullshit like tftp or
http).

I don't see why all the processes I previously highlighted can't fall
under one XML-based RPC tool, which is what I want to write (and
probably open source). I think there are loads of networks out there
that would greatly benefit from that. Looks like there is hope, see
below...
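
For reference, a minimal sketch of the kind of XML RPC in question: a
NETCONF <get-config> request for the running datastore, built with
only the Python standard library. The helper name build_get_config
and the message-id value are illustrative assumptions; a real client
would also need the SSH transport, the hello exchange and message
framing, none of which is shown here.

```python
import xml.etree.ElementTree as ET

# Base NETCONF namespace, as defined in RFC 6241.
NETCONF_NS = "urn:ietf:params:xml:ns:netconf:base:1.0"


def build_get_config(message_id: str, datastore: str = "running") -> str:
    """Serialise a <get-config> RPC for the given datastore (hypothetical helper)."""
    # Emit the NETCONF namespace as the default namespace, not as ns0:.
    ET.register_namespace("", NETCONF_NS)
    rpc = ET.Element(f"{{{NETCONF_NS}}}rpc", {"message-id": message_id})
    get_config = ET.SubElement(rpc, f"{{{NETCONF_NS}}}get-config")
    source = ET.SubElement(get_config, f"{{{NETCONF_NS}}}source")
    ET.SubElement(source, f"{{{NETCONF_NS}}}{datastore}")
    return ET.tostring(rpc, encoding="unicode")


request = build_get_config("101")
```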


> NETCONF does not define any RPCs to be able to do all of the things that you 
> mention, but yes, it can definitely be used to retrieve operational state 
> data. YANG includes a ‘config false’ statement that can be used on leaves to 
> show they are state data. The YANG models that OpenConfig has defined aim to 
> align state and configuration within the schema (see 
> draft-openconfig-netmod-opstate), such that it is simple operationally to be 
> able to distinguish various types of op-state (e.g., derived state, which 
> refers to the counters and state information derived from protocol 
> interactions, and the applied state - which shows the configuration that the 
> device has ingested). We have suggested additional RPCs within supporting 
> protocols that allow retrieval of *just* state information (e.g., 
> get-operational defined in the above draft).
>
> The existing OpenConfig models already add leaves for some of the values you 
> mention: 
> https://github.com/YangModels/yang/tree/master/experimental/openconfig


This is a big part of the jigsaw I was missing. The two major hurdles I
have encountered so far with NETCONF & YANG are that (firstly) I can
only really push and pull config (loosely speaking); I can't check the
number of peers, or route counts per peer etc. (as I said in my
previous emails, changing network state triggers a load of other
checks to be made, in particular so that the state changes are
traceable etc.).

The second is that most people seem to just paste genuine vendor
config into a NETCONF client that pushes that config out over the XML
RPC. So all they have gained is things like database locking, syntax
checking, rollbacks etc. on devices that might not have had those
features built in already via the CLI (which they should have IMO).
Really I want to move away from that, and probably the hardest goal I
would face is building a GUI in which one can browse (essentially) the
YANG model and tick BGP > New Peer, type in an IP and ASN, see a list
of existing policies and tick "generic private LINX peer filter" or
whatever they already have defined, and just fire up a new peer.

Only at this point could the application then begin to do things like
check the device has spare CPU cycles/memory, push and apply the
changes, and check the peer has come up. If, say, it's sending us a
full table by accident, rollback actions we've defined (such as a
maximum route count) would match and remove the peer configuration.
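
The sequence described above (pre-check, apply, post-check, roll back
on failure) could be sketched roughly as below. Everything here is a
stand-in: the device is just a dict, and apply_with_checks, add_peer,
remove_peer and MAX_ROUTES are hypothetical names, not part of any
NETCONF library or existing tool.

```python
from typing import Callable


def apply_with_checks(
    pre_check: Callable[[], bool],
    apply_change: Callable[[], None],
    post_check: Callable[[], bool],
    rollback: Callable[[], None],
) -> str:
    """Run one change through pre-check, apply, post-check and rollback."""
    if not pre_check():
        return "aborted"          # device not healthy enough, nothing pushed
    apply_change()
    if post_check():
        return "committed"
    rollback()                    # post-check failed, undo the change
    return "rolled-back"


# Toy stand-in for a device; a real tool would read this state over RPC.
MAX_ROUTES = 1000
device = {"cpu_idle_pct": 80, "peers": {}}


def add_peer() -> None:
    # Simulate a peer that sends a full table by accident.
    device["peers"]["192.0.2.1"] = {"asn": 64500, "routes": 550_000}


def remove_peer() -> None:
    device["peers"].pop("192.0.2.1", None)


result = apply_with_checks(
    pre_check=lambda: device["cpu_idle_pct"] > 20,
    apply_change=add_peer,
    post_check=lambda: device["peers"]["192.0.2.1"]["routes"] <= MAX_ROUTES,
    rollback=remove_peer,
)
```

In this toy run the post-check fails on the route count, so the peer
is removed again and result is "rolled-back".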

And the scope goes way beyond the above; this is one part of the
network operations puzzle that eliminates human error when making
changes to the network (mostly). Another part is trying to reduce
unforeseen network issues when making changes. Say I bring up a new
downstream customer peering who is announcing PI space to me. That is
a change process that can be automated, but in that automation I also
want to see that at each egress node we are advertising the PI space
to our peers/transits. I want to see whether the customer is sending us
the routes with a community so that we AS-prepend because we aren't
their preferred transit provider, and whether the routes are leaving
our AS with the prepends. Are they announcing the PI space they said
they would, and not someone else's by mistake? If the routes aren't in
our egress announcements, there could be a problem.
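
Two of the checks described above can be sketched with nothing more
than the standard ipaddress module, assuming the received and
advertised prefixes have already been retrieved from the devices
somehow. The function names, node names and prefixes (documentation
ranges) are invented for illustration.

```python
import ipaddress


def check_announcements(expected: list, received: list) -> dict:
    """Compare the prefixes a customer said they would announce with what arrived."""
    exp = {ipaddress.ip_network(p) for p in expected}
    rec = {ipaddress.ip_network(p) for p in received}
    return {
        "missing": sorted(str(n) for n in exp - rec),
        "unexpected": sorted(str(n) for n in rec - exp),
    }


def egress_gaps(expected: list, advertised_by_node: dict) -> dict:
    """Return {node: [prefixes not advertised]} for egress nodes missing a prefix."""
    exp = {ipaddress.ip_network(p) for p in expected}
    gaps = {}
    for node, prefixes in advertised_by_node.items():
        missing = exp - {ipaddress.ip_network(p) for p in prefixes}
        if missing:
            gaps[node] = sorted(str(n) for n in missing)
    return gaps


expected = ["192.0.2.0/24", "198.51.100.0/24"]
report = check_announcements(expected, ["192.0.2.0/24", "203.0.113.0/24"])
gaps = egress_gaps(expected, {
    "egress-1": ["192.0.2.0/24", "198.51.100.0/24"],
    "egress-2": ["192.0.2.0/24"],
})
```

Exact-match comparison is a deliberate simplification; it does not
handle more-specifics or aggregates, and the community and prepend
checks would need the path attributes as well.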

This is all easily automatable if derived and applied network state is
available.

> In parallel with the work that is related to having the schema store the 
> state information (which could be polled by NETCONF/RESTCONF/...), OpenConfig 
> is also considering how one can subscribe to certain parts of the schema, 
> such that there is no need to poll particular information - and rather a 
> real-time element of the NMS can collect the information sent to it directly 
> from a device. Compiling this into the same YANG-defined schema, then allows 
> an NMS (or applications that interact with the NMS) to be able to work with 
> that data to implement pre-check, post-check etc. mechanisms. If you are 
> interested in this, check out the talks that Anees Shaikh and Josh George 
> have given at NANOG on what Google are working on in this area.

Bingo! Now looking them up!

> The way I would think of it is this: NETCONF/RESTCONF will help with the 
> application of configuration changes onto a device (pushing config); and 
> polling some information. The streamed telemetry protocols are somewhat more 
> in their infancy, so at the current time, you may need to glue existing data 
> sources back into a schema - but these will help with populating your view of 
> the state more efficiently than existing polled mechanisms. The OpenConfig 
> models provide a set of models that can be used to have an 
> operationally-useful way to be able to represent the two config and state 
> together.
>
> The thing that glues it all together and lets you define your pre-/post- 
> checks is the overall NMS software. A bunch of folks are working on systems 
> in this area.
>
>> What are others doing here to keep the whole process sane?
>
> The way that our system is currently working is that we have a split between 
> data collection and configuration management elements. The query 
> infrastructure can be used to combine the two into the single OpenConfig 
> schema (and other schemas that we are defining that abstract the 
> configuration from clients).
>
> The NMS layer provides the entire life-cycle that you mention: pre-check, 
> apply-change, post-check, on-demand checks and some other things. Our system 
> provides means to be able to define these functions that do this as part of 
> the network design. Parts of it are open source (e.g., the code that 
> generates Python bindings for YANG data models - 
> http://github.com/robshakir/pyangbind, or in your own system you could use an 
> alternative that Google have written for Go - 
> http://github.com/openconfig/goyang), other parts are not.

OK, so that is more or less what I had imagined given the infancy of
YANG at present.

> Unfortunately, the NETCONF support that is on existing devices is not great - 
> but we’re seeing both vendor-specific models, and vendor-neutral models start 
> to become supported. Juniper have publicly spoken about their aims to 
> implement OpenConfig - and other vendors are also working on implementations.
>
> To address the comment about having been able to do things with expect, the 
> advantages of doing this with YANG models (particularly OpenConfig) and a 
> protocol of your choice that can interact with the device directly is that 
> you get:
>         * a declarative API: the network element determines how to get from 
> state A to state B, rather than the imperative nature of the CLI.
>         * a schema that is understood by both the client and the NMS for both 
> config and state: where screen-scraping, there is no contract, and everything 
> may change and break your tools - then if the data model is described in 
> YANG, then you can be clear what to expect for various values.
>         * a combined way to relate configuration data and state: so it is not 
> a case of knowing that a certain neighbour corresponds to some particular 
> show commands, or an SNMP OID, but rather the schema follows conventions such 
> that these can be easily determined.

Yes exactly, totally agree and I think it's where everyone should be
heading. There doesn't seem to be any one tool that can do NMS
operations mixed with network state and management though (yet). So
I'm very keen to write something. As per my above comments, it seems
like not all the jigsaw pieces are quite available yet, but they should
be soon. I will join the OpenConfig group and see what's crackin' over
there.

> I’m happy to talk more about OpenConfig; or some of the work that we’ve been 
> doing in this area. We need to take steps forward with the management plane. 
> The current status-quo is simply unacceptable in terms of the 
> speed/complexity of interacting with the network.

Agreed, networks are so far behind server and application development
and automation in my opinion. Whilst I have seen a dozen presentations
from operators, hosting providers, content providers etc. on how they
have automated zero-touch provisioning or automated service
deployment, these are almost always bespoke systems they have written,
not standards-based, vendor-agnostic, transactional systems, which
is where I desperately want to go (and want the world to go).

Cheers,
James.
