Hi Haoyu,

Thanks for the quick updates. I will check the diffs and see if I have any questions remaining.

Regards,
Rob

> -----Original Message-----
> From: Haoyu Song <haoyu.s...@futurewei.com>
> Sent: 08 October 2021 00:15
> To: Rob Wilton (rwilton) <rwil...@cisco.com>; draft-ietf-opsawg-ntf....@ietf.org
> Cc: opsawg@ietf.org; 'opsawg-chairs' <opsawg-cha...@ietf.org>
> Subject: RE: AD review of draft-ietf-opsawg-ntf-07 [2]
>
> Hi Rob,
>
> We have updated the draft according to your review suggestions and
> uploaded the -08 version. In the new revision we believe all your
> suggestions/questions have been addressed. Please let me know if you have
> further questions. Thank you very much!
>
> Best regards,
> Haoyu
>
> -------------------------------------------------
> A new version of I-D, draft-ietf-opsawg-ntf-08.txt has been successfully
> submitted by Haoyu Song and posted to the IETF repository.
>
> Name: draft-ietf-opsawg-ntf
> Revision: 08
> Title: Network Telemetry Framework
> Document date: 2021-10-07
> Group: opsawg
> Pages: 40
> URL: https://www.ietf.org/archive/id/draft-ietf-opsawg-ntf-08.txt
> Status: https://datatracker.ietf.org/doc/draft-ietf-opsawg-ntf/
> Htmlized: https://datatracker.ietf.org/doc/html/draft-ietf-opsawg-ntf
> Diff: https://www.ietf.org/rfcdiff?url2=draft-ietf-opsawg-ntf-08
>
> -----Original Message-----
> From: Haoyu Song
> Sent: Wednesday, October 6, 2021 9:14 AM
> To: Rob Wilton (rwilton) <rwil...@cisco.com>; draft-ietf-opsawg-ntf....@ietf.org
> Cc: opsawg@ietf.org
> Subject: RE: AD review of draft-ietf-opsawg-ntf-07 [2]
>
> Hi Rob,
>
> Thank you very much for the review! We'll update the draft as you
> suggested.
>
> Best regards,
> Haoyu
>
> -----Original Message-----
> From: Rob Wilton (rwilton) <rwil...@cisco.com>
> Sent: Wednesday, October 6, 2021 3:55 AM
> To: draft-ietf-opsawg-ntf....@ietf.org
> Cc: opsawg@ietf.org
> Subject: RE: AD review of draft-ietf-opsawg-ntf-07 [2]
>
> Sigh, this also appears to be truncated in my email client.
>
> To be sure that you see all the comments (i.e., to the end of the document),
> please see the previous attachment.
> The full email can also be seen in the archives at
> https://mailarchive.ietf.org/arch/msg/opsawg/WDnVtM_vLm15X28OTEwI9Q6gfx0/
>
> Regards,
> Rob
>
> -----Original Message-----
> From: Rob Wilton (rwilton) <rwil...@cisco.com>
> Sent: 06 October 2021 11:48
> To: draft-ietf-opsawg-ntf....@ietf.org
> Cc: opsawg@ietf.org
> Subject: AD review of draft-ietf-opsawg-ntf-07 [2]
>
> Hi,
>
> Here is my belated AD review of draft-ietf-opsawg-ntf-07.txt. [Text file with
> comments attached in case this also gets truncated.]
>
> I would like to thank you for the effort that you have put into this document,
> and apologise for my long delay in reviewing it.
>
> Broadly, I think that this is a good and useful framework, but in some of the
> latter parts of the document it seems to give prominence to protocols that I
> don't think have IETF consensus behind them yet (particularly DNP). I have
> flagged specific comments inline within the document, but I think that the
> document will have better accuracy/longevity if text about the potential
> technologies is mostly kept to the appendices.
>
> There were quite a lot of cases where the text doesn't scan, or read easily,
> particularly in the latter sections of this document, although I acknowledge
> that none of the authors appear to be native English speakers. Ideally, these
> sorts of issues would have been highlighted and addressed during WG LC.
> Although the RFC editor will improve the language of the documents, making
> the improvements now before IESG review will aid its passage, and hopefully
> result in a better document when it is published. I have flagged and
> proposed alternative text/grammar where possible. Once you have made
> the markups and resolved the issues/questions that I have raised then I can
> run it through a grammar checking tool (Lars will run an equivalent tool
> during IESG review anyway ...)
>
> All of my comments are directly inline, please search for "RW" or "RW:"
>
> > OPSAWG                                                           H. Song
> > Internet-Draft                                                 Futurewei
> > Intended status: Informational                                    F. Qin
> > Expires: August 23, 2021                                    China Mobile
> >                                                        P. Martinez-Julia
> >                                                                     NICT
> >                                                             L. Ciavaglia
> >                                                                    Nokia
> >                                                                  A. Wang
> >                                                            China Telecom
> >                                                        February 19, 2021
> >
> >                       Network Telemetry Framework
> >                         draft-ietf-opsawg-ntf-07
> >
> > Abstract
> >
> > Network telemetry is a technology for gaining network insight and
> > facilitating efficient and automated network management. It
> > encompasses various techniques for remote data generation,
> > collection, correlation, and consumption. This document describes an
> > architectural framework for network telemetry, motivated by
> > challenges that are encountered as part of the operation of networks
> > and by the requirements that ensue. Network telemetry, as
> > necessitated by best industry practices, covers technologies and
> > protocols that extend beyond conventional network Operations,
> > Administration, and Management (OAM). The presented network
> > telemetry framework promises flexibility, scalability, accuracy,
> > coverage, and performance. In addition, it facilitates the
> > implementation of automated control loops to address both today's and
> > tomorrow's network operational needs. This document clarifies the
> > terminologies and classifies the modules and components of a network
> > telemetry system from several different perspectives.
> > The framework and taxonomy help to set a common ground for the
> > collection of related work and provide guidance for related technique
> > and standard developments.
> >
> > RW:
> > I would suggest condensing the abstract to the following, and move the other
> > text to the introduction if it is not already covered there.
> >
> > Network telemetry is a technology for gaining network insight and
> > facilitating efficient and automated network management. It
> > encompasses various techniques for remote data generation,
> > collection, correlation, and consumption. This document describes an
> > architectural framework for network telemetry, motivated by
> > challenges that are encountered as part of the operation of networks
> > and by the requirements that ensue. This document clarifies the
> > terminologies and classifies the modules and components of a network
> > telemetry system from several different perspectives. The framework
> > and taxonomy help to set a common ground for the collection of
> > related work and provide guidance for related technique and standard
> > developments.
> >
> > Status of This Memo
> >
> > This Internet-Draft is submitted in full conformance with the
> > provisions of BCP 78 and BCP 79.
> >
> > Internet-Drafts are working documents of the Internet Engineering
> > Task Force (IETF). Note that other groups may also distribute
> > working documents as Internet-Drafts. The list of current
> > Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.
> >
> > Song, et al.             Expires August 23, 2021                [Page 1]
> >
> > Internet-Draft         Network Telemetry Framework         February 2021
> >
> > Internet-Drafts are draft documents valid for a maximum of six months
> > and may be updated, replaced, or obsoleted by other documents at any
> > time. It is inappropriate to use Internet-Drafts as reference
> > material or to cite them other than as "work in progress."
> >
> > This Internet-Draft will expire on August 23, 2021.
> >
> > Copyright Notice
> >
> > Copyright (c) 2021 IETF Trust and the persons identified as the
> > document authors. All rights reserved.
> >
> > This document is subject to BCP 78 and the IETF Trust's Legal
> > Provisions Relating to IETF Documents
> > (https://trustee.ietf.org/license-info) in effect on the date of
> > publication of this document. Please review these documents
> > carefully, as they describe your rights and restrictions with respect
> > to this document. Code Components extracted from this document must
> > include Simplified BSD License text as described in Section 4.e of
> > the Trust Legal Provisions and are provided without warranty as
> > described in the Simplified BSD License.
> >
> > Table of Contents
> >
> > 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3
> > 2. Glossary . . . . . . . . . . . . . . . . . . . . . . . . . . 4
> > 3. Background . . . . . . . . . . . . . . . . . . . . . . . . . 6
> >   3.1. Telemetry Data Coverage . . . . . . . . . . . . . . . . . 7
> >   3.2. Use Cases . . . . . . . . . . . . . . . . . . . . . . . . 7
> >   3.3. Challenges . . . . . . . . . . . . . . . . . . . . . . .
9 > > 3.4. Network Telemetry . . . . . . . . . . . . . . . . . . . . 10 > > 4. The Necessity of a Network Telemetry Framework . . . . . . . 12 > > 5. Network Telemetry Framework . . . . . . . . . . . . . . . . . 13 > > 5.1. Top Level Modules . . . . . . . . . . . . . . . . . . . . 14 > > 5.1.1. Management Plane Telemetry . . . . . . . . . . . . . 17 > > 5.1.2. Control Plane Telemetry . . . . . . . . . . . . . . . 17 > > 5.1.3. Forwarding Plane Telemetry . . . . . . . . . . . . . 18 > > 5.1.4. External Data Telemetry . . . . . . . . . . . . . . . 20 > > 5.2. Second Level Function Components . . . . . . . . . . . . 21 > > 5.3. Data Acquisition Mechanism and Type Abstraction . . . . . 22 > > 5.4. Mapping Existing Mechanisms into the Framework . . . . . 24 > > 6. Evolution of Network Telemetry Applications . . . . . . . . . 25 > > 7. Security Considerations . . . . . . . . . . . . . . . . . . . 26 > > 8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 27 > > 9. Contributors . . . . . . . . . . . . . . . . . . . . . . . . 27 > > 10. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 28 > > 11. Informative References . . . . . . . . . . . . . . . . . . . 28 > > Appendix A. A Survey on Existing Network Telemetry Techniques . 32 > > > > > > > > Song, et al. Expires August 23, 2021 [Page 2] > > > > > Internet-Draft Network Telemetry Framework February 2021 > > > > > > A.1. Management Plane Telemetry . . . . . . . . . . . . . . . 32 > > A.1.1. Push Extensions for NETCONF . . . . . . . . . . . . . 32 > > A.1.2. gRPC Network Management Interface . . . . . . . . . . 32 > > A.2. Control Plane Telemetry . . . . . . . . . . . . . . . . . 33 > > A.2.1. BGP Monitoring Protocol . . . . . . . . . . . . . . . 33 > > A.3. Data Plane Telemetry . . . . . . . . . . . . . . . . . . 33 > > A.3.1. The Alternate Marking (AM) technology . . . . . . . . 33 > > A.3.2. Dynamic Network Probe . . . . . . . . . . . . . . . . 34 > > A.3.3. 
IP Flow Information Export (IPFIX) protocol . . . . . 35 > > A.3.4. In-Situ OAM . . . . . . . . . . . . . . . . . . . . . 35 > > A.3.5. Postcard Based Telemetry . . . . . . . . . . . . . . 35 > > A.4. External Data and Event Telemetry . . . . . . . . . . . . 35 > > A.4.1. Sources of External Events . . . . . . . . . . . . . 36 > > A.4.2. Connectors and Interfaces . . . . . . . . . . . . . . 37 > > Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 37 > > > > 1. Introduction > > > > Network visibility is the ability of management tools to see the > > state and behavior of a network, which is essential for successful > > network operation. Network Telemetry revolves around network data > > that can help provide insights about the current state of the > > network, including network devices, forwarding, control, and > > management planes, and that can be generated and obtained through a > > variety of techniques, including but not limited to network > > instrumentation and measurements, and that can be processed for > > purposes ranging from service assurance to network security using a > > wide variety of techniques including machine learning, data analysis, > > and correlation. In this document, Network Telemetry refer to both > > the data itself (i.e., "Network Telemetry Data"), and the techniques > > and processes used to generate, export, collect, and consume that > > data for use by potentially automated management applications. > > Network telemetry extends beyond the conventional network Operations, > > Administration, and Management (OAM) techniques and expects to > > support better flexibility, scalability, accuracy, coverage, and > > performance. > > > > RW: I suggest 'historical' rather than 'conventional' > > > > > > However, the term of network telemetry lacks a solid and unambiguous > > definition. The scope and coverage of it cause confusion and > > misunderstandings. 
It is beneficial to clarify the concept and > > provide a clear architectural framework for network telemetry, so we > > can articulate the technical field, and better align the related > > techniques and standard works. > > > > RW: Rather than term of, perhaps 'the term "network telemetry" lacks an > > unambiguous definition'. > > > > > > To fulfill such an undertaking, we first discuss some key > > characteristics of network telemetry which set a clear distinction > > from the conventional network OAM and show that some conventional > OAM > > technologies can be considered a subset of the network telemetry > > > > > > > > Song, et al. Expires August 23, 2021 [Page 3] > > > > > Internet-Draft Network Telemetry Framework February 2021 > > > > > > technologies. We then provide an architectural framework for network > > telemetry which includes four modules, each concerned with a > > different category of telemetry data and corresponding procedures. > > All the modules are internally structured in the same way, including > > components that allow to configure data sources with regards to what > > data to generate and how to make that available to client > > applications, components that instrument the underlying data sources, > > and components that perform the actual rendering, encoding, and > > exporting of the generated data. We show how the network telemetry > > framework can benefit the current and future network operations. > > Based on the distinction of modules and function components, we can > > map the existing and emerging techniques and protocols into the > > framework. The framework can also simplify the tasks for designing, > > maintaining, and understanding a network telemetry system. At last, > > we outline the evolution stages of the network telemetry system and > > discuss the potential security concerns. 
> >
> > The purpose of the framework and taxonomy is to set a common ground
> > for the collection of related work and provide guidance for future
> > technique and standard developments. To the best of our knowledge,
> > this document is the first such effort for network telemetry in
> > industry standards organizations.
> >
> > 2. Glossary
> >
> > Before further discussion, we list some key terminology and acronyms
> > used in this documents. We make an intended differentiation between
> > the terms of network telemetry and OAM. However, it should be
> > understood that there is not a hard-line distinction between the two
> > concepts. Rather, network telemetry is considered as the extension
> > of OAM. It covers all the existing OAM protocols but puts more
> > emphasis on the newer and emerging techniques and protocols
> > concerning all aspects of network data from acquisition to
> > consumption.
> >
> > RW:
> > Nit: "this documents." -> "this document."
> > Nit: "as an extension" rather than "as the extension".
> >
> > AI: Artificial Intelligence. In network domain, AI refers to the
> >    machine-learning based technologies for automated network
> >    operation and other tasks.
> >
> > AM: Alternate Marking, a flow performance measurement method,
> >    specified in [RFC8321].
> >
> > BMP: BGP Monitoring Protocol, specified in [RFC7854].
> >
> > DNP: Dynamic Network Probe, referring to programmable in-network
> >    sensors for network monitoring and measurement.
> >
> > DPI: Deep Packet Inspection, referring to the techniques that
> >    examines packet beyond packet L3/L4 headers.
> >
> > gNMI: gRPC Network Management Interface, a network management
> >    protocol from OpenConfig Operator Working Group, mainly
> >    contributed by Google. See [gnmi] for details.
> > > > gRPC: gRPC Remote Procedure Call, a open source high performance RPC > > framework that gNMI is based on. See [grpc] for details. > > > > IPFIX: IP Flow Information Export Protocol, specified in [RFC7011]. > > > > IOAM: In-situ OAM, a dataplane on-path telemetry technique. > > > > NETCONF: Network Configuration Protocol, specified in [RFC6241]. > > > > NetFlow: A Cisco protocol for flow record collecting, described in > > [RFC3594]. > > > > Network Telemetry: The process and instrumentation for acquiring and > > utilizing network data remotely for network monitoring and > > operation. A general term for a large set of network visibility > > techniques and protocols, concerning aspects like data generation, > > collection, correlation, and consumption. Network telemetry > > addresses the current network operation issues and enables smooth > > evolution toward future intent-driven autonomous networks. > > > > NMS: Network Management System, referring to applications that allow > > network administrators manage a network. > > > > RW: referring to => refers to applications that allow network administrators > to manage a network. > > > > > > > > OAM: Operations, Administration, and Maintenance. A group of > > network management functions that provide network fault > > indication, fault localization, performance information, and data > > and diagnosis functions. Most conventional network monitoring > > techniques and protocols belong to network OAM. > > > > PBT: Postcard-Based Telemetry, a dataplane on-path telemetry > > technique. > > > > SMIv2 Structure of Management Information Version 2, specified in > > [RFC2578]. > > > > RW: > > Is SMIv2 a better reference than MIBs, that readers are more likely to be > familiar with? > > > > > > SNMP: Simple Network Management Protocol. Version 1 and 2 are > > specified in [RFC1157] and [RFC3416], respectively. > > > > YANG: The abbreviation of "Yet Another Next Generation". 
YANG is a > > data modeling language for the definition of data sent over > > > > RW: > > Nit: Please drop the first sentence, and add a reference to RFC 7950. > > > > > > > > Song, et al. Expires August 23, 2021 [Page 5] > > > > > Internet-Draft Network Telemetry Framework February 2021 > > > > > > network management protocols such as the NETCONF and RESTCONF. > > YANG is defined in [RFC6020]. > > > > YANG ECA A YANG model for Event-Condition-Action policies, defined > > in [I-D.wwx-netmod-event-yang]. > > > > YANG PUSH: A method to subscribe pushed data from remote YANG > > datastore on network devices. Details are specified in [RFC8641] > > and [RFC8639]. > > > > RW: > > Perhaps borrow from the abstract in RFC 8641. > > "A mechanism that allows subscriber applications to request a > > stream of updates from a YANG datastore on a network device". Details > are ... > > > > > > 3. Background > > > > The term "big data" is used to describe the extremely large volume of > > data sets that can be analyzed computationally to reveal patterns, > > trends, and associations. Networks are undoubtedly a source of big > > data because of their scale and the volume of network traffic they > > forward. It is easy to see that network operations can benefit from > > network big data. > > > > RW: > > Also need to consider privacy. > > > > I think that we need to be careful not to imply that the intention here is to > read/snoop on the data being carried over the network rather than gather > insights into flows > > > > > > > > Today one can access advanced big data analytics capability through a > > plethora of commercial and open source platforms (e.g., Apache > > Hadoop), tools (e.g., Apache Spark), and techniques (e.g., machine > > learning). Thanks to the advance of computing and storage > > technologies, network big data analytics gives network operators an > > opportunity to gain network insights and move towards network > > autonomy. 
Some operators start to explore the application of > > Artificial Intelligence (AI) to make sense of network data. Software > > tools can use the network data to detect and react on network faults, > > anomalies, and policy violations, as well as predicting future > > events. In turn, the network policy updates for planning, intrusion > > prevention, optimization, and self-healing may be applied. > > > > It is conceivable that an autonomic network [RFC7575] is the logical > > next step for network evolution following Software Defined Network > > (SDN), aiming to reduce (or even eliminate) human labor, make more > > efficient use of network resources, and provide better services more > > aligned with customer requirements. Intent-based Networking (IBN) > > [I-D.irtf-nmrg-ibn-concepts-definitions] requires network visibility > > and telemetry data in order to ensure that the network is behaving as > > intended. Although it takes time to reach the ultimate goal, the > > journey has started nevertheless. > > RW: > > It would be helpful for the text to link autonomic networking and Intent > based networking, perhaps: > > The related technique of Intent-based Networking [...] requires ... > > > > RW: > > Not sure that the last sentence of the paragraph is required. > > > > > > However, while the data processing capability is improved and > > applications are hungry for more data, the networks lag behind in > > extracting and translating network data into useful and actionable > > information in efficient ways. The system bottleneck is shifting > > from data consumption to data supply. Both the number of network > > nodes and the traffic bandwidth keep increasing at a fast pace. The > > > > > > > > Song, et al. Expires August 23, 2021 [Page 6] > > > > > Internet-Draft Network Telemetry Framework February 2021 > > > > > > network configuration and policy change at smaller time slots than > > before. 
> > More subtle events and fine-grained data through all network
> > planes need to be captured and exported in real time. In a nutshell,
> > it is a challenge to get enough high-quality data out of the network
> > in a manner that is efficient, timely, and flexible. Therefore, we
> > need to survey the existing technologies and protocols and identify
> > any potential gaps.
> >
> > In the remainder of this section, first we clarify the scope of
> > network data (i.e., telemetry data) concerned in the context. Then,
> > we discuss several key use cases for today's and future network
> > operations. Next, we show why the current network OAM techniques and
> > protocols are insufficient for these use cases. The discussion
> > underlines the need of new methods, techniques, and protocols which
> > we assign under the umbrella term - Network Telemetry.
> >
> > RW:
> > We should also include the possibility of extending existing protocols,
> > methods, techniques.
> >
> > 3.1. Telemetry Data Coverage
> >
> > Any information that can be extracted from networks (including data
> > plane, control plane, and management plane) and used to gain
> > visibility or as basis for actions is considered telemetry data. It
> > includes statistics, event records and logs, snapshots of state,
> > configuration data, etc. It also covers the outputs of any active
> > and passive measurements [RFC7799]. Specially, raw data can be
> > processed in-network before being sent to a data consumer. Such
> > processed data is also considered telemetry data. A classification
> > of telemetry data is provided in Section 5.
> >
> > RW:
> > Specially - I would expand this. Perhaps: "In some cases, raw data is
> > processed before being sent .."
> > We should also discuss the quality of data, i.e., less, higher quality data
> > may be better than lots of low quality data.
> >
> > 3.2. Use Cases
> >
> > The following set of use cases is essential for network operations.
> > While the list is by no means exhaustive, it is enough to highlight > > the requirements for data velocity, variety, volume, and veracity in > > networks. > > > > o Security: Network intrusion detection and prevention systems need > > to monitor network traffic and activities and act upon anomalies. > > Given increasingly sophisticated attack vector coupled with > > increasingly severe consequences of security breaches, new tools > > and techniques need to be developed, relying on wider and deeper > > visibility into networks. > > > > RW: > > I agree with this, but it might be good to emphasize that the goal is > > to get to a place where this can be done without any, or only minimal, > > human intervention. > > > > > > o Policy and Intent Compliance: Network policies are the rules that > > constraint the services for network access, provide service > > differentiation, or enforce specific treatment on the traffic. > > For example, a service function chain is a policy that requires > > the selected flows to pass through a set of ordered network > > functions. Intent, as defined in > > > > RW: > > constraint => constrain > > > > > > Song, et al. Expires August 23, 2021 [Page 7] > > > > > Internet-Draft Network Telemetry Framework February 2021 > > > > > > [I-D.irtf-nmrg-ibn-concepts-definitions], is a set of operational > > goal that a network should meet and outcomes that a network is > > supposed to deliver, defined in a declarative manner without > > specifying how to achieve or implement them. An intent requires a > > complex translation and mapping process before being applied on > > networks. While a policy or an intent is enforced, the compliance > > needs to be verified and monitored continuously, relying on > > visibility that is provided through network telemetry data, and > > any violation needs to be reported immediately. 
> > > > RW: > > Does it not also rely on visibility of the network to potentially modify > > the mapping to ensure that the intent remains in force? > > > > o SLA Compliance: A Service-Level Agreement (SLA) defines the level > > of service a user expects from a network operator, which include > > the metrics for the service measurement and remedy/penalty > > procedures when the service level misses the agreement. Users > > need to check if they get the service as promised and network > > operators need to evaluate how they can deliver the services that > > can meet the SLA based on realtime network telemetry data, > > including data from network measurements. > > > > o Root Cause Analysis: Any network failure can be the effect of a > > sequence of chained events. Troubleshooting and recovery require > > quick identification of the root cause of any observable issues. > > However, the root cause is not always straightforward to identify, > > especially when the failure is sporadic and the number of event > > messages, both related and unrelated to the same cause, is > > overwhelming. While machine learning technologies can be used for > > root cause analysis, it up to the network to sense and provide the > > relevant data to feed into machine learning applications. > > > > RW: > > In these sorts of scenarios, I would expect additional detailed diagnostics > information to be requested from the device to figure out the root cause. Or > specifically, I think that this would contain data that wouldn't normally be > exported via telemetry. > > > > > > o Network Optimization: This covers all short-term and long-term > > network optimization techniques, including load balancing, Traffic > > Engineering (TE), and network planning. Network operators are > > motivated to optimize their network utilization and differentiate > > services for better Return On Investment (ROI) or lower Capital > > Expenditures (CAPEX). 
The first step is to know the real-time > > network conditions before applying policies for traffic > > manipulation. In some cases, micro-bursts need to be detected in > > a very short time-frame so that fine-grained traffic control can > > be applied to avoid network congestion. Long-term planning of > > network capacity and topology requires analysis of real-world > > network telemetry data that is obtained over long periods of time. > > > > o Event Tracking and Prediction: The visibility into traffic path > > and performance is critical for services and applications that > > rely on healthy network operation. Numerous related network > > events are of interest to network operators. For example, Network > > operators want to learn where and why packets are dropped for an > > application flow. They also want to be warned of issues in > > > > > > > > Song, et al. Expires August 23, 2021 [Page 8] > > > > > Internet-Draft Network Telemetry Framework February 2021 > > > > > > advance so proactive actions can be taken to avoid catastrophic > > consequences. > > > > 3.3. Challenges > > > > For a long time, network operators have relied upon SNMP [RFC3416], > > Command-Line Interface (CLI), or Syslog to monitor the network. Some > > other OAM techniques as described in [RFC7276] are also used to > > facilitate network troubleshooting. These conventional techniques > > are not sufficient to support the above use cases for the following > > reasons: > > > > o Most use cases need to continuously monitor the network and > > dynamically refine the data collection in real-time. The poll- > > based low-frequency data collection is ill-suited for these > > applications. Subscription-based streaming data directly pushed > > from the data source (e.g., the forwarding chip) is preferred to > > provide enough data quantity and precision at scale. 
> >
> >    o  Comprehensive data is needed from packet processing engine to
> >       traffic manager, from line cards to main control board, from user
> >       flows to control protocol packets, from device configurations to
> >       operations, and from physical layer to application layer.
> >       Conventional OAM only covers a narrow range of data (e.g., SNMP
> >       only handles data from the Management Information Base (MIB)).
> >       Traditional network devices cannot provide all the necessary
> >       probes. More open and programmable network devices are therefore
> >       needed.
> >
> >    o  Many application scenarios need to correlate network-wide data
> >       from multiple sources (i.e., from distributed network devices,
> >       different components of a network device, or different network
> >       planes). A piecemeal solution often lacks the capability to
> >       consolidate the data from multiple sources. The composition of a
> >       complete solution, as partly proposed by Autonomic Resource
> >       Control Architecture (ARCA)
> >       [I-D.pedro-nmrg-anticipated-adaptation], will be empowered and
> >       guided by a comprehensive framework.
> >
> >    o  Some of the conventional OAM techniques (e.g., CLI and Syslog)
> >       lack a formal data model. The unstructured data hinders tool
> >       automation and application extensibility. Standardized data
> >       models are essential to support programmable networks.
> >
> >    o  Although some conventional OAM techniques support data push (e.g.,
> >       SNMP Trap [RFC2981][RFC3877], Syslog, and sFlow), the pushed data
> >       are limited to only predefined management plane warnings (e.g.,
> >       SNMP Trap) or sampled user packets (e.g., sFlow). Network
> >       operators require the data with arbitrary source, granularity, and
> >       precision which are beyond the capability of the existing
> >       techniques.
> >
> >    o  The conventional passive measurement techniques can either consume
> >       excessive network resources and render excessive redundant data,
> >       or lead to inaccurate results; on the other hand, the conventional
> >       active measurement techniques can interfere with the user traffic
> >       and their results are indirect. Techniques that can collect
> >       direct and on-demand data from user traffic are more favorable.
> >
> >    These challenges were addressed by newer standards and techniques
> >    (e.g., IPFIX/Netflow, PSAMP, IOAM, and YANG-Push) and more are
> >    emerging. These standards and techniques need to be recognized and
> >    accommodated in a new framework.
> >
> > 3.4. Network Telemetry
> >
> >    Network telemetry has emerged as a mainstream technical term to refer
> >    to the network data collection and consumption techniques. Several
> >    network telemetry techniques and protocols (e.g., IPFIX [RFC7011] and
> >    gRPC [grpc]) have been widely deployed. Network telemetry allows
> >    separate entities to acquire data from network devices so that data
> >    can be visualized and analyzed to support network monitoring and
> >    operation. Network telemetry covers the conventional network OAM and
> >    has a wider scope. It is expected that network telemetry can provide
> >    the necessary network insight for autonomous networks and address the
> >    shortcomings of conventional OAM techniques.
> >
> >    Network telemetry usually assumes machines as data consumers rather
> >    than human operators. Hence, the network telemetry can directly
> >    trigger the automated network operation, while in contrast some
> >    conventional OAM tools are designed and used to help human operators
> >    to monitor and diagnose the networks and guide manual network
> >    operations. Such a proposition leads to very different techniques.
> >
> >    Although new network telemetry techniques are emerging and subject to
> >    continuous evolution, several characteristics of network telemetry
> >    have been well accepted. Note that network telemetry is intended to
> >    be an umbrella term covering a wide spectrum of techniques, so the
> >    following characteristics are not expected to be held by every
> >    specific technique.
> >
> >    o  Push and Streaming: Instead of polling data from network devices,
> >       telemetry collectors subscribe to streaming data pushed from data
> >       sources in network devices.
> >
> >    o  Volume and Velocity: The telemetry data is intended to be consumed
> >       by machines rather than by humans. Therefore, the data
> >       volume can be huge and the processing is optimized for the needs
> >       of automation in realtime.
> >
> >    o  Normalization and Unification: Telemetry aims to address the
> >       overall network automation needs. Efforts are made to normalize
> >       the data representation and unify the protocols, so as to simplify
> >       data analysis and provide integrated analysis across heterogeneous
> >       devices and data sources across a network.
> >
> >    o  Model-based: The telemetry data is modeled in advance which allows
> >       applications to configure and consume data with ease.
> >
> >    o  Data Fusion: The data for a single application can come from
> >       multiple data sources (e.g., cross-domain, cross-device, and
> >       cross-layer) and needs to be correlated to take effect.
> >
> >    o  Dynamic and Interactive: Since network telemetry is meant to be
> >       used in a closed control loop for network automation, it needs to
> >       run continuously and adapt to the dynamic and interactive queries
> >       from the network operation controller.
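The "Push and Streaming" and "Dynamic and Interactive" characteristics
listed above can be pictured with a minimal Python sketch. It is an
illustration only and is not taken from the draft: the names DataSource,
Subscription, period_ticks, the "fwd-chip-0" label, and the queue_depth
threshold of 80 are all hypothetical, and a real deployment would use a
protocol such as YANG-Push or gRPC rather than in-process callbacks.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Sample = Dict[str, int]


@dataclass
class Subscription:
    """Hypothetical subscription: push a sample every `period_ticks` ticks."""
    period_ticks: int
    callback: Callable[["Subscription", str, int, Sample], None]


@dataclass
class DataSource:
    """Hypothetical data source in a device (e.g., a forwarding chip)."""
    name: str
    subscriptions: List[Subscription] = field(default_factory=list)

    def subscribe(self, period_ticks: int, callback) -> Subscription:
        sub = Subscription(period_ticks, callback)
        self.subscriptions.append(sub)
        return sub

    def tick(self, now: int, sample: Sample) -> None:
        # The source pushes data; the collector never polls.
        for sub in self.subscriptions:
            if now % sub.period_ticks == 0:
                sub.callback(sub, self.name, now, sample)


def run_demo() -> List[Tuple[int, int]]:
    received: List[Tuple[int, int]] = []

    def on_push(sub: Subscription, name: str, now: int, sample: Sample) -> None:
        depth = sample["queue_depth"]
        received.append((now, depth))
        # Closed loop: request finer-grained data while the queue is deep,
        # and fall back to the coarse rate once it drains.
        sub.period_ticks = 1 if depth > 80 else 4

    source = DataSource("fwd-chip-0")
    source.subscribe(period_ticks=4, callback=on_push)

    depths = [10, 10, 10, 10, 90, 95, 99, 20, 20, 20]
    for now, depth in enumerate(depths):
        source.tick(now, {"queue_depth": depth})
    return received


if __name__ == "__main__":
    for tick, depth in run_demo():
        print(tick, depth)
```

In this sketch the collector sees only every fourth sample during normal
operation, switches to every sample while the hypothetical queue depth is
above the threshold, and then relaxes again, which is one possible reading
of a subscription that adapts to queries from the controller.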
> >
> >    In addition, an ideal network telemetry solution may also have the
> >    following features or properties:
> >
> >    o  In-Network Customization: The data that is generated can be
> >       customized in network at run-time to cater to the specific need of
> >       applications. This needs the support of a programmable data plane
> >       which allows probes with custom functions to be deployed at
> >       flexible locations.
> >
> >    o  In-Network Data Aggregation and Correlation: Network devices and
> >       aggregation points can work out which events and what data needs
> >       to be stored, reported, or discarded thus reducing the load on the
> >       central collection and processing points while still ensuring that
> >       the right information is ready to be processed in a timely way.
> >
> >    o  In-Network Processing: Sometimes it is not necessary or feasible
> >       to gather all information to a central point to be processed and
> >       acted upon. It is possible for the data processing to be done in
> >       network, allowing reactive actions to be taken locally.
> >
> >    o  Direct Data Plane Export: The data originated from the data plane
> >       forwarding chips can be directly exported to the data consumer for
> >       efficiency, especially when the data bandwidth is large and the
> >       real-time processing is required.
> >
> >    o  In-band Data Collection: In addition to the passive and active
> >       data collection approaches, the new hybrid approach allows data
> >       to be collected directly for any target flow on its entire
> >       forwarding path [I-D.song-opsawg-ifit-framework].
> >
> >    It is worth noting that a network telemetry system should not be
> >    intrusive to normal network operations by avoiding the pitfall of the
> >    "observer effect". That is, it should not change the network
> >    behavior or affect the forwarding performance.
> >    Otherwise, the whole
> >    purpose of network telemetry is compromised.
> >
> >    Although in many cases a system for network telemetry involves a
> >    remote data collecting and consuming entity, it is important to
> >    understand that there are no inherent assumptions about how a system
> >    should be architected. Telemetry data producers and consumers can
> >    work in distributed or peer-to-peer fashions rather than assuming a
> >    centralized data consuming entity. In such cases, a network node can
> >    be the direct consumer of telemetry data from other nodes.
> >
> > 4. The Necessity of a Network Telemetry Framework
> >
> > RW: I think that the structure of the document might be better if this
> > was a section 3.5 of the background rather than its own top level
> > section?
> >
> >    Network data analytics and machine-learning technologies are applied
> >    for network operation automation, relying on abundant and coherent
> >    data from networks. Data acquisition that is limited to a single
> >    source and static in nature will in many cases not be sufficient to
> >    meet an application's telemetry data needs. As a result, multiple
> >    data sources, involving a variety of techniques and standards, will
> >    need to be integrated. It is desirable to have a framework that
> >    classifies and organizes different telemetry data sources and types,
> >    defines different components of a network telemetry system and their
> >    interactions, and helps coordinate and integrate multiple telemetry
> >    approaches across layers. This allows flexible combinations of data
> >    for different applications, while normalizing and simplifying
> >    interfaces. In detail, such a framework would benefit application
> >    development for the following reasons:
> >
> >    o  Future networks, autonomous or otherwise, depend on holistic and
> >       comprehensive network visibility.
> >       All the use cases and
> >       applications are better supported uniformly and coherently
> >       under a single intelligent agent using an integrated, converged
> >       mechanism and common telemetry data representations wherever
> >       feasible. Therefore, the protocols and mechanisms should be
> >       consolidated into a minimum yet comprehensive set. A telemetry
> >       framework can help to normalize the technique developments.
> >
> >    o  Network visibility presents multiple viewpoints. For example, the
> >       device viewpoint takes the network infrastructure as the
> >       monitoring object from which the network topology and device
> >       status can be acquired; the traffic viewpoint takes the flows or
> >       packets as the monitoring object from which the traffic quality
> >       and path can be acquired. An application may need to switch its
> >       viewpoint during operation. It may also need to correlate a
> >       service and its impact on user experience to acquire the
> >       comprehensive information.
> >
> >    o  Applications require network telemetry to be elastic in order to
> >       make efficient use of network resources and reduce the impact of
> >       processing related to network telemetry on network performance.
> >       For example, routine network monitoring should cover the entire
> >       network with a low data sampling rate. Only when issues arise or
> >       critical trends emerge should telemetry data sources be modified
> >       and telemetry data rates boosted as needed.
> >
> >    o  Efficient data fusion is critical for applications to reduce the
> >       overall quantity of data and improve the accuracy of analysis.
> >
> >    A telemetry framework collects together all of the telemetry-related
> >    works from different sources and working groups within the IETF. This
> >    makes it possible to assemble a comprehensive network telemetry
> >    system and to avoid repetitious or redundant work.
> >    The framework
> >    should cover the concepts and components from the standardization
> >    perspective. This document describes the modules which make up a
> >    network telemetry framework and decomposes the telemetry system into
> >    a set of distinct components that existing and future work can easily
> >    map to.
> >
> > 5. Network Telemetry Framework
> >
> >    The top level network telemetry framework partitions the network
> >    telemetry into four modules based on the telemetry data object source
> >    and represents their relationship. At the next level, the framework
> >    decomposes each module into separate components. Each of the modules
> >    follows the same underlying structure, with one component dedicated
> >    to the configuration of data subscriptions and data sources, a second
> >    component dedicated to encoding and exporting data, and a third
> >    component instrumenting the generation of telemetry related to the
> >    underlying resources. Throughout the framework, the same set of
> >    abstract data acquiring mechanisms and data types are applied. The
> >    two-level architecture with the uniform data abstraction helps
> >    accurately pinpoint a protocol or technique to its position in a
> >    network telemetry system or disaggregate a network telemetry system
> >    into manageable parts.
> >
> >
> > RW: Relationship of telemetry data vs get requests. I.e., isn't
> > telemetry just push rather than pulling data?
> >
> >
> > 5.1. Top Level Modules
> >
> >    Telemetry can be applied on the forwarding plane, the control plane,
> >    and the management plane in a network, as well as other sources out
> >    of the network, as shown in Figure 1. Therefore, we categorize the
> >    network telemetry into four distinct modules with each having its own
> >    interface to Network Operation Applications.
> >
> >    +------------------------------+
> >    |                              |
> >    |       Network Operation      |<-------+
> >    |         Applications         |        |
> >    |                              |        |
> >    +------------------------------+        |
> >         ^      ^           ^               |
> >         |      |           |               |
> >         V      |           V               V
> >    +-----------|---+--------------+  +-----------+
> >    |           |   |              |  |           |
> >    | Control Pl|ane|              |  | External  |
> >    | Telemetry | <--->            |  | Data and  |
> >    |           |   |              |  | Event     |
> >    |      ^    V   |  Management  |  | Telemetry |
> >    +------|--------+  Plane       |  |           |
> >    |      V        |  Telemetry   |  +-----------+
> >    | Forwarding    |              |
> >    | Plane       <--->            |
> >    | Telemetry     |              |
> >    |               |              |
> >    +---------------+--------------+
> >
> >    Figure 1: Modules in Layer Category of NTF
> >
> > RW:
> > In this diagram, for me at least, I think that it would be more natural
> > to have Management Plane on the left, and Control/ Forwarding Plane on
> > the right.
> >
> >    The rationale of this partition lies in the different telemetry data
> >    objects which result in different data source and export locations.
> >    Such differences have profound implications on in-network data
> >    programming and processing capability, data encoding and transport
> >    protocol, and required data bandwidth and latency.
> >
> > RW:
> > Data can be sent directly, or proxied via the control and management
> > planes
_______________________________________________
OPSAWG mailing list
OPSAWG@ietf.org
https://www.ietf.org/mailman/listinfo/opsawg