Hi Alex,

Thank you very much for your detailed review and comments! Please see my inline 
responses (marked with [HS]). I have updated the draft accordingly; please let 
me know whether your suggestions have been addressed.

Best regards,
Haoyu  

-----Original Message-----
From: OPSAWG <[email protected]> On Behalf Of Alexander L Clemm
Sent: Thursday, September 17, 2020 4:25 PM
To: [email protected]
Subject: [OPSAWG] Review comments (Re: I-D Action: draft-ietf-opsawg-ntf-03.txt)

Hello Haoyu and draft authors,

I have reviewed draft-ietf-opsawg-ntf-03, Network Telemetry Framework, and have 
a number of comments and suggestions for your consideration for further 
improvements of the document.  I am aware this is a bit lengthy, but hopefully 
you will find at least some of it useful.  FWIW, here goes (in sequential order 
in the document, not in order of relevance of the comment):

- Introduction (p3):

As a general comment, I noticed that the document tends to conflate (1) 
telemetry data, (2) modules of a framework architecture to generate, collect, 
and process telemetry data, as well as (3) functionality of applications or use 
cases that leverage such framework architecture.  It would be useful to clearly 
distinguish between these concepts and call out their relationship here. 

An example is the sentence "We show how network telemetry can meet ... network 
operation requirements, and the challenges each telemetry module is facing".  
It is not clear at this point what "network telemetry" really refers to.  Is it 
network telemetry data (but that does not meet network operation requirements 
by itself, although it contributes to a solution)?  Or is it the framework?  
But then what does "telemetry module" refer to?  Presumably it would be part of 
the framework, not an independent entity whose challenges are addressed through 
the framework.

[HS] I've modified the introduction to make it clearer. 

- Motivation (p4):

For intent, may want to add a reference to 
draft-irtf-nmrg-ibn-concepts-definitions

[HS] Reference is added.

I am not sure if "actionable information" (and the activities used to translate 
"network data" into such information) is part of telemetry. To me, "network 
data" referred to here is really the telemetry data. Sure, this data will get 
processed, aggregated, abstracted, etc, but telemetry is really the "raw data" 
that fuels all of those activities. I think what is needed is a clear 
definition or introduction of what "telemetry data" entails, and what will be 
covered by the framework - just the framework to generate and collect that 
data, or any other applications "on top" that process that data further and use 
it for different purposes (which would make this more of a service assurance 
framework, not merely a telemetry framework IMHO).  Either way, I think it 
would be good to frame the scope more clearly. 

[HS] We consider telemetry data to include processed data, as long as the 
processing is done in the network and the results are delivered to the data 
consumer. It is true that the telemetry framework only covers the data 
collection part. However, data collection can also involve a control loop, 
i.e., a new data collection decision is based on the analysis results of 
previous telemetry data (we call this interactive telemetry or level 2 
telemetry). I have made the scope more explicit in the draft.

- Section 2.1 (p5):

There is an inline definition of intent that equates it with a policy. This is 
not consistent with the definition in other places.  Please refer to the 
definition e.g. in draft-irtf-nmrg-ibn-concepts-definitions

[HS] The definition is modified to be in line with the referenced document.

For use cases, I think the two most important ones are missing; I would suggest 
adding these and leading with these, actually.  This concerns Security (e.g., 
intrusion detection systems analyzing telemetry data to detect suspicious 
activities and traffic), as well as Monitoring.  These are also arguably the 
most important use cases for e.g. Netflow/IPFIX today; it is not clear if flow 
records qualify as telemetry data in your definition but the document hints at 
them in several places. 

[HS] This use case is added as the first one. 

- Section 2.x

In general, what is missing is a section 2.x that gives a brief overview of 
what is considered as "telemetry data".  This is left to the imagination of the 
reader.  Just statistics and snapshots of state?  Or also config data?  Flow 
records?  What about event records and logs? Measurements?  Packets (including 
sampled ones) stored/copied for analysis?  Control packets?  All of the above, 
or if not, what are things that would not be considered "telemetry data"? 

[HS] A subsection is added to clarify the meaning of telemetry data in this 
document.

- Section 2.2

The state of the art includes a lot more than SNMP, CLI, syslog.  At a minimum, 
you need to mention flow records here.  Possibly also measurements, YANG-Push, 
etc etc. 

[HS] This section is meant to discuss the plain old OAM techniques and to 
motivate why newer techniques are needed. All the others are actually 
considered network telemetry techniques, which are the focus of this document. 
I added a sentence to clarify this point.

- Section 2.3 (p8)
Update reference to YANG-Push (RFC 8641+8639). 

[HS] Reference updated.

- Section 2.4
Missing here is mention of flow information export (Netflow/IPFIX).
 
It is not clear what the document is doing here.  In some places it sounds as 
if it argues that other types of telemetry (which ones?) or other protocols / 
techniques (which ones / for what needs?) are needed, but why?  I thought the 
purpose was to define a framework that describes what pieces are needed, where 
different pieces fit, and how they are expected to interwork, not so much to 
criticize the current state of the art.  For an analysis of the state of the 
art, important pieces are missing and not mentioned.  For example, I am missing 
a mention of measurement, including e.g. OWAMP/TWAMP and RFC 6812 (currently 
missing entirely from the references, but widely deployed in the industry so 
should be added).

[HS] What we argue here is that the old "OAM" cannot cover the new requirements 
so we introduce "network telemetry"  to represent the newer protocols and 
techniques. In general, network telemetry covers the OAM, but OAM is only a 
subset of network telemetry. The OWAMP/TWAMP and RFC6812 are all "network 
telemetry". I have added references to them.

It is not clear if in-network processing and action should be included here.  
See also my earlier comment.  This looks like scope creep to include not just 
telemetry but service assurance functionality that would operate on top of it.

[HS] As mentioned before, in-network processing is a part of network telemetry 
(actually, I think it is the most interesting part, and one that has been less 
covered so far).  But action may not belong to telemetry, so I deleted "action" 
from the text.

The mention of SDN and centralized processing is a bit confusing; its purpose 
in this context is not clear.

[HS] I have reworded the description to avoid mentioning SDN.

- Section 3

It would be good to distinguish here between the need for a telemetry framework 
vs the need for telemetry data.  The need for the latter is clear; the need for 
the former should be explained as well.

[HS] This section is meant to be dedicated to discussing the need for a 
telemetry framework. I modified the text to clarify this point.

- Section 4.1

Not clear on sub-pub vs pub-sub.  May need an editorial scrub. 

[HS] The text is modified and extended to make the definition clearer.

It is not clear why you mention on-request queries.  Do you consider the 
ability to request any type of management data as part of the telemetry 
framework?  If so, where do you draw the line between a telemetry framework and 
a broader management framework (the telemetry framework becomes fairly 
all-encompassing at that point)? 

[HS] We consider the ability to request device management data as a part of the 
telemetry framework (i.e., management plane telemetry). The telemetry framework 
should be a part of a broader management framework at the network level. 

I am not sure I would categorize complex data that has been synthesized and 
processed using complex algorithms as "telemetry data".  This goes again to the 
question of what is considered telemetry data.  Is it the raw data generated by 
the network (that's where I would draw the line), or does it include any data 
that can be derived from that data? If it's the latter, there is a slippery 
slope - you may need to include also data derived from correlation with service 
and other data, learned models, etc etc - before you know it it will encompass 
any data, information, knowledge involving a network.  

[HS] If the data is generated and processed in the network, we consider it 
telemetry data. The data collector and analyzer may receive and work on 
telemetry data from multiple sources. The data generated by the collector and 
analyzer are not considered telemetry data, although the data collector and 
analyzer function may still be a part of the telemetry system.

- Figure 1 (p 13) 

I don't think the data type relationships as depicted are correct. Simple data 
could be streamed in its own right, and its generation also triggered through 
events.  I think what triggers the generation/sending of data is independent of 
the type of data itself.  Really, you might have simple data and complex data 
(arguably, complex data might not be telemetry data, but ok), and then 
different ways to trigger generation of that data: As part of an event, as part 
of a subscription (streaming), when some condition is met, etc etc. 

[HS] I updated the figure to better reflect their relationship.

- Section 4.2.1

There is no section 4.2.2.  Having a single subsection looks a bit strange.  
Suggest to either remove 4.2.1 as a separate subsection, or to elevate it to a 
second-level section, or to add a 4.2.2. 

[HS] I removed the subsection level as suggested.

- Section 4.2.1.3.1

Your discussion of methods active, passive, and hybrid covers very different 
types of information - anything from sampled packet to flow records to 
measurements to OAM.  I think it would be good to differentiate by type of 
information.  Also, please consider mentioning RFC 6812 (IPSLA) in addition to 
OWAMP/TWAMP. 

[HS] The information type can be used as another axis for technique 
classification, which is mentioned in the last point.

- Section 4.3 and Figure 4

While per se there is nothing wrong with what you describe here, and perhaps it 
is best to keep things simple and to the basics as there will be myriads of 
implementation variations, I think expectations for what the framework actually 
entails should be framed a bit better earlier in the document.  After all this 
leadup, finally seeing the proposed framework appears a bit anti-climactic. 
This pretty much describes best current practice today, pretty much describing 
what a generic management agent on a node will provide.  It does not cover 
aspects such as end-to-end measurements (covering multiple nodes), any security 
components (signing of data to prevent tampering, perhaps), integration with 
the real resources and data sources (this starts at the "data object", what 
about collecting data across multiple line cards etc etc). At a minimum, it 
would be good to state what is in scope and out of scope.

[HS] We adjusted the order slightly.  The framework follows a two-level 
architecture. The first level is a module for each plane, and the second level 
is five components in each module. The data acquiring mechanism and data type 
are described as abstractions to be used by the framework. 

In the tables of the section, as Netconf and YANG are mentioned, SMIv2 should 
probably be mentioned and referenced along with SNMP as well. 

[HS] Reference to SMIv2 is added.

- Section 5

The difference between level 1 (dynamic) and level 2 (interactive) telemetry is 
not very clear.  Any telemetry data can be used in a closed loop; it is not 
clear why this is called level 3 (you can use level 0 telemetry data for plenty 
of closed loops). 

[HS] Level 2 is more about automatic real-time change of telemetry tasks. A 
higher level is built upon a lower level. The text is modified.

- Section 6

For the security considerations, there are a number of additional possible 
telemetry attack vectors that could be mentioned here.  E.g., attacks aiming at 
generating telemetry data to exhaust network resources as well as resources on 
the node, attacks aimed at falsifying results and tampering with telemetry, 
swamping of receivers (for streaming data).  

[HS] The cases are added in the first paragraph.

--- Alex

On 4/13/2020 11:59 AM, [email protected] wrote:
> A New Internet-Draft is available from the on-line Internet-Drafts 
> directories.
> This draft is a work item of the Operations and Management Area Working Group 
> WG of the IETF.
>
>         Title           : Network Telemetry Framework
>         Authors         : Haoyu Song
>                           Fengwei Qin
>                           Pedro Martinez-Julia
>                           Laurent Ciavaglia
>                           Aijun Wang
>       Filename        : draft-ietf-opsawg-ntf-03.txt
>       Pages           : 34
>       Date            : 2020-04-13
>
> Abstract:
>    Network telemetry is the technology for gaining network insight and
>    facilitating efficient and automated network management.  It engages
>    various techniques for remote data collection, correlation, and
>    consumption.  This document provides an architectural framework for
>    network telemetry, motivated by the network operation challenges and
>    requirements.  As evidenced by some key characteristics and industry
>    practices, network telemetry covers technologies and protocols beyond
>    the conventional network Operations, Administration, and Management
>    (OAM).  It promises better flexibility, scalability, accuracy,
>    coverage, and performance and allows automated control loops to suit
>    both today's and tomorrow's network operation.  This document
>    clarifies the terminologies and classifies the modules and components
>    of a network telemetry system from several different perspectives.
>    The framework and taxonomy help to set a common ground for the
>    collection of related work and provide guidance for related technique
>    and standard developments.
>
>
> The IETF datatracker status page for this draft is:
> https://datatracker.ietf.org/doc/draft-ietf-opsawg-ntf/
>
> There are also htmlized versions available at:
> https://tools.ietf.org/html/draft-ietf-opsawg-ntf-03
> https://datatracker.ietf.org/doc/html/draft-ietf-opsawg-ntf-03
>
> A diff from the previous version is available at:
> https://www.ietf.org/rfcdiff?url2=draft-ietf-opsawg-ntf-03
>
>
> Please note that it may take a couple of minutes from the time of 
> submission until the htmlized version and diff are available at 
> tools.ietf.org.
>
> Internet-Drafts are also available by anonymous FTP at:
> ftp://ftp.ietf.org/internet-drafts/
>
>
> _______________________________________________
> OPSAWG mailing list
> [email protected]
> https://www.ietf.org/mailman/listinfo/opsawg

_______________________________________________
OPSAWG mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/opsawg