Good points. I've updated the spec. It will take a bit of time to propagate, so 
I've appended the current .md text below.

-----Original Message-----
From: Guy Harris <ghar...@sonic.net> 
Sent: Saturday, January 5, 2019 11:39 PM
To: Dave Barach (dbarach) <dbar...@cisco.com>
Cc: tcpdump-workers <tcpdump-workers@lists.tcpdump.org>
Subject: Re: [tcpdump-workers] Request for a new LINKTYPE_/DLT_ type.

On Dec 29, 2018, at 4:50 AM, Dave Barach (dbarach) <dbar...@cisco.com> wrote:

> The same packet - with [traced] metadata changes - will appear multiple times 
> as the packet traverses the vpp forwarding graph.

The description of the format should probably warn about that, because protocol 
analyzers that maintain state between packets might get confused if multiple 
instances of the same packet appear in a capture.

> Simple example: from the driver layer, an ip4 transit packet will visit 
> ethernet-input, ip4-input[-no-checksum], ip4-lookup, ip4-rewrite, 
> interface-output, and the device driver TX node. Each of those visits results 
> in a trace record. The dispatch framework traces vectors of packets, so one 
> sees N x trace records from ethernet-input, then N x trace records from 
> ip4-input, and so on. Folks typically filter by buffer-index in wireshark, to 
> see what happens to one packet in a convenient sequential view.

So an analyzer *could*, in theory, work around this by, for example, treating 
each node name(?) as a separate flow, with a copy of a packet that visited one 
node as not being related to packets that visited different nodes, so a 
dissector would treat all of the copies of the IPv4 transit packet listed above 
as separate packets rather than as, for example, retransmissions of the same 
packet, and so that a request at one layer isn't matched with all of the copies 
of a reply that show up.

>>> Limiting stateful analysis to one graph node - "ethernet-input" - ought to 
>>> "just work..." 

I suppose that you could also suppress all dissection past the IP or maybe 
transport layer, although if you see multiple instances of a TCP segment, the 
TCP dissector will interpret that as a retransmission unless it knows that 
they're just multiple appearances of the same packet.

The problem here is that a VPP trace is significantly different from a regular 
network capture, in that it seems to be mainly tracing the flow of a packet through 
the packet processing code on a single machine rather than tracing its flow on 
a network; packet analyzers are more oriented towards the latter.

You don't need to give details of *how* an analyzer should deal with this - 
different analyzers might choose to do so in different ways; just note that 
this is significantly different from the sort of network traces one might be 
used to.

------------------------

Graph Dispatcher Pcap Tracing
-----------------------------

The vpp graph dispatcher knows how to capture vectors of packets in pcap
format as they're dispatched. Each captured record has the following format:

```
    VPP graph dispatch trace record description:

        0                   1                   2                   3
        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       | Major Version | Minor Version | NStrings      | ProtoHint     |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       | Buffer index (big endian)                                     |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       | VPP graph node name ...     ...               | NULL octet    |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       | Buffer Metadata ... ...                       | NULL octet    |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       | Buffer Opaque ... ...                         | NULL octet    |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       | Buffer Opaque 2 ... ...                       | NULL octet    |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       | VPP ASCII packet trace (if NStrings > 4)      | NULL octet    |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       | Packet data (up to 16K)                                       |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
```

Graph dispatch records comprise a version stamp, an indication of how
many NULL-terminated strings will follow the record header and precede
packet data, and a protocol hint.
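
As a rough illustration, here is a minimal C sketch of how a consumer might split one record into its fixed header, its NStrings NULL-terminated strings, and the packet data. It assumes the record body is already available as a byte buffer (for example, a pcap record payload), that the four single-octet fields and the big-endian buffer index occupy the first 8 octets as drawn above, and that the strings are packed back-to-back with no padding. The names `vpp_dispatch_hdr_t` and `vpp_dispatch_parse` are hypothetical and are not part of vpp or libpcap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical parsed view of one graph dispatch record. */
typedef struct {
    uint8_t major, minor, nstrings, proto_hint;
    uint32_t buffer_index;       /* stored big endian on the wire */
    const char *strings[8];      /* first (up to) 8 NULL-terminated strings */
    const uint8_t *data;         /* packet data following the strings */
    size_t data_len;
} vpp_dispatch_hdr_t;

/* Returns 0 on success, -1 if the record is truncated or malformed. */
int vpp_dispatch_parse(const uint8_t *rec, size_t len, vpp_dispatch_hdr_t *out)
{
    assert(rec != NULL && out != NULL);
    if (len < 8)
        return -1;
    memset(out, 0, sizeof(*out));
    out->major = rec[0];
    out->minor = rec[1];
    out->nstrings = rec[2];
    out->proto_hint = rec[3];
    out->buffer_index = ((uint32_t)rec[4] << 24) | ((uint32_t)rec[5] << 16) |
                        ((uint32_t)rec[6] << 8) | (uint32_t)rec[7];

    size_t off = 8;
    for (unsigned i = 0; i < out->nstrings; i++) {
        const uint8_t *nul = memchr(rec + off, 0, len - off);
        if (nul == NULL)
            return -1;                 /* string not terminated within record */
        if (i < 8)
            out->strings[i] = (const char *)(rec + off);
        off = (size_t)(nul - rec) + 1; /* skip past the NULL octet */
    }
    out->data = rec + off;
    out->data_len = len - off;
    return 0;
}
```

A real consumer would also sanity-check the version and NStrings values before trusting the strings.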

The buffer index is an opaque 32-bit cookie which allows consumers of
these data to easily filter/track single packets as they traverse the
forwarding graph.
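
The following sketch shows the kind of per-packet tracking the buffer index enables, similar to filtering on buffer index in Wireshark: it collects, in capture order, the position of every record that carries a given buffer index. The `vpp_rec_summary_t` type and `vpp_track_buffer` function are hypothetical, and the sketch assumes the records have already been parsed into summaries.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-record summary, filled in from an already-parsed
 * record header and its graph node name string. */
typedef struct {
    uint32_t buffer_index;
    const char *node_name;
} vpp_rec_summary_t;

/* Stores the positions of all records for one buffer index in out[],
 * preserving capture order. Returns the total number of matches, which
 * may exceed max_out; only the first max_out positions are stored. */
size_t vpp_track_buffer(const vpp_rec_summary_t *recs, size_t n,
                        uint32_t buffer_index, size_t *out, size_t max_out)
{
    assert(recs != NULL || n == 0);
    size_t found = 0;
    for (size_t i = 0; i < n; i++) {
        if (recs[i].buffer_index != buffer_index)
            continue;
        if (found < max_out)
            out[found] = i;
        found++;
    }
    return found;
}
```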

Multiple records per packet are normal, and to be expected. Packets
will appear multiple times as they traverse the vpp forwarding
graph. In this way, vpp graph dispatch traces are significantly
different from regular network packet captures from an end-station.
This property complicates stateful packet analysis.

Restricting stateful analysis to records from a single vpp graph node
such as "ethernet-input" seems likely to improve the situation.
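
One way to apply that restriction is a simple predicate that admits only records captured at one chosen node, so that appearances of the same packet at other nodes never reach the stateful code. This is a minimal sketch; `vpp_stateful_candidate` and `vpp_count_stateful` are hypothetical helpers, and the chosen node name is supplied by the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical predicate: should this record be fed to a stateful
 * dissector? Only records captured at the chosen graph node qualify. */
static bool vpp_stateful_candidate(const char *node_name, const char *chosen)
{
    assert(chosen != NULL);
    return node_name != NULL && strcmp(node_name, chosen) == 0;
}

/* Counts how many of the given node names would reach stateful analysis. */
size_t vpp_count_stateful(const char *const *node_names, size_t n,
                          const char *chosen)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if (vpp_stateful_candidate(node_names[i], chosen))
            count++;
    return count;
}
```

Records from other nodes can still be displayed; they are only kept away from code that tracks state across packets.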

As of this writing: major version = 1, minor version = 0. NStrings
SHOULD be 4 or 5. Consumers SHOULD be wary of values less than 4 or
greater than 5. They MAY attempt to display the claimed number of
strings, or they MAY treat the condition as an error.
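
A minimal sketch of that header check, returning a status so the caller can choose either permitted behavior. It additionally assumes, beyond what the text above states, that an unknown major version should be rejected and that minor-version changes are compatible; both are hypothetical policies, as are the names `vpp_hdr_check_t` and `vpp_check_header`.

```c
#include <stdint.h>

typedef enum {
    VPP_HDR_OK,            /* major version 1, NStrings 4 or 5 */
    VPP_HDR_ODD_NSTRINGS,  /* unexpected NStrings: caller may display or reject */
    VPP_HDR_BAD_VERSION    /* unknown major version (assumed fatal here) */
} vpp_hdr_check_t;

vpp_hdr_check_t vpp_check_header(uint8_t major, uint8_t nstrings)
{
    if (major != 1)
        return VPP_HDR_BAD_VERSION;
    if (nstrings < 4 || nstrings > 5)
        return VPP_HDR_ODD_NSTRINGS;
    return VPP_HDR_OK;  /* minor version deliberately ignored */
}
```

A consumer that attempts display would keep going on `VPP_HDR_ODD_NSTRINGS` and walk the claimed number of strings; one that treats the condition as an error would handle it like `VPP_HDR_BAD_VERSION`.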

Here is the current set of protocol hints:

```c
    typedef enum
      {
        VLIB_NODE_PROTO_HINT_NONE = 0,
        VLIB_NODE_PROTO_HINT_ETHERNET,
        VLIB_NODE_PROTO_HINT_IP4,
        VLIB_NODE_PROTO_HINT_IP6,
        VLIB_NODE_PROTO_HINT_TCP,
        VLIB_NODE_PROTO_HINT_UDP,
        VLIB_NODE_N_PROTO_HINTS,
      } vlib_node_proto_hint_t;
```

Example: VLIB_NODE_PROTO_HINT_IP6 means that the high-order nibble of
the first octet of packet data SHOULD be 6 (that is, the octet is in the
range 0x60 through 0x6F), and that the data should begin with an IPv6
packet header.

Downstream consumers of these data SHOULD pay attention to the
protocol hint. They MUST tolerate inaccurate hints, which MAY occur
from time to time.
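
One way to both honor the hint and tolerate inaccurate ones is to sanity-check the claimed hint against the first octets of packet data and fall back to "no hint" on a mismatch. This is a minimal sketch: it repeats the enum shown above so that it is self-contained, the function name `vpp_effective_hint` and the fallback policy are hypothetical, and the checks are deliberately shallow (IP version nibble and minimum header length only).

```c
#include <stddef.h>
#include <stdint.h>

typedef enum
  {
    VLIB_NODE_PROTO_HINT_NONE = 0,
    VLIB_NODE_PROTO_HINT_ETHERNET,
    VLIB_NODE_PROTO_HINT_IP4,
    VLIB_NODE_PROTO_HINT_IP6,
    VLIB_NODE_PROTO_HINT_TCP,
    VLIB_NODE_PROTO_HINT_UDP,
    VLIB_NODE_N_PROTO_HINTS,
  } vlib_node_proto_hint_t;

/* Returns the hint to act on: the claimed hint if the packet data look
 * consistent with it, otherwise VLIB_NODE_PROTO_HINT_NONE, in which case
 * the caller might show the data undissected or try heuristics. */
vlib_node_proto_hint_t
vpp_effective_hint(uint8_t claimed, const uint8_t *data, size_t len)
{
    if (claimed >= VLIB_NODE_N_PROTO_HINTS)
        return VLIB_NODE_PROTO_HINT_NONE;       /* unknown hint value */
    switch (claimed) {
    case VLIB_NODE_PROTO_HINT_IP4:
        /* IPv4: version nibble 4, IHL of at least 5 32-bit words. */
        if (len >= 20 && (data[0] >> 4) == 4 && (data[0] & 0x0f) >= 5)
            return VLIB_NODE_PROTO_HINT_IP4;
        return VLIB_NODE_PROTO_HINT_NONE;
    case VLIB_NODE_PROTO_HINT_IP6:
        /* IPv6: version nibble 6, fixed 40-octet header. */
        if (len >= 40 && (data[0] >> 4) == 6)
            return VLIB_NODE_PROTO_HINT_IP6;
        return VLIB_NODE_PROTO_HINT_NONE;
    case VLIB_NODE_PROTO_HINT_ETHERNET:
        /* Length check only: 14-octet Ethernet II header. */
        return len >= 14 ? VLIB_NODE_PROTO_HINT_ETHERNET
                         : VLIB_NODE_PROTO_HINT_NONE;
    default:
        /* NONE, TCP and UDP are passed through unchecked in this sketch. */
        return (vlib_node_proto_hint_t)claimed;
    }
}
```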
_______________________________________________
tcpdump-workers mailing list
tcpdump-workers@lists.tcpdump.org
https://lists.sandelman.ca/mailman/listinfo/tcpdump-workers
