As per the subject line, the updated spec is attached for
the above fasttrack. In the interim, the project team have
worked offline with Kacheong and Darren to address their
concerns. We have also incorporated SACK, PMTU and
congestion window info into the tcpsinfo_t at Nico's
suggestion. Thanks!

Alan
Template Version: @(#)sac_nextcase 1.66 04/17/08 SMI
This information is Copyright 2010 Sun Microsystems
1. Introduction
    1.1. Project/Component Working Name:
         DTrace TCP/UDP Providers
    1.2. Name of Document Author/Supplier:
         Author:  Brendan Gregg/Alan Maguire
    1.3. Date of This Document:
         1 May, 2010
4. Technical Description

A. INTRODUCTION

This case adds DTrace 'tcp' and 'udp' providers with probes
for send and receive events.  These providers cover the TCP
and UDP protocol implementations in OpenSolaris, respectively.  In
addition, the tcp provider contains probes for TCP state machine
transitions and events relating to TCP connection establishment.
We initially focus on connection establishment events in TCP since
there are solid use cases for those probes (connection latency,
first-byte latency, port scan detection, etc.).  In the future, if
there is similar demand for connection termination probes, we will
provide these also.

These providers are intended for use by customers for network 
observability and troubleshooting, and this work represents the 
second and third components of a suite of planned providers for 
the network stack.  The first was described in PSARC/2008/302 DTrace 
IP Provider.

The provider design described here is intended to set an
architectural precedent for similar future providers (e.g. SCTP,
ICMP, RDP) in terms of probe names:

- :::send, :::receive, and :::state-change (the latter for stateful protocols)
  should be used where possible for transmit, receive and state transition
  events

...and probe arguments:

- For consistency, future networking providers should attempt to adhere to a
  similar argument order:

  arg0: pktinfo_t *
  arg1: csinfo_t *
  arg2: ipinfo_t *
  arg3: protocol state-specific sinfo_t (e.g. tcpsinfo_t *, sctpsinfo_t *)
  arg4: protocol-specific header info_t (e.g. tcpinfo_t *, sctpinfo_t *)

B. TCP PROBE DESCRIPTION

This work will introduce the following probes for the 'tcp' provider:

tcp:::send

        TCP transmits a segment.

tcp:::receive

        TCP receives a segment.

tcp:::state-change

        TCP state machine transition.  Previous state is noted in the
        tcplsinfo_t * probe argument.  The tcpinfo_t * and ipinfo_t *
        arguments are NULL.

tcp:::connect-request

        A TCP active open is initiated by sending an initial SYN segment.
        The tcpinfo_t * and ipinfo_t * probe arguments represent the 
        TCP and IP headers associated with the initial SYN segment sent.

tcp:::connect-established

        This probe fires when either of the following occurs:

        - A TCP active OPEN succeeds - the initial SYN has been sent and
          a valid SYN|ACK segment has been received in response. TCP enters
          the ESTABLISHED state. The tcpinfo_t * and ipinfo_t * probe arguments 
          represent the TCP and IP headers associated with the SYN|ACK segment
          received; or

        - A simultaneous active OPEN succeeds and a final ACK is
          received from the peer TCP. TCP has entered the ESTABLISHED state
          and the tcpinfo_t * and ipinfo_t * probe arguments represent the TCP
          and IP headers of the final ACK received.

        The common thread in these cases is that an active-OPEN connection is
        established at this point, in contrast with tcp:::accept-established
        which fires on passive connection establishment. In both cases above, 
        the TCP segment that is presented via the tcpinfo_t * is the segment
        that triggers the transition to ESTABLISHED - the received SYN|ACK 
        in the first case and the final ACK segment in the second.
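The distinction between the active and passive cases can be modeled in a
few lines of C.  The sketch below is illustrative only: the enum names and
establish_probe() are hypothetical, it uses a simplified subset of the
RFC 793 states, and it assumes the choice between the two probes follows
the tcps_active flag of tcpsinfo_t.  It does not describe how the provider
is actually implemented.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified subset of RFC 793 states (hypothetical names for this sketch). */
enum tcp_state { ST_LISTEN, ST_SYN_SENT, ST_SYN_RCVD, ST_ESTABLISHED };

/* Which establishment probe, if any, a state transition corresponds to. */
enum est_probe {
	PROBE_NONE,
	PROBE_CONNECT_ESTABLISHED,	/* active open completed */
	PROBE_ACCEPT_ESTABLISHED	/* passive open completed */
};

/*
 * prev/next: states before and after processing a received segment.
 * active:    true if this endpoint sent the initial SYN, mirroring the
 *            tcps_active field of tcpsinfo_t (assumed to drive the choice).
 */
enum est_probe
establish_probe(enum tcp_state prev, enum tcp_state next, bool active)
{
	/* SYN_SENT is reachable only by an active open. */
	assert(prev != ST_SYN_SENT || active);

	if (next != ST_ESTABLISHED || prev == ST_ESTABLISHED)
		return (PROBE_NONE);

	/*
	 * SYN_SENT -> ESTABLISHED: our SYN was answered by a SYN|ACK.
	 * SYN_RCVD -> ESTABLISHED: the final ACK arrived; this endpoint is
	 * an active opener only in the simultaneous-open case.
	 */
	return (active ? PROBE_CONNECT_ESTABLISHED : PROBE_ACCEPT_ESTABLISHED);
}
```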

tcp:::connect-refused

        A TCP active OPEN connection attempt was refused by the peer -
        a RST segment was received in acknowledgment of the initial SYN.
        The tcpinfo_t * and ipinfo_t * probe arguments represent the 
        TCP and IP headers associated with the RST|ACK segment received.

tcp:::accept-established

        A passive open has succeeded - an initial SYN has been received,
        TCP responded with a SYN|ACK, and a final ACK has been received.
        TCP has entered the ESTABLISHED state.  The tcpinfo_t * and
        ipinfo_t * probe arguments represent the TCP and IP headers associated
        with the final ACK segment received.

tcp:::accept-refused

        An incoming SYN has arrived for a destination port with no
        listening connection, so the connection initiation request is rejected
        by sending a RST segment ACKing the SYN.  The tcpinfo_t * and 
        ipinfo_t * probe arguments represent the TCP and IP headers associated
        with the RST segment sent.
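For quick reference, the segment described by the tcpinfo_t * for each
connection probe can be summarized as a small C table.  This restates the
probe descriptions above and is not provider code; the flag macros use the
standard TCP header bit values, while the table and lookup_probe_segment()
are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Standard TCP header flag bits; macro names are local to this sketch. */
#define F_SYN	0x02
#define F_RST	0x04
#define F_ACK	0x10

struct probe_segment {
	const char	*probe;	/* tcp provider probe name */
	bool		sent;	/* true: segment sent; false: segment received */
	uint8_t		flags;	/* flags expected in tcpinfo_t's tcp_flags */
};

static const struct probe_segment probe_segments[] = {
	{ "connect-request",	 true,	F_SYN },
	/* Final ACK instead of SYN|ACK in the simultaneous-open case. */
	{ "connect-established", false,	F_SYN | F_ACK },
	{ "connect-refused",	 false,	F_RST | F_ACK },
	{ "accept-established",	 false,	F_ACK },
	{ "accept-refused",	 true,	F_RST | F_ACK },
};

/* Return the entry for a probe name, or NULL if it is not listed. */
const struct probe_segment *
lookup_probe_segment(const char *name)
{
	assert(name != NULL);
	for (size_t i = 0;
	    i < sizeof (probe_segments) / sizeof (probe_segments[0]); i++) {
		if (strcmp(probe_segments[i].probe, name) == 0)
			return (&probe_segments[i]);
	}
	return (NULL);
}
```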

This case defers connection termination probes, since use cases for such
probes have not yet been identified.  Their addition will be evaluated in
the future if use cases demonstrate that they would be valuable.  (The
connection establishment probes above, by contrast, serve scenarios such as
measuring connection latency and first-byte latency, and detecting port
scans.)

The project team explored using TCP state transitions coupled with
predicates instead of introducing the above connection-related probes.
However, prototyping showed this approach to be complex, and it introduced
performance problems: a wide set of events had to be probed and a predicate
evaluated on each one to select the narrower set of events of interest (in
other words, an enabled-probe effect).

C. TCP PROBE ARGUMENTS

The arguments to these probes are:

        args[0]         pktinfo_t *     packet info
        args[1]         csinfo_t *      connection state info
        args[2]         ipinfo_t *      generic IP info
        args[3]         tcpsinfo_t *    TCP state information
        args[4]         tcpinfo_t *     TCP header
        args[5]         tcplsinfo_t *   Previous TCP state

The order and content have been chosen for consistency with other
network providers.

The ipinfo_t * and tcpinfo_t * will be NULL for tcp:::state-change events,
and the tcplsinfo_t * argument is present for the tcp:::state-change probe
only.
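Because the ipinfo_t * and tcpinfo_t * are NULL on tcp:::state-change, a
consumer that reads addresses from these probes should allow for that.  The
C sketch below models such a check with trimmed-down stand-ins for the
argument types (const char * in place of D's string).  remote_addr() is a
hypothetical helper, and it assumes, as the text above implies, that the
tcpsinfo_t * is always non-NULL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Trimmed-down stand-ins for the probe argument types (sketch only). */
typedef struct ipinfo {
	uint8_t		ip_ver;
	uint16_t	ip_plength;
	const char	*ip_saddr;
	const char	*ip_daddr;
} ipinfo_t;

typedef struct tcpsinfo {
	const char	*tcps_laddr;
	const char	*tcps_raddr;
} tcpsinfo_t;

/*
 * Return the remote address for a tcp probe.  For received segments the
 * remote end is the IP source; for sent segments, the IP destination.
 * On tcp:::state-change ip is NULL, so fall back to the tcpsinfo_t.
 */
const char *
remote_addr(const ipinfo_t *ip, const tcpsinfo_t *ts, bool outbound)
{
	assert(ts != NULL);
	if (ip == NULL)
		return (ts->tcps_raddr);
	return (outbound ? ip->ip_daddr : ip->ip_saddr);
}
```

In D, the equivalent on tcp:::state-change is to read args[3]->tcps_raddr
rather than dereferencing args[2].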

The arguments contain:

/*
 * pktinfo is where packet ID info can be made available for deeper
 * analysis if packet IDs become supported by the kernel in the future.
 * The pkt_addr member is currently always NULL.
 */
typedef struct pktinfo {
        uintptr_t pkt_addr;
} pktinfo_t;

/*
 * csinfo is where connection state info is made available.
 */
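typedef struct csinfo {
        uintptr_t cs_addr;              /* connection state address */
        uint64_t cs_cid;                /* connection ID */
        pid_t cs_pid;                   /* associated process ID */
        zoneid_t cs_zoneid;             /* associated zone ID */
} csinfo_t;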

Note that we have extended the csinfo_t with additional information.
In PSARC/2008/302 DTrace IP Provider, only cs_addr was present.
Here we take advantage of the fact that the IP datapath refactoring
project modified IP so that a transmit attribute structure
(ip_xmit_attr_t *) travels with the packet on the outbound path, and
on the inbound path the conn_t contains an ip_xmit_attr_t *.  As a
result, once a packet has been classified by IP and passed up to UDP,
SCTP or TCP, we have access to information about the target process,
zone, etc.  We will also add a new connection ID field to the
ip_xmit_attr_t; such an identifier is very useful in DTrace, since it
allows events belonging to the same connection to be correlated.
Future work will ensure this extended csinfo_t is used at the ip:::send
probe points also, though significant code refactoring will be required
to make the ip_xmit_attr_t * available at all of these points.  On the
ip:::receive side, this mapping is not possible because the IP packet
has not yet been classified (and mapped to a conn_t) when ip:::receive
fires.
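As an illustration of why a connection ID helps, the C sketch below
measures connection latency (one of the use cases mentioned in the
introduction) by pairing tcp:::connect-request with tcp:::connect-established
or tcp:::connect-refused on the same cs_cid.  A D script would normally use
an associative array keyed by args[1]->cs_cid together with the timestamp
variable; here a small fixed-size table stands in for it, and the function
names and table size are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PENDING 64	/* hypothetical capacity of the sketch's table */

/* One outstanding active open, keyed by the connection ID (cs_cid). */
struct pending {
	uint64_t	cid;
	uint64_t	start_ns;
	int		used;
};

static struct pending pending_tbl[MAX_PENDING];

/* tcp:::connect-request: remember when the SYN was sent; -1 if full. */
int
on_connect_request(uint64_t cid, uint64_t now_ns)
{
	for (size_t i = 0; i < MAX_PENDING; i++) {
		if (!pending_tbl[i].used) {
			pending_tbl[i] = (struct pending){ cid, now_ns, 1 };
			return (0);
		}
	}
	return (-1);
}

/*
 * tcp:::connect-established or tcp:::connect-refused: return the elapsed
 * time in ns and free the slot, or -1 if no request was recorded.
 */
int64_t
on_connect_done(uint64_t cid, uint64_t now_ns)
{
	for (size_t i = 0; i < MAX_PENDING; i++) {
		if (pending_tbl[i].used && pending_tbl[i].cid == cid) {
			assert(now_ns >= pending_tbl[i].start_ns);
			pending_tbl[i].used = 0;
			return ((int64_t)(now_ns - pending_tbl[i].start_ns));
		}
	}
	return (-1);
}
```

Clearing the entry on connect-refused as well as connect-established keeps
refused attempts from lingering in the table.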

/*
 * ipinfo contains common IP info for both IPv4 and IPv6.
 */
typedef struct ipinfo {
        uint8_t ip_ver;                 /* IP version (4, 6) */
        uint16_t ip_plength;            /* payload length */
        string ip_saddr;                /* source address */
        string ip_daddr;                /* destination address */
} ipinfo_t;


/*
 * tcpsinfo contains stable TCP details from tcp_t.
 */
typedef struct tcpsinfo {
        uintptr_t tcps_addr;
        int tcps_local;                 /* is delivered locally, boolean */
        int tcps_active;                /* active open (from here), boolean */
        uint16_t tcps_lport;            /* local port */
        uint16_t tcps_rport;            /* remote port */
        string tcps_laddr;              /* local address, as a string */
        string tcps_raddr;              /* remote address, as a string */
        int32_t tcps_state;             /* TCP state */
        uint32_t tcps_iss;              /* initial sequence # */
        uint32_t tcps_suna;             /* sequence # sent but unacked */
        uint32_t tcps_snxt;             /* next sequence # to send */
        uint32_t tcps_rack;             /* sequence # we have acked */
        uint32_t tcps_rnxt;             /* next sequence # expected */
        uint32_t tcps_swnd;             /* send window size */
        uint32_t tcps_snd_ws;           /* send window scaling */
        uint32_t tcps_rwnd;             /* receive window size */
        uint32_t tcps_rcv_ws;           /* receive window scaling */
        uint32_t tcps_cwnd;             /* congestion window */
        uint32_t tcps_cwnd_ssthresh;    /* threshold for congestion avoidance */
        uint32_t tcps_sack_fack;        /* SACK sequence # we have acked */
        uint32_t tcps_sack_snxt;        /* next SACK seq # for retransmission */
        uint32_t tcps_rto;              /* retransmission timeout, msec */
        uint32_t tcps_mss;              /* max segment size */
        int tcps_retransmit;            /* retransmit send event, boolean */
} tcpsinfo_t;

The tcpsinfo_t has been expanded to cover SACK-, congestion window-
and PMTU-related data, as was suggested.
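The sequence-number and window fields lend themselves to simple derived
metrics.  The C sketch below shows one way to compute them; the helper
names are hypothetical, and send_room() additionally assumes that
tcps_swnd is the peer's window with tcps_snd_ws scaling already applied,
which this document does not state.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Bytes sent but not yet acknowledged by the peer (in flight).  The result
 * is reduced modulo 2^32, so sequence-number wraparound is handled.
 */
uint32_t
bytes_in_flight(uint32_t tcps_snxt, uint32_t tcps_suna)
{
	return (tcps_snxt - tcps_suna);
}

/* Bytes received in sequence but not yet acknowledged by this host. */
uint32_t
bytes_unacked_rcv(uint32_t tcps_rnxt, uint32_t tcps_rack)
{
	return (tcps_rnxt - tcps_rack);
}

/*
 * How much more could be sent now: min(cwnd, peer window) minus bytes in
 * flight, clamped at zero.  ASSUMES tcps_swnd is already scaled.
 */
uint32_t
send_room(uint32_t tcps_cwnd, uint32_t tcps_swnd, uint32_t in_flight)
{
	uint32_t limit = tcps_cwnd < tcps_swnd ? tcps_cwnd : tcps_swnd;

	return (in_flight >= limit ? 0 : limit - in_flight);
}
```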

/*
 * tcpinfo is the TCP header fields.
 */
typedef struct tcpinfo {
        uint16_t tcp_sport;             /* source port */
        uint16_t tcp_dport;             /* destination port */
        uint32_t tcp_seq;               /* sequence number */
        uint32_t tcp_ack;               /* acknowledgment number */
        uint8_t tcp_offset;             /* data offset, in bytes */
        uint8_t tcp_flags;              /* flags */
        uint16_t tcp_window;            /* window size */
        uint16_t tcp_checksum;          /* checksum */
        uint16_t tcp_urgent;            /* urgent data pointer */
        tcph_t *tcp_hdr;                /* raw TCP header */
} tcpinfo_t;
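Two common uses of tcpinfo_t are decoding tcp_flags and computing the TCP
payload size.  The C sketch below uses the standard TCP header flag bit
values (RFC 793, plus the ECE and CWR bits from RFC 3168); the macro and
function names are hypothetical and not part of the provider.  Since
tcp_offset is already in bytes, the payload length is ipinfo_t's
ip_plength minus tcp_offset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* TCP header flag bits (RFC 793; ECE/CWR from RFC 3168). Local names. */
#define FL_FIN	0x01
#define FL_SYN	0x02
#define FL_RST	0x04
#define FL_PSH	0x08
#define FL_ACK	0x10
#define FL_URG	0x20
#define FL_ECE	0x40
#define FL_CWR	0x80

/* Render tcp_flags as e.g. "SYN|ACK" into buf, or "none" if no bits set. */
void
tcp_flags_str(uint8_t flags, char *buf, size_t len)
{
	static const struct {
		uint8_t		bit;
		const char	*name;
	} names[] = {
		{ FL_FIN, "FIN" }, { FL_SYN, "SYN" }, { FL_RST, "RST" },
		{ FL_PSH, "PSH" }, { FL_ACK, "ACK" }, { FL_URG, "URG" },
		{ FL_ECE, "ECE" }, { FL_CWR, "CWR" }
	};

	assert(buf != NULL && len > 0);
	buf[0] = '\0';
	for (size_t i = 0; i < sizeof (names) / sizeof (names[0]); i++) {
		if ((flags & names[i].bit) == 0)
			continue;
		if (buf[0] != '\0')
			strncat(buf, "|", len - strlen(buf) - 1);
		strncat(buf, names[i].name, len - strlen(buf) - 1);
	}
	if (buf[0] == '\0')
		(void) snprintf(buf, len, "none");
}

/* TCP payload bytes: IP payload length minus TCP header length (bytes). */
uint16_t
tcp_payload_len(uint16_t ip_plength, uint8_t tcp_offset)
{
	return (ip_plength > tcp_offset ?
	    (uint16_t)(ip_plength - tcp_offset) : 0);
}
```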

/*
 * tcplsinfo provides the previous TCP state for state changes.
 */
typedef struct tcplsinfo {
        int32_t tcps_state;             /* previous TCP state */
} tcplsinfo_t;
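On tcp:::state-change, the new state is available from the tcpsinfo_t
(args[3]->tcps_state) and the previous state from the tcplsinfo_t
(args[5]->tcps_state).  The C sketch below formats such a transition.  Its
state numbering is hypothetical (the RFC 793 states in order); the values
OpenSolaris uses for tcps_state may differ, so a real consumer should not
rely on them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical numbering: RFC 793 states in order (not OpenSolaris values). */
enum {
	S_CLOSED, S_LISTEN, S_SYN_SENT, S_SYN_RECEIVED, S_ESTABLISHED,
	S_FIN_WAIT_1, S_FIN_WAIT_2, S_CLOSE_WAIT, S_CLOSING, S_LAST_ACK,
	S_TIME_WAIT, S_NSTATES
};

static const char *const state_names[S_NSTATES] = {
	"CLOSED", "LISTEN", "SYN_SENT", "SYN_RECEIVED", "ESTABLISHED",
	"FIN_WAIT_1", "FIN_WAIT_2", "CLOSE_WAIT", "CLOSING", "LAST_ACK",
	"TIME_WAIT"
};

/* Trimmed-down stand-ins: args[3] (new state) and args[5] (previous). */
typedef struct tcpsinfo { int32_t tcps_state; } tcpsinfo_t;
typedef struct tcplsinfo { int32_t tcps_state; } tcplsinfo_t;

const char *
state_name(int32_t s)
{
	return ((s >= 0 && s < S_NSTATES) ? state_names[s] : "UNKNOWN");
}

/* Reverse lookup, e.g. for building a predicate; -1 if unknown. */
int32_t
state_from_name(const char *name)
{
	for (int32_t s = 0; s < S_NSTATES; s++) {
		if (strcmp(state_names[s], name) == 0)
			return (s);
	}
	return (-1);
}

/* Write "PREV -> NEW" into buf; returns snprintf's result. */
int
format_transition(const tcpsinfo_t *now, const tcplsinfo_t *prev,
    char *buf, size_t len)
{
	return (snprintf(buf, len, "%s -> %s",
	    state_name(prev->tcps_state), state_name(now->tcps_state)));
}
```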

D. UDP PROBE DESCRIPTION

This work will also introduce the following probes for the 'udp' provider:

udp:::send

        UDP sends a datagram.

udp:::receive

        UDP receives a datagram.

E. UDP PROBE ARGUMENTS

The arguments to these probes are:

        args[0]         pktinfo_t *             packet info
        args[1]         csinfo_t *              connection state info
        args[2]         ipinfo_t *              generic IP info
        args[3]         udpsinfo_t *            UDP state information
        args[4]         udpinfo_t *             UDP header

/*
 * udpsinfo contains stable UDP details from udp_t.
 */
typedef struct udpsinfo {
        uintptr_t       udps_addr;
        uint16_t        udps_lport;     /* local port */
        uint16_t        udps_rport;     /* remote port */
        string          udps_laddr;     /* local address, as a string */
        string          udps_raddr;     /* remote address, as a string */
} udpsinfo_t;

/*
 * udpinfo is the UDP header fields.
 */
typedef struct udpinfo {
        uint16_t udp_sport;             /* source port */
        uint16_t udp_dport;             /* destination port */
        uint16_t udp_length;            /* total length */
        uint16_t udp_checksum;          /* headers + data checksum */
        udpha_t *udp_hdr;               /* raw UDP header */
} udpinfo_t;
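udp_length is the UDP length field, which covers the 8-byte UDP header as
well as the data (RFC 768).  The C sketch below derives the payload size
from it; the helper name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define UDP_HDR_LEN	8	/* fixed UDP header size in bytes (RFC 768) */

/*
 * udp_length counts the UDP header plus data, so the payload is 8 bytes
 * less.  Returns -1 for a length too short to be valid.
 */
int32_t
udp_payload_len(uint16_t udp_length)
{
	if (udp_length < UDP_HDR_LEN)
		return (-1);
	return ((int32_t)udp_length - UDP_HDR_LEN);
}
```

The UDP byte-count example in section F sums udp_length and therefore
includes header bytes; to count payload only, a D script could instead
aggregate sum(args[4]->udp_length - 8).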

F. EXAMPLES

# Watch inbound TCP connections by remote address.
dtrace -n 'tcp:::accept-established { trace(args[2]->ip_saddr); }'

# Inbound TCP connections by destination port summary.
dtrace -n 'tcp:::accept-established { @port[args[4]->tcp_dport] = count(); }'

# Watch inbound accepted TCP connections by process summary.
dtrace -n 'tcp:::accept-established { @cpid[args[1]->cs_pid] = count(); }'

# Watch UDP total number of bytes sent/received by process.
dtrace -n 'udp:::send,udp:::receive { @bytes[args[1]->cs_pid] = sum(args[4]->udp_length); }'

G. REFERENCES

The suite of planned providers is described on the following website,
which includes demonstrations and source from previous prototypes:

http://www.opensolaris.org/os/community/dtrace/NetworkProvider

These providers have also been discussed in the past on both
dtrace-discuss and networking-discuss:

http://www.opensolaris.org/jive/thread.jspa?messageID=57666&#57666
http://www.opensolaris.org/jive/thread.jspa?messageID=128518&#128518

RFC 793: Transmission Control Protocol

http://tools.ietf.org/html/rfc793

H. DOCUMENTATION

New chapters will be added to the current Solaris Dynamic Tracing Guide
for these proposed providers, and demo scripts will be added to 
/usr/demo/dtrace.

The tcp provider is described here:

http://wikis.sun.com/display/DTrace/tcp+Provider

...and the udp provider is described here:

http://wikis.sun.com/display/DTrace/udp+Provider

I. STABILITY

The DTrace internal stability table is described below:

Element         Name stability  Data stability  Dependency class
Provider        Evolving        Evolving        ISA
Module          Private         Private         Unknown
Function        Private         Private         Unknown
Name            Evolving        Evolving        ISA
Arguments       Evolving        Evolving        ISA

6. Resources and Schedule
    6.4. Steering Committee requested information
        6.4.1. Consolidation C-team Name:
                OS/Net
    6.5. ARC review type: FastTrack
    6.6. ARC Exposure: open

_______________________________________________
opensolaris-arc mailing list