As per the subject line, the updated spec is attached for
the above fasttrack. In the interim, the project team have
worked offline with Kacheong and Darren to address their
concerns. We have also incorporated SACK, PMTU and
congestion window info into the tcpsinfo_t at Nico's
suggestion. Thanks!
Alan
Template Version: @(#)sac_nextcase 1.66 04/17/08 SMI
This information is Copyright 2010 Sun Microsystems
1. Introduction
1.1. Project/Component Working Name:
DTrace TCP/UDP Providers
1.2. Name of Document Author/Supplier:
Author: Brendan Gregg/Alan Maguire
1.3 Date of This Document:
1 May, 2010
4. Technical Description
A. INTRODUCTION
This case adds DTrace 'tcp' and 'udp' providers with probes
for send and receive events. These providers cover the TCP
and UDP protocol implementations in OpenSolaris respectively. In
addition the tcp provider contains probes for TCP state machine
transitions and events relating to TCP connection establishment.
We initially focus on connection establishment events in TCP since
there are solid use cases for those probes (connection latency,
first-byte latency, port scan detection etc). In the future, if
there is similar demand for connection termination probes we will
provide these also.
These providers are intended for use by customers for network
observability and troubleshooting, and this work represents the
second and third components of a suite of planned providers for
the network stack. The first was described in PSARC/2008/302 DTrace
IP Provider.
The provider design described here is intended to set an
architectural precedent for similar future providers (e.g. SCTP,
ICMP, RDP) in terms of probe names:
- :::send, :::receive, and :::state-change (the latter for stateful protocols)
should be used where possible for transmit, receive and state transition
events
...and probe arguments:
- For consistency, future networking providers should attempt to adhere to a
similar argument order:
arg0: pktinfo_t *
arg1: csinfo_t *
arg2: ipinfo_t *
arg3: protocol state-specific sinfo_t (e.g. tcpsinfo_t *, sctpsinfo_t *)
arg4: protocol-specific header info_t (e.g. tcpinfo_t *, sctpinfo_t *)
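As a rough illustration of what the shared argument order buys, the sketch
below sums bytes per destination for TCP and UDP using clauses that differ
only in the provider name. It relies only on the argument positions listed
above and on the ipinfo_t fields defined later in this document; the
aggregation name and the "tcp"/"udp" key labels are made up for the example.

```dtrace
#!/usr/sbin/dtrace -s

/* args[2] is the ipinfo_t * in both providers, so the clause bodies match. */
tcp:::send
{
        @bytes["tcp", args[2]->ip_daddr] = sum(args[2]->ip_plength);
}

udp:::send
{
        @bytes["udp", args[2]->ip_daddr] = sum(args[2]->ip_plength);
}
```

A future provider that follows the same convention could be added as one
more clause of the same shape.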
B. TCP PROBE DESCRIPTION
This work will introduce the following probes for the 'tcp' provider:
tcp:::send
TCP transmits a segment.
tcp:::receive
TCP receives a segment.
tcp:::state-change
TCP state machine transition. Previous state is noted in the
tcplsinfo_t * probe argument. The tcpinfo_t * and ipinfo_t *
arguments are NULL.
tcp:::connect-request
A TCP active open is initiated by sending an initial SYN segment.
The tcpinfo_t * and ipinfo_t * probe arguments represent the
TCP and IP headers associated with the initial SYN segment sent.
tcp:::connect-established
This probe fires when either of the following occurs:
- A TCP active OPEN succeeds - the initial SYN has been sent and
a valid SYN|ACK segment has been received in response. TCP enters
the ESTABLISHED state. The tcpinfo_t * and ipinfo_t * probe arguments
represent the TCP and IP headers associated with the SYN|ACK segment
received; or
- A simultaneous active OPEN succeeds and a final ACK is
received from the peer TCP. TCP has entered the ESTABLISHED state
and the tcpinfo_t * and ipinfo_t * probe arguments represent the TCP
and IP headers of the final ACK received.
The common thread in these cases is that an active-OPEN connection is
established at this point, in contrast with tcp:::accept-established
which fires on passive connection establishment. In both cases above,
the TCP segment that is presented via the tcpinfo_t * is the segment
that triggers the transition to ESTABLISHED - the received SYN|ACK
in the first case and the final ACK segment in the second.
tcp:::connect-refused
A TCP active OPEN connection attempt was refused by the peer -
a RST segment was received in acknowledgment of the initial SYN.
The tcpinfo_t * and ipinfo_t * probe arguments represent the
TCP and IP headers associated with the RST|ACK segment received.
tcp:::accept-established
A passive open has succeeded - an initial active OPEN initiation SYN has
been received, TCP responded with a SYN|ACK and a final ACK has been
received. TCP has entered the ESTABLISHED state. The tcpinfo_t * and
ipinfo_t * probe arguments represent the TCP and IP headers associated
with the final ACK segment received.
tcp:::accept-refused
An incoming SYN has arrived for a destination port with no
listening connection, so the connection initiation request is rejected
by sending a RST segment ACKing the SYN. The tcpinfo_t * and
ipinfo_t * probe arguments represent the TCP and IP headers associated
with the RST segment sent.
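The port scan detection use case could be sketched with tcp:::accept-refused
as follows. Because the header arguments describe the RST segment sent, the
destination address is taken to be the remote peer and the source port the
local port that was probed. The aggregation name is illustrative only.

```dtrace
#!/usr/sbin/dtrace -s

/* A RST was sent in reply to a SYN for a port with no listener. */
tcp:::accept-refused
{
        /* Key: remote peer address, local port that was probed. */
        @refused[args[2]->ip_daddr, args[4]->tcp_sport] = count();
}
```

A remote address that accumulates counts across many distinct local ports
is a candidate port scanner.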
This case defers connection termination probes, since use cases for them
have not yet been identified. They will be evaluated in the future if use
cases demonstrate that such probes would be valuable. The connection
establishment probes above, by contrast, already serve scenarios such as
determining connection latency, determining first-byte latency and
detecting port scans.
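The connection latency use case might look like the sketch below. It
assumes that cs_cid carries the same value at tcp:::connect-request and at
the later tcp:::connect-established or tcp:::connect-refused for a given
connection; the array and aggregation names are illustrative.

```dtrace
#!/usr/sbin/dtrace -s

/* Record when the initial SYN is sent, keyed by connection id. */
tcp:::connect-request
{
        start[args[1]->cs_cid] = timestamp;
}

/* The active open completed: record the elapsed time by remote address. */
tcp:::connect-established
/start[args[1]->cs_cid]/
{
        @latency["connect latency (ns)", args[3]->tcps_raddr] =
            quantize(timestamp - start[args[1]->cs_cid]);
        start[args[1]->cs_cid] = 0;
}

/* The peer refused the connection: discard the saved timestamp. */
tcp:::connect-refused
/start[args[1]->cs_cid]/
{
        start[args[1]->cs_cid] = 0;
}
```

Assigning zero to the array element releases its storage, so refused
connections do not leave entries behind.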
The project team explored using TCP state transitions coupled with
predicates instead of introducing the above connection-related probes.
In prototyping, however, this approach proved complex and introduced
performance problems: a wide set of events had to be probed and a predicate
evaluated on each one to select the narrower set of events of interest
(an enabled-probe effect, in other words).
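To make the trade-off concrete, the sketch below counts active opens both
ways. The names TCP_STATE_SYN_SENT and TCP_STATE_ESTABLISHED are an
assumption: this document does not define state constants, and the sketch
presumes the provider's translator file exports them under those names.

```dtrace
#!/usr/sbin/dtrace -s

/*
 * Predicate-based approach: fires on every TCP state change and then
 * filters. This form only catches the direct SYN_SENT to ESTABLISHED
 * transition; a simultaneous open passes through SYN_RCVD and would need
 * additional state tracking.
 */
tcp:::state-change
/args[5]->tcps_state == TCP_STATE_SYN_SENT &&
    args[3]->tcps_state == TCP_STATE_ESTABLISHED/
{
        @viastate[args[3]->tcps_raddr] = count();
}

/* Dedicated probe: fires only when an active open is established. */
tcp:::connect-established
{
        @viaprobe[args[3]->tcps_raddr] = count();
}
```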
C. TCP PROBE ARGUMENTS
The arguments to these probes are:
        args[0]         pktinfo_t *             packet info
        args[1]         csinfo_t *              connection state info
        args[2]         ipinfo_t *              generic IP info
        args[3]         tcpsinfo_t *            TCP state information
        args[4]         tcpinfo_t *             TCP header
        args[5]         tcplsinfo_t *           Previous TCP state
The order and content have been chosen for consistency with other
network providers.
The ipinfo_t * and tcpinfo_t * will be NULL for tcp:::state-change events,
and the tcplsinfo_t * argument is present for the tcp:::state-change probe
only.
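A minimal sketch of a state-change tracer that respects these rules is shown
below. It touches only args[3] and args[5], since the header arguments are
NULL for this probe, and it takes args[3]->tcps_state to be the state being
entered. States are printed as raw integers because no state-name mapping is
defined in this document.

```dtrace
#!/usr/sbin/dtrace -s
#pragma D option quiet

tcp:::state-change
{
        /* args[5] holds the previous state; args[3] the current one. */
        printf("%s:%d <-> %s:%d  state %d -> %d\n",
            args[3]->tcps_laddr, args[3]->tcps_lport,
            args[3]->tcps_raddr, args[3]->tcps_rport,
            args[5]->tcps_state, args[3]->tcps_state);
}
```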
The arguments contain:
/*
* pktinfo is where packet ID info can be made available for deeper
* analysis if packet IDs become supported by the kernel in the future.
* The pkt_addr member is currently always NULL.
*/
typedef struct pktinfo {
        uintptr_t pkt_addr;
} pktinfo_t;
/*
* csinfo is where connection state info is made available.
*/
typedef struct csinfo {
        uintptr_t cs_addr;
        uint64_t cs_cid;
        pid_t cs_pid;
        zoneid_t cs_zoneid;
} csinfo_t;
Note that we have filled out the csinfo_t with additional information.
In PSARC/2008/302 DTrace IP Provider, only the cs_addr was present.
Here we take advantage of the fact that the IP datapath refactoring
project modified IP such that a transmit attribute structure -
ip_xmit_attr_t * - travels with the packet on the outbound path;
and on the inbound path, the conn_t contains an ip_xmit_attr_t *,
so that when a packet is classified by IP and passed up to UDP,
SCTP or TCP, we have access to information about the target process,
zone etc. We will also add a new connection id field to the ip_xmit_attr_t.
Such an identifier gives D scripts a key for correlating events that belong
to the same connection. Future work will
ensure this extended csinfo_t is used at the ip:::send probe points also,
though significant code refactoring will be required to ensure availability
of the ip_xmit_attr_t * at all these points. On the ip:::receive side, it
is not possible to make this mapping as the IP packet has not yet been
classified (and mapped to a conn_t) when ip:::receive fires.
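As one possible use of the extended csinfo_t, the sketch below attributes
TCP traffic to zones and processes. It assumes cs_zoneid and cs_pid are
populated at both tcp:::send and tcp:::receive, as the text above suggests;
the aggregation names are illustrative.

```dtrace
#!/usr/sbin/dtrace -s

/* Outbound IP payload bytes by zone and process. */
tcp:::send
{
        @out[args[1]->cs_zoneid, args[1]->cs_pid] =
            sum(args[2]->ip_plength);
}

/* Inbound IP payload bytes by zone and process. */
tcp:::receive
{
        @in[args[1]->cs_zoneid, args[1]->cs_pid] =
            sum(args[2]->ip_plength);
}
```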
/*
* ipinfo contains common IP info for both IPv4 and IPv6.
*/
typedef struct ipinfo {
        uint8_t ip_ver;                 /* IP version (4, 6) */
        uint16_t ip_plength;            /* payload length */
        string ip_saddr;                /* source address */
        string ip_daddr;                /* destination address */
} ipinfo_t;
/*
* tcpsinfo contains stable TCP details from tcp_t.
*/
typedef struct tcpsinfo {
        uintptr_t tcps_addr;
        int tcps_local;                 /* is delivered locally, boolean */
        int tcps_active;                /* active open (from here), boolean */
        uint16_t tcps_lport;            /* local port */
        uint16_t tcps_rport;            /* remote port */
        string tcps_laddr;              /* local address, as a string */
        string tcps_raddr;              /* remote address, as a string */
        int32_t tcps_state;             /* TCP state */
        uint32_t tcps_iss;              /* initial sequence # */
        uint32_t tcps_suna;             /* sequence # sent but unacked */
        uint32_t tcps_snxt;             /* next sequence # to send */
        uint32_t tcps_rack;             /* sequence # we have acked */
        uint32_t tcps_rnxt;             /* next sequence # expected */
        uint32_t tcps_swnd;             /* send window size */
        uint32_t tcps_snd_ws;           /* send window scaling */
        uint32_t tcps_rwnd;             /* receive window size */
        uint32_t tcps_rcv_ws;           /* receive window scaling */
        uint32_t tcps_cwnd;             /* congestion window */
        uint32_t tcps_cwnd_ssthresh;    /* threshold for congestion avoidance */
        uint32_t tcps_sack_fack;        /* SACK sequence # we have acked */
        uint32_t tcps_sack_snxt;        /* next SACK seq # for retransmission */
        uint32_t tcps_rto;              /* round-trip timeout, msec */
        uint32_t tcps_mss;              /* max segment size */
        int tcps_retransmit;            /* retransmit send event, boolean */
} tcpsinfo_t;
The tcpsinfo_t has been expanded to cover SACK-, congestion window-
and PMTU-related data, as was suggested.
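The sketch below shows one way the state fields might be used to look at
retransmissions and congestion behaviour per remote host. It treats the
difference tcps_snxt - tcps_suna as the amount of data sent but not yet
acknowledged, which is an interpretation of the field comments above rather
than something this document states. The aggregation names are illustrative.

```dtrace
#!/usr/sbin/dtrace -s

/* Count segments that the stack flags as retransmissions. */
tcp:::send
/args[3]->tcps_retransmit/
{
        @retrans[args[3]->tcps_raddr, args[3]->tcps_rport] = count();
}

/* Distribution of unacknowledged data and congestion window at send time. */
tcp:::send
{
        @unacked[args[3]->tcps_raddr] =
            quantize(args[3]->tcps_snxt - args[3]->tcps_suna);
        @cwnd[args[3]->tcps_raddr] = quantize(args[3]->tcps_cwnd);
}
```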
/*
* tcpinfo is the TCP header fields.
*/
typedef struct tcpinfo {
        uint16_t tcp_sport;             /* source port */
        uint16_t tcp_dport;             /* destination port */
        uint32_t tcp_seq;               /* sequence number */
        uint32_t tcp_ack;               /* acknowledgment number */
        uint8_t tcp_offset;             /* data offset, in bytes */
        uint8_t tcp_flags;              /* flags */
        uint16_t tcp_window;            /* window size */
        uint16_t tcp_checksum;          /* checksum */
        uint16_t tcp_urgent;            /* urgent data pointer */
        tcph_t *tcp_hdr;                /* raw TCP header */
} tcpinfo_t;
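The header fields can be combined with ipinfo_t in scripts such as the
sketch below. The flag masks are the RFC 793 bit values (SYN 0x02, ACK 0x10)
declared under script-local names, since this document does not define flag
constants. The payload calculation assumes ip_plength covers exactly the TCP
header plus data, so that subtracting tcp_offset (in bytes) leaves the TCP
data length.

```dtrace
#!/usr/sbin/dtrace -s

/* Script-local names for the RFC 793 flag bits. */
inline int FLAG_SYN = 0x02;
inline int FLAG_ACK = 0x10;

/* Inbound connection attempts: SYN set, ACK clear. */
tcp:::receive
/(args[4]->tcp_flags & FLAG_SYN) && !(args[4]->tcp_flags & FLAG_ACK)/
{
        @syn[args[2]->ip_saddr, args[4]->tcp_dport] = count();
}

/* TCP data bytes sent, by destination address and port. */
tcp:::send
{
        @payload[args[2]->ip_daddr, args[4]->tcp_dport] =
            sum(args[2]->ip_plength - args[4]->tcp_offset);
}
```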
/*
* tcplsinfo provides the previous TCP state for state changes.
*/
typedef struct tcplsinfo {
        int32_t tcps_state;             /* previous TCP state */
} tcplsinfo_t;
D. UDP PROBE DESCRIPTION
This work will also introduce the following probes for the 'udp' provider:
udp:::send
UDP sends a datagram.
udp:::receive
UDP receives a datagram.
E. UDP PROBE ARGUMENTS
The arguments to these probes are:
        args[0]         pktinfo_t *             packet info
        args[1]         csinfo_t *              connection state info
        args[2]         ipinfo_t *              generic IP info
        args[3]         udpsinfo_t *            UDP state information
        args[4]         udpinfo_t *             UDP header
/*
* udpsinfo contains stable UDP details from udp_t.
*/
typedef struct udpsinfo {
        uintptr_t udps_addr;
        uint16_t udps_lport;            /* local port */
        uint16_t udps_rport;            /* remote port */
        string udps_laddr;              /* local address, as a string */
        string udps_raddr;              /* remote address, as a string */
} udpsinfo_t;
/*
* udpinfo is the UDP header fields.
*/
typedef struct udpinfo {
        uint16_t udp_sport;             /* source port */
        uint16_t udp_dport;             /* destination port */
        uint16_t udp_length;            /* total length */
        uint16_t udp_checksum;          /* headers + data checksum */
        udpha_t *udp_hdr;               /* raw UDP header */
} udpinfo_t;
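A small sketch of the udpinfo_t fields in use: the distribution of received
datagram sizes per destination port. It uses only header fields defined
above; the aggregation name is illustrative, and udp_length is taken to
include the UDP header, per the "total length" comment.

```dtrace
#!/usr/sbin/dtrace -s

/* Size distribution of received datagrams, keyed by destination port. */
udp:::receive
{
        @size[args[4]->udp_dport] = quantize(args[4]->udp_length);
}
```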
F. EXAMPLES
# Watch inbound TCP connections by remote address.
dtrace -n 'tcp:::accept-established { trace(args[2]->ip_saddr); }'
# Inbound TCP connections by destination port summary.
dtrace -n 'tcp:::accept-established { @port[args[4]->tcp_dport] = count(); }'
# Watch inbound accepted TCP connections by process summary.
dtrace -n 'tcp:::accept-established { @cpid[args[1]->cs_pid] = count(); }'
# Watch UDP total number of bytes sent/received by process.
dtrace -n 'udp:::send,udp:::receive { @bytes[args[1]->cs_pid] =
sum(args[4]->udp_length);}'
G. REFERENCES
The suite of planned providers is described on the following website,
which includes demonstrations and source from previous prototypes:
http://www.opensolaris.org/os/community/dtrace/NetworkProvider
These providers have also been discussed in the past on both
dtrace-discuss and networking-discuss:
http://www.opensolaris.org/jive/thread.jspa?messageID=57666
http://www.opensolaris.org/jive/thread.jspa?messageID=128518
RFC793
http://tools.ietf.org/html/rfc793
H. DOCUMENTATION
New chapters will be added to the current Solaris Dynamic Tracing Guide
for these proposed providers, and demo scripts will be added to
/usr/demo/dtrace.
The tcp provider is described here:
http://wikis.sun.com/display/DTrace/tcp+Provider
...and the udp provider is described here:
http://wikis.sun.com/display/DTrace/udp+Provider
I. STABILITY
The DTrace internal stability table is described below:
        Element         Name stability  Data stability  Dependency class
        Provider        Evolving        Evolving        ISA
        Module          Private         Private         Unknown
        Function        Private         Private         Unknown
        Name            Evolving        Evolving        ISA
        Arguments       Evolving        Evolving        ISA
6. Resources and Schedule
6.4. Steering Committee requested information
6.4.1. Consolidation C-team Name:
OS/Net
6.5. ARC review type: FastTrack
6.6. ARC Exposure: open