I'm filing this case for myself. The timer expires Wednesday, April 23rd.
Patch binding is requested. Consolidation Private is requested for all
proposed interfaces except for the new pfiles output, which is unstable
(as per pfiles(1M)).
Overview
========
This case proposes a new STREAMS ioctl mechanism called _I_CMD which
allows proc tools to reliably retrieve information from modules and
drivers on a stream associated with a stopped process. This case also
proposes an enhancement to pfiles(1M), which will make use of _I_CMD to
report TLI endpoint information, addressing an observability limitation
for customers using zones.
Motivation
==========
For historical reasons, a number of critical Solaris daemons still use
TLI instead of sockets (e.g., 8 of the 55 processes running on my Nevada
desktop after boot have TLI endpoints). While pfiles provides local and
remote address information for socket endpoints, an architectural issue
described below prevents this information from being provided for TLI
endpoints.
Customers have traditionally worked around this limitation by using the
freely available `lsof' tool. However, since lsof uses /dev/kmem (among
other unsavory interfaces) to retrieve this and other information, it is
unsuitable for use in a zone. As such, TLI endpoint information is
currently unavailable inside a zone, which has proven to be a gating
issue for some zones deployments.
Problem
=======
The architectural issue stems from a conflict between the design of
STREAMS ioctls and the way the proc tools (using the proc filesystem)
examine a process. Specifically, a proc tool works by asking the kernel
to stop the process it wishes to examine (the controlled process) and
then creating an "agent LWP" (see PCAGENT in proc(4)), which performs
operations on the tool's behalf in the context of the controlled
process. Once the information has been retrieved, the process is
resumed, unaware of the examination.
At a high level, providing the information seems straightforward: since
TLI endpoints are STREAMS devices, and since there are already ioctls
available (TI_GETMYNAME and TI_GETPEERNAME) to retrieve TLI endpoint
information, one might think the proc tools could issue the appropriate
ioctl()s on the TLI streams inside the controlled process to get the
information. However, there are several problematic cases to consider:
  * If an ioctl request is already outstanding on the stream,
    attempting to issue another ioctl() will block indefinitely.

  * If the stream is flow controlled, the M_IOCTL message (which is
    not a high priority message) will end up enqueued indefinitely.

  * Most STREAMS ioctls (including TI_GET{MY,PEER}NAME) are
    "transparent" ioctls, which function by having a given STREAMS
    module/driver iteratively request that the stream head copy data
    in/out of the userland process on its behalf. The copy in/out
    sequences are ioctl-specific (and not known to the STREAMS
    framework itself). In contrast, the current agent LWP ioctl
    design allows a proc tool to issue an ioctl() in the context of
    a controlled process by copying in a single buffer at ioctl
    entry and copying out the same buffer at ioctl() exit. These
    approaches are incompatible.
(Note: sockets don't have these problems because socket endpoint
information is stored in the "sonode" tied to each socket's vnode, and
that information can be accessed without issuing STREAMS ioctls. That
approach isn't feasible for TLI since a TLI device is just like any
other STREAMS device and thus has only a specfs vnode.)
Solution
========
To address the above problems, we propose a new STREAMS ioctl mechanism
called "_I_CMD". The _I_CMD mechanism is similar in spirit to the
existing I_STR (non-transparent) STREAMS ioctl mechanism: the caller
fills in a `struct strcmd' (as they would a `struct strioctl') with an
ioctl command number, timeout, data buffer, and data buffer length:
    typedef struct strcmd {
            int     sc_cmd;                 /* ioctl command */
            int     sc_timeout;             /* timeout value (in secs) */
            int     sc_len;                 /* data length */
            int     sc_pad;
            char    sc_buf[STRCMDBUFSIZE];  /* data buffer */
    } strcmd_t;
However, unlike `struct strioctl', the data buffer is directly embedded
into the `struct strcmd' to eliminate the need for additional copyin and
copyout operations by the stream head. Thus, this design is compatible
with the existing agent LWP ioctl approach.
When an _I_CMD is issued on a stream, the stream head will sanity-check
the request/payload, and allocate/initialize a new STREAMS message type
called M_CMD. An M_CMD message is similar to an M_IOCTL message, but is
high-priority (thus addressing the flow control issue above) and handled
independently from ioctl messages (thus addressing the "outstanding
ioctl request" problem above). The M_CMD message will be associated
with a `struct cmdblk', which resembles the existing `struct iocblk'
used with M_IOCTL messages, but is tailored for M_CMD:
    typedef struct cmdblk {
            int     cb_cmd;         /* ioctl command type */
            cred_t  *cb_cr;         /* full credentials */
            uint_t  cb_len;         /* payload size */
            int     cb_error;       /* error code */
    } cmdblk_t;
As with an I_STR M_IOCTL message, the stream head will also allocate an
M_DATA message (of sc_len bytes) and place the contents of sc_buf into
the message. As with M_IOCTL, the M_DATA will be chained onto the M_CMD
message and sent downstream. The stream head will then wait
interruptibly for sc_timeout seconds (or forever if sc_timeout is -1) to
receive an M_CMD response. Upon receiving a response, the data will be
copied back out to userland and the _I_CMD will complete.
Since this facility is intended for use by the proc tools and only one
proc tool can examine a process at a time, attempting to issue an _I_CMD
while one is already pending will fail with EBUSY. However, this
interface restriction could be lifted in the future at the expense of
additional implementation complexity.
By design, from the standpoint of a STREAMS module or driver, handling
an M_CMD message is quite similar to handling a non-transparent M_IOCTL
or a transparent M_IOCDATA. Thus, it is straightforward to share
processing routines which can be used by either the traditional
M_IOCTL/M_IOCDATA facility or the new M_CMD facility. That said, only
those STREAMS ioctls used by the proc tools require M_CMD support
(presently just TI_GET{MY,PEER}NAME).
As per the STREAMS design, modules putnext() messages they do not
recognize, and drivers freemsg() messages they do not recognize. Thus,
the risk associated with adding the new M_CMD STREAMS message is
minimal. However, because drivers will freemsg() M_CMD messages by
default (causing the _I_CMD to time out), pfiles will take care to only
issue the _I_CMD on known IP-based TLI devices (currently: /dev/udp,
/dev/udp6, /dev/tcp, and /dev/tcp6). The proposed output of pfiles on
TLI endpoints matches the current output used for sockets -- e.g.:
    # pfiles `pgrep rpc`
    100395: /usr/sbin/rpcbind
    [ ... ]
      19: S_IFCHR mode:0000 dev:333,0 ino:44170 uid:0 gid:0 rdev:105,27
          O_RDWR
          /devices/pseudo/tl@0:ticots
      20: S_IFCHR mode:0000 dev:333,0 ino:65028 uid:0 gid:0 rdev:42,35
          O_RDWR|O_NONBLOCK FD_CLOEXEC
          --> sockname: AF_INET 10.8.57.32 port: 111
          --> peername: AF_INET 10.8.57.11 port: 61404
          /devices/pseudo/tcp@0:tcp
--
meem