I'm filing this case for myself. The timer expires Wednesday, April 23rd.
Patch binding is requested. Consolidation Private is requested for all
proposed interfaces except for the new pfiles output, which is unstable
(as per pfiles(1M)).
Overview
========
This case proposes a new STREAMS ioctl mechanism called _I_CMD which
allows proc tools to reliably retrieve information from modules and
drivers on a stream associated with a stopped process. This case also
proposes an enhancement to pfiles(1M), which will make use of _I_CMD to
report TLI endpoint information, addressing an observability limitation
for customers using zones.
Motivation
==========
For historical reasons, a number of critical Solaris daemons still use
TLI instead of sockets (e.g., 8 of the 55 processes running on my Nevada
desktop after boot have TLI endpoints). While pfiles provides local and
remote address information for socket endpoints, an architectural issue
described below prevents this information from being provided for TLI
endpoints.
Customers have traditionally worked around this limitation by using the
freely available `lsof' tool. However, since lsof uses /dev/kmem (among
other unsavory interfaces) to retrieve this and other information, it is
unsuitable for use in a zone. As such, TLI endpoint information is
currently unavailable inside a zone, which has proven to be a gating
issue for some zones deployments.
Problem
=======
The architectural issue stems from a conflict between the design of
STREAMS ioctls and the way the proc tools (using the proc filesystem)
examine a process. Specifically, a proc tool works by asking the kernel
to stop the process it wishes to examine (the controlled process) and
then creating an "agent LWP" (see PCAGENT in proc(4)), which performs
operations on the tool's behalf in the context of the controlled
process. Once the information has been retrieved, the process is
resumed, unaware of the examination.
At a high level, providing the information seems straightforward: since
TLI endpoints are STREAMS devices, and since there are already ioctls
available (TI_GETMYNAME and TI_GETPEERNAME) to retrieve TLI endpoint
information, one might think the proc tools could issue the appropriate
ioctl()s on the TLI streams inside the controlled process to get the
information. However, there are several problematic cases to consider:
  * If an ioctl request is already outstanding on the stream,
    attempting to issue another ioctl() will block indefinitely.

  * If the stream is flow controlled, the M_IOCTL message (which is
    not a high priority message) will end up enqueued indefinitely.

  * Most STREAMS ioctls (including TI_GET{MY,PEER}NAME) are
    "transparent" ioctls, which function by having a given STREAMS
    module/driver iteratively request that the stream head copy data
    in/out of the userland process on its behalf. The copy in/out
    sequences are ioctl-specific (and not known to the STREAMS
    framework itself). In contrast, the current agent LWP ioctl
    design allows a proc tool to issue an ioctl() in the context of
    a controlled process by copying in a single buffer at ioctl
    entry and copying out the same buffer at ioctl() exit. These
    approaches are incompatible.
(Note: sockets don't have these problems because socket endpoint
information is stored in the "sonode" tied to each socket's vnode, and
that information can be accessed without issuing STREAMS ioctls. That
approach isn't feasible for TLI since a TLI device is just like any
other STREAMS device and thus has only a specfs vnode.)
Solution
========
To address the above problems, we propose a new STREAMS ioctl mechanism
called "_I_CMD". The _I_CMD mechanism is similar in spirit to the
existing I_STR (non-transparent) STREAMS ioctl mechanism: the caller
fills in a `struct strcmd' (as they would a `struct strioctl') with an
ioctl command number, timeout, data buffer, and data buffer length:
    typedef struct strcmd {
            int     sc_cmd;                 /* ioctl command */
            int     sc_timeout;             /* timeout value (in secs) */
            int     sc_len;                 /* data length */
            int     sc_pad;
            char    sc_buf[STRCMDBUFSIZE];  /* data buffer */
    } strcmd_t;
However, unlike `struct strioctl', the data buffer is directly embedded
into the `struct strcmd' to eliminate the need for additional copyin and
copyout operations by the stream head. Thus, this design is compatible
with the existing agent LWP ioctl approach.
When an _I_CMD is issued on a stream, the stream head will sanity-check
the request/payload, and allocate/initialize a new STREAMS message type
called M_CMD. An M_CMD message is similar to an M_IOCTL message, but is
high-priority (thus addressing the flow control issue above) and handled
independently from ioctl messages (thus addressing the "outstanding
ioctl request" problem above). The M_CMD message will be associated
with a `struct cmdblk', which resembles the existing `struct iocblk'
used with M_IOCTL messages, but is tailored for M_CMD:
    typedef struct cmdblk {
            int     cb_cmd;         /* ioctl command type */
            cred_t  *cb_cr;         /* full credentials */
            uint_t  cb_len;         /* payload size */
            int     cb_error;       /* error code */
    } cmdblk_t;
As with an I_STR M_IOCTL message, the stream head will also allocate an
M_DATA message (of sc_len bytes) and place the contents of sc_buf into
the message. As with M_IOCTL, the M_DATA will be chained onto the M_CMD
message and sent downstream. The stream head will then wait
interruptibly for sc_timeout seconds (or forever if sc_timeout is -1) to
receive an M_CMD response. Upon receiving a response, the data will be
copied back out to userland and the _I_CMD will complete.
Since this facility is intended for use by the proc tools and only one
proc tool can examine a process at a time, attempting to issue an _I_CMD
while one is already pending will fail with EBUSY. However, this
interface restriction could be lifted in the future at the expense of
additional implementation complexity.
By design, from the standpoint of a STREAMS module or driver, handling
an M_CMD message is quite similar to handling a non-transparent M_IOCTL
or a transparent M_IOCDATA. Thus, it is straightforward to share
processing routines which can be used by either the traditional
M_IOCTL/M_IOCDATA facility or the new M_CMD facility. That said, only
those STREAMS ioctls used by the proc tools require M_CMD support
(presently just TI_GET{MY,PEER}NAME).
As per the STREAMS design, modules putnext() messages they do not
recognize, and drivers freemsg() messages they do not recognize. Thus,
the risk associated with adding the new M_CMD STREAMS message is
minimal. However, because drivers will freemsg() M_CMD messages by
default (causing the _I_CMD to time out), pfiles will take care to only
issue the _I_CMD on known IP-based TLI devices (currently: /dev/udp,
/dev/udp6, /dev/tcp, and /dev/tcp6). The proposed output of pfiles on
TLI endpoints matches the current output used for sockets -- e.g.:
    # pfiles `pgrep rpc`
    100395: /usr/sbin/rpcbind
    [ ... ]
      19: S_IFCHR mode:0000 dev:333,0 ino:44170 uid:0 gid:0 rdev:105,27
          O_RDWR
          /devices/pseudo/tl@0:ticots
      20: S_IFCHR mode:0000 dev:333,0 ino:65028 uid:0 gid:0 rdev:42,35
          O_RDWR|O_NONBLOCK FD_CLOEXEC
          --> sockname: AF_INET 10.8.57.32 port: 111
          --> peername: AF_INET 10.8.57.11 port: 61404
          /devices/pseudo/tcp@0:tcp
--
meem