On Wed, 2008-07-02 at 05:20 -0700, Roland McGrath wrote:
> Here is a vague start at the directions I have in mind for a user-level
> interface that I've been calling "ntrace". ...
>
> I'll start with a couple of terms that I'll use later throughout the
> discussion.
>
> A tracing session is one independent use of the interface, e.g. one
> debugger application might have one session to handle many debugees.
> Different sessions don't interact or know about each other directly. ...
>
> A tracing "channel" is the term I'll use to encompass the whole subject
> of transport. ...
> A channel can be a
> source of commands. A channel can send data back to the user.
>
> For sending data, a channel might have various buffering options and
> characteristics available. Sending to the channel per se has to be
> nonblocking in the kernel. ...
>
> I think of the interface as asynchronous at base.

This certainly seems like a departure from the approach employed by
systemtap, gdb, ptrace, utrace, *probes, etc., where the traced thread is
essentially suspended from its normal operation while the instrumentation
code handles the event. If that event handler wants to adjust the set of
events I'm tracking (e.g., turning on syscall tracing when a particular
user-mode function gets called), then finding out about the event (i.e.,
the function call) several milliseconds down the road isn't very helpful.

What you're describing sounds like event logging -- which, while useful in
a lot of ways, isn't what I thought we were trying to accomplish.

> There may at some
> point be some synchronous calls to optimize the round trips. But we
> know that by its nature an interface for handling many threads at once
> has to be asynchronous, because an event notification can always be
> occurring while you are doing an operation on another thread. So what
> keeping it simple means for the first version is that we only worry
> about the asynchronous model.

It seems to me that we ought to consider the possibility of multithreaded
tracer apps -- e.g., where there's a tracer thread for each traced thread.
That way there's a real possibility of catching all the events from a
multithreaded app, in a timely manner, without all the traced threads
grinding to a halt every time a thread hits an event of interest.

I think we all agree that a tracer thread should be able to block while
polling for an event. A key question is whether a traced thread can block
waiting for the tracer app to handle the event. Ptrace certainly supports
this. But as I recall, you're rather adamant that a utrace callback (other
than report_quiesce) should NOT block waiting for something to happen in
user space -- e.g., because a SIGKILL can't get delivered during that
time. Would your new utrace API support something like this?

Seems like you're on the right track otherwise.

>
>
> Thanks,
> Roland
>

Jim