Here is a brain dump about utrace extension events.  This is an idea I've
had more or less all along.  It underpins some of the higher level ideas
I'll post in other brain dumps.  So I'll just spew details on the concept,
so that it'll make sense when I talk about other things as if this existed
to build on.  I may be rather light at the moment on the justification for
why you want these, but I think that mostly will be best explained by
filling out the big picture so these building blocks make sense.

The utrace events that you can track via utrace_set_events are what I'll
call the "core events", in contrast to a proposed second kind of events,
"extension events".  Each core event has a specialized callback in
utrace_ops and a hard-wired UTRACE_EVENT(*) bit assigned to it; together
they form a fixed compile-time set.

With extension events, there can be new types of event.  An extension event
type can be registered dynamically (or at boot time), by a kernel module
(or subsystem or arch code).  Event types can also be unregistered
dynamically, e.g. when the defining module unloads, or just generally by
programmatic choice of the registering module.  When an extension event type
has been registered, a utrace engine can listen for events of that type on
the attached task.

The registered extension event type will have a struct pointer that
identifies it.  This can be exported directly by the registering module in
a variable to be used by other modules.  But registered types are also
queryable.  Probably they'll be kobjects, which can form a hierarchical
name space (that can be seen under /sys).  There may also be a notifier of
some kind for new event type registrations.

A registered event type has two uses: to post an event, and to listen for
events.  The code that registers an event type is associated with some code
that wants to post events that can be listened for.  An event type is
registered as either "filtered" or "requested".  For a filtered event type,
the event source fires whether or not anyone is listening; the code is
going to call utrace to post the event regardless.  For a requested event
type, there never would have been an event if there hadn't been a listener;
if the listener went away, the event source would go away too.
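
As a rough illustration of the filtered/requested distinction, here is a
small user-space model.  Every name in it (ext_event_type, ext_register,
ext_listen, ext_delisten) is made up for the sketch and is not an existing
utrace interface.  It assumes a requested source is switched on by its
first listener and off by its last, while a filtered source stays on.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names throughout; nothing here is an existing utrace API. */
enum ext_kind { EXT_FILTERED, EXT_REQUESTED };

struct ext_event_type {
	const char *name;
	enum ext_kind kind;
	unsigned int listeners;   /* engines currently listening */
	bool source_active;       /* is the event source producing events? */
};

/* A filtered source runs from the start; a requested one waits for a listener. */
static void ext_register(struct ext_event_type *type)
{
	type->listeners = 0;
	type->source_active = (type->kind == EXT_FILTERED);
}

static void ext_listen(struct ext_event_type *type)
{
	if (type->listeners++ == 0 && type->kind == EXT_REQUESTED)
		type->source_active = true;   /* first listener arms the source */
}

static void ext_delisten(struct ext_event_type *type)
{
	assert(type->listeners > 0);
	if (--type->listeners == 0 && type->kind == EXT_REQUESTED)
		type->source_active = false;  /* last listener gone: source goes away */
}
```

In this model a filtered type never changes state when listeners come and
go, which is what makes a cheap "is anyone listening?" check worthwhile at
its posting site.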

In the implementation, registering a filtered event type will assign it a
bit in utrace_flags/engine->flags.  (Or perhaps this will be done on the
first time it gets a listener.)  The high bits above the core event bitmask
will act as a Bloom filter for the set of listeners active on the task.
The event-posting call will be a fast-path check of utrace_flags & type_mask.
(Requested event types don't need to consume a bit.)
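
The fast-path check could look roughly like the user-space sketch below.
The bit split (24 low bits for core events), the 64-bit flag word, and all
of the names are assumptions made for the example, not anything utrace
defines.  Bits are handed out by registration order and reused once they
run out, so a set bit only means "maybe someone is listening", as with any
Bloom filter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout: low bits are core events, the rest are filter bits. */
#define CORE_EVENT_BITS  24u
#define EXT_FILTER_BITS  (64u - CORE_EVENT_BITS)

struct ext_event_type {
	const char *name;
	uint64_t type_mask;     /* one high bit; several types may share it */
};

struct task_model {
	uint64_t utrace_flags;  /* union of all attached engines' flags */
};

/* Assign a filter bit by registration sequence number, wrapping when full. */
static void ext_assign_bit(struct ext_event_type *type, unsigned int seq)
{
	type->type_mask = 1ull << (CORE_EVENT_BITS + seq % EXT_FILTER_BITS);
}

/* Called when an engine starts listening for this type on the task. */
static void ext_mark_listener(struct task_model *task,
			      const struct ext_event_type *type)
{
	assert(type->type_mask != 0);   /* bit must have been assigned */
	task->utrace_flags |= type->type_mask;
}

/* Fast path at the event source: false means definitely no listener. */
static bool ext_maybe_listened(const struct task_model *task,
			       const struct ext_event_type *type)
{
	return (task->utrace_flags & type->type_mask) != 0;
}
```

A true result only sends the poster down the slow path, where the actual
listener list would be consulted; two types sharing a bit cost a wasted
slow-path lookup, never a missed event.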

To listen for an extension event type, a utrace engine calls utrace_listen.
A data structure is preallocated to represent that engine's interest in
that event type (caller-provided pointer?  allocated by the utrace call?).
utrace_listen takes task, engine, event type, and possibly a
caller-provided queue struct pointer.  There is a delisten call too,
details murky.

An event can be posted only on current, never on behalf of another task.
Events can be posted from interrupt level, like signals.  When posting, the
caller can choose to cause a signal-like interrupt of syscalls and blocks,
to get back towards user mode sooner with some possible user-visible
perturbation.  Normally events don't interrupt.  When a listened-for event
is posted, it queues for the task.  If asking for interrupt, TIF_SIGPENDING
is set, otherwise TIF_NOTIFY_RESUME.
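
The posting step can be modelled in user space along these lines.  The
struct layout, the MODEL_TIF_* constants, and post_event itself are
stand-ins invented for the sketch; the real flags are per-arch thread-info
bits.  The sketch also assumes that a listener whose preallocated element
is still queued simply loses the new event.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins for the real thread-info flags; values are arbitrary. */
enum {
	MODEL_TIF_NOTIFY_RESUME = 1 << 0,
	MODEL_TIF_SIGPENDING    = 1 << 1
};

struct listener {
	struct listener *next_queued;
	bool queued;              /* preallocated element already in use? */
	int event_type;
};

struct task_model {
	unsigned int tif_flags;
	struct listener *queue_head;
	struct listener **queue_tail;
};

static void task_init(struct task_model *t)
{
	t->tif_flags = 0;
	t->queue_head = NULL;
	t->queue_tail = &t->queue_head;
}

/* Post on the current task only; no allocation, so usable at interrupt level. */
static bool post_event(struct task_model *current_task, struct listener *l,
		       bool interrupt)
{
	assert(current_task != NULL && l != NULL);
	if (l->queued)
		return false;     /* assumption: a busy element drops the event */
	l->queued = true;
	l->next_queued = NULL;
	*current_task->queue_tail = l;            /* append in FIFO order */
	current_task->queue_tail = &l->next_queued;
	current_task->tif_flags |= interrupt ? MODEL_TIF_SIGPENDING
					     : MODEL_TIF_NOTIFY_RESUME;
	return true;
}
```

Because the queue element lives inside the listener, posting never has to
allocate, which is what makes it plausible from interrupt level.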

When an event is posted, it might have some form of event data, details
TBD.  If there is data and an event is queued, the preallocated queue
element would store the data.

When events have been posted, we get to the "almost back to user" hook
before seeing any normal signals or going to user mode.  This is the spot
where report_quiesce callbacks with event=0 run, or report_signal callbacks
run.  Here the callbacks associated with the event happen (before looking
for signals).  Callback is like report_*, but generic for extension events:
gets task, engine, listener struct, event type, pt_regs; returns
utrace_resume_action.  Details TBD.  Could be one hook in utrace_ops, or
could be a function pointer directly in the preallocated listener struct,
i.e. function chosen by each utrace_listen call.  The right callback
signature depends on what the story is for event data.
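
To make the callback shape concrete, here is one possible sketch, taking
the "function pointer in the listener struct" option.  All types and names
are invented for the example, event data is reduced to a void pointer, and
the rule that the numerically lowest resume action wins is an assumption
of the model.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for utrace_resume_action; lower value wins here. */
enum model_resume_action { MODEL_STOP, MODEL_REPORT, MODEL_RESUME };

struct model_regs { unsigned long ip; };
struct model_engine;
struct model_task;
struct model_listener;

typedef enum model_resume_action (*ext_event_cb)(struct model_task *,
		struct model_engine *, struct model_listener *,
		int event_type, struct model_regs *);

struct model_listener {
	struct model_listener *next;
	struct model_engine *engine;
	int event_type;
	ext_event_cb report;      /* chosen per utrace_listen call */
	void *data;               /* event data, details TBD */
};

struct model_task { struct model_listener *queue; };

/* Two example callbacks: each counts the event in *data. */
static enum model_resume_action count_and_resume(struct model_task *t,
		struct model_engine *e, struct model_listener *l,
		int event_type, struct model_regs *regs)
{
	(void)t; (void)e; (void)event_type; (void)regs;
	++*(int *)l->data;
	return MODEL_RESUME;
}

static enum model_resume_action request_stop(struct model_task *t,
		struct model_engine *e, struct model_listener *l,
		int event_type, struct model_regs *regs)
{
	(void)t; (void)e; (void)event_type; (void)regs;
	++*(int *)l->data;
	return MODEL_STOP;
}

/* The "almost back to user" hook: drain the queue before looking at signals. */
static enum model_resume_action run_queued_events(struct model_task *task,
						  struct model_regs *regs)
{
	enum model_resume_action result = MODEL_RESUME;

	while (task->queue != NULL) {
		struct model_listener *l = task->queue;
		enum model_resume_action action;

		task->queue = l->next;    /* dequeue before calling back */
		l->next = NULL;
		assert(l->report != NULL);
		action = l->report(task, l->engine, l, l->event_type, regs);
		if (action < result)
			result = action;
	}
	return result;
}
```

Putting the pointer in the listener struct lets one engine use different
handlers for different event types without a switch in a single
utrace_ops hook; the single-hook alternative would keep utrace_ops as the
only place callbacks live.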

Optional fanciness: listener could provide another callback to run at
post-time.  The callback must be safe to run with interrupts off or in an
interrupt handler, etc.  This can examine current state and event data, and
decide whether or not to queue this event.  (Also, it can decide to provide
a new queue element to replace the previous preallocated one.  That way a
second separate event of the same type could be queued before the task gets
back close to returning to user mode.)
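
A post-time filter of this kind might be modelled as below.  Again the
names (queue_elem, post_filter, post_filtered, keep_nonzero) are invented,
event data is reduced to a long, and the spare element is assumed to have
been preallocated by the listener, since nothing can be allocated at post
time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct queue_elem {
	struct queue_elem *next;
	long data;
	bool in_use;
};

struct listener;
typedef bool (*post_filter)(struct listener *l, long data);

struct listener {
	struct queue_elem *elem;    /* current preallocated element */
	post_filter filter;         /* optional post-time callback */
	struct queue_elem *spare;   /* extra element the filter may swap in */
};

struct task_model {
	struct queue_elem *head;
	struct queue_elem **tail;
};

/* Example filter: ignore zero data; swap in the spare if elem is busy. */
static bool keep_nonzero(struct listener *l, long data)
{
	if (data == 0)
		return false;
	if (l->elem->in_use && l->spare != NULL) {
		l->elem = l->spare;
		l->spare = NULL;
	}
	return true;
}

/* Post with the filter consulted first; never allocates. */
static bool post_filtered(struct task_model *t, struct listener *l, long data)
{
	struct queue_elem *e;

	assert(t != NULL && l != NULL);
	if (l->filter != NULL && !l->filter(l, data))
		return false;           /* filter declined to queue */
	e = l->elem;
	if (e == NULL || e->in_use)
		return false;           /* no free element left */
	e->in_use = true;
	e->data = data;
	e->next = NULL;
	*t->tail = e;
	t->tail = &e->next;
	return true;
}
```

With one spare, the listener in this model can have two events of the same
type queued before the task gets back near user mode; a third is dropped.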


Ok, so what's it for?  Well, I've gotten tired while not getting to that.
So I'll follow up with a few pointers to that thinking a little later.


Thanks,
Roland
