Hi,
About a year ago, we had a discussion about a signal notification
problem with perfmon and self-monitoring multi-threaded sampling
programs.

Perfmon notifications use signals. This is the only mechanism possible
for asynchronous notifications. For user-level sampling (no kernel buffer),
you get a notification for each counter overflow. For kernel-buffer
sampling, you get a notification whenever the buffer fills up (default
format).

Typically with self sampling you want the signal to be delivered to the
thread that caused the overflow. It is not only convenient but it may
be required because a program may want to modify the thread's state
when it gets a sample.

POSIX does not mandate that asynchronous signals be delivered
to the thread from which they originate. The signal can be delivered to
any thread within the process. Synchronous signals are delivered to the
thread that caused the event, e.g., SIGFPE, SIGTRAP.

Perfmon uses the standard POSIX mechanism to request asynchronous
notifications on a file descriptor:

       flags = fcntl(fd, F_GETFL, 0);
       fcntl(fd, F_SETFL, flags | O_ASYNC);
       fcntl(fd, F_SETOWN, getpid());

By default, the SIGIO signal is used. This can be overridden using the
(non-standard) F_SETSIG command to fcntl(). SIGIO is an asynchronous
signal.
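
As an illustration, the three fcntl() calls above plus F_SETSIG can be
wrapped in one helper. This is only a sketch: enable_async_notify() is a
made-up name, not part of perfmon, and F_SETSIG is Linux-specific, so
_GNU_SOURCE must be defined before any header is included.

```c
#define _GNU_SOURCE             /* F_SETSIG/F_GETSIG are Linux extensions */
#include <fcntl.h>
#include <signal.h>
#include <unistd.h>

/*
 * Hypothetical helper: request asynchronous notification on fd and have
 * it delivered as signal `sig` instead of the default SIGIO.
 * Returns 0 on success, -1 if any fcntl() call fails.
 */
int enable_async_notify(int fd, int sig)
{
    int flags = fcntl(fd, F_GETFL, 0);

    if (flags == -1)
        return -1;
    /* Turn on signal-driven notification for this descriptor. */
    if (fcntl(fd, F_SETFL, flags | O_ASYNC) == -1)
        return -1;
    /* The owner is the whole process, as in the snippet above. */
    if (fcntl(fd, F_SETOWN, getpid()) == -1)
        return -1;
    /* Replace SIGIO with the caller's signal (non-standard). */
    if (fcntl(fd, F_SETSIG, sig) == -1)
        return -1;
    return 0;
}
```

A caller would pass the perfmon file descriptor and, for instance,
SIGUSR1 or a real-time signal. The signal is still asynchronous either way.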


The Linux kernel maintains two signal pending queues:
    - one queue private to each thread
    - one queue shared by all threads of a process

What determines which queue to use is where you come from. If you
get a floating point exception, the signal is pended to the private queue.

If you come in for a file descriptor asynchronous notification, the signal
is pended to the shared queue. It should be noted that changing the signal
via F_SETSIG does not alter this behavior. By definition, any thread can
pull from the shared queue.

So how come, with perfmon, the signal seems to be delivered to the
right thread?

Once the kernel pends the signal, it needs to select a thread to wake up
or signal. That thread will have a TIF flag set and it will go pull the
signal from the queue. Signals are first pulled from the private queue,
then the shared queue, i.e., the private queue has higher priority (which
is what you want).

If possible, the kernel first tries to use the thread in which the event
occurred. If not possible, it iterates over the other threads. A thread
is selected if:
   1 - it does not have the signal blocked
   2 - it is not exiting
   3 - it does not have ANY signal pending

Based on the criteria above, the reason it works most of the time
for perfmon is that, when you get the overflow notification, you do
not have another perfmon-related signal pending.

But you run into the problem if the monitored process is using signals.
For instance, if the program is using SIGALRM, then SIGIO may be
delivered to the wrong thread. If your program does not use any signals,
then you may be okay (assuming libpthread does not use signals internally).

I do not have a good fix for this now but should have one by next week.

Hopefully this clarifies all questions about this problem.
------------------------------------------------------------------------------
_______________________________________________
perfmon2-devel mailing list
perfmon2-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/perfmon2-devel