I have two threads; one uses pthread_kill() to asynchronously send signals to 
the other.  The handler is installed with the SA_SIGINFO | SA_ONSTACK flags, 
and the receiving thread has set up an alternate signal stack.  Under some 
circumstances, the signal is delivered on the wrong stack.
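
The setup described above can be sketched roughly as follows.  This is an 
illustration only, not the code from my program: every name in it 
(kAltStackSize, handler, receiver, deliver_once, the g_* variables) is a 
placeholder, the 64 KB stack size is an arbitrary choice, and only the POSIX 
calls themselves are real.

```cpp
#include <pthread.h>
#include <sched.h>
#include <signal.h>
#include <atomic>
#include <cstdint>
#include <cstdlib>

// Hypothetical names throughout; only the POSIX calls are real.
static const std::size_t kAltStackSize = 64 * 1024;   // arbitrary size for the sketch
static std::atomic<int> g_ready{0};                   // 1: receiver set up, -1: setup failed
static std::atomic<std::uintptr_t> g_probe{0};        // address of a handler local, 0 until delivered
static std::atomic<int> g_result{-1};                 // 1: on alternate stack, 0: elsewhere

static void handler(int, siginfo_t*, void*) {
    char probe;                                       // lives in the handler's stack frame
    g_probe.store(reinterpret_cast<std::uintptr_t>(&probe));
}

static void* receiver(void*) {
    // The alternate stack is per-thread, so the receiving thread registers it itself.
    stack_t ss;
    ss.ss_sp = std::malloc(kAltStackSize);
    ss.ss_size = kAltStackSize;
    ss.ss_flags = 0;
    if (ss.ss_sp == nullptr || sigaltstack(&ss, nullptr) != 0) {
        std::free(ss.ss_sp);
        g_ready.store(-1);
        return nullptr;
    }
    g_ready.store(1);
    while (g_probe.load() == 0) sched_yield();        // wait until the handler has run

    // Compare the recorded address against the bounds of the alternate stack.
    std::uintptr_t lo = reinterpret_cast<std::uintptr_t>(ss.ss_sp);
    std::uintptr_t p = g_probe.load();
    g_result.store((p >= lo && p < lo + kAltStackSize) ? 1 : 0);

    // Disable the alternate stack before releasing its memory.
    stack_t off;
    off.ss_sp = nullptr;
    off.ss_size = 0;
    off.ss_flags = SS_DISABLE;
    sigaltstack(&off, nullptr);
    std::free(ss.ss_sp);
    return nullptr;
}

// Returns 1 if the handler ran on the alternate stack, 0 if not, -1 on setup failure.
int deliver_once() {
    struct sigaction sa = {};
    sa.sa_sigaction = handler;
    sa.sa_flags = SA_SIGINFO | SA_ONSTACK;            // three-argument handler, alternate stack
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGRTMIN, &sa, nullptr) != 0) return -1;

    pthread_t tid;
    if (pthread_create(&tid, nullptr, receiver, nullptr) != 0) return -1;
    while (g_ready.load() == 0) sched_yield();        // wait for the receiver's setup
    if (g_ready.load() > 0) pthread_kill(tid, SIGRTMIN);
    pthread_join(tid, nullptr);
    return g_result.load();
}
```

With this arrangement a single call to deliver_once() is expected to return 1; 
a return value of 0 would correspond to the misbehaviour described in this 
post.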

My signal handler begins with

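void runner::do_signal(int signo, siginfo_t *psi, void *pv)
{
#ifndef NDEBUG
        // The address of an argument lies in the handler's current stack frame.
        char  *_probe  = (char*)&pv;
        // Bounds of the alternate signal stack.
        char  *_low    = (char*)sigstack_;
        char  *_high   = (char*)(sigstack_ + STACK_SIZE);
#endif
        // The handler must be running on the alternate stack.
        assert((_probe > _low) && (_probe < _high));
..
}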

On certain occasions the assertion fails (meaning that the handler's arguments 
are not within the bounds of the alternate signal stack), and sigaltstack() 
also reports that the handler is NOT executing on the alternate stack.  This 
is the backtrace at that point:

current thread: [EMAIL PROTECTED]
  [1] __lwp_kill(0x2, 0x6), at 0xcbd78e65 
  [2] _thr_kill(0x2, 0x6), at 0xcbd75a1e 
  [3] raise(0x6), at 0xcbd32102 
  [4] abort(0xcbdb4d80, 0xcbdb0000, 0x65737341, 0x6f697472, 0x6166206e, 
0x64656c69), at 0xcbd10dad 
  [5] __assert(0x8079fe4, 0x807a008, 0xde), at 0xcbd10fce 
=>[6] sched::runner::do_signal(signo = 41, psi = 0x80aa600, pv = 0x80aa400), 
line 222 in "src/shdif.cc"
  [7] __sighndlr(0x29, 0x80aa600, 0x80aa400, 0x80721f0), at 0xcbd7795f 
  ---- called from signal handler with signal 41 (SIGRTMIN) ------
  [8] take_deferred_signal(0x29), at 0xcbd6d01e 
  [9] do_exit_critical(0xcbfb2400, 0xcbdb0000, 0x804784c, 0x8047798, 0x80aa704, 
0xcbd6d7a0), at 0xcbd75ab4 
  [10] block_all_signals(0xcbfb2400), at 0xcbd6d626 
  [11] _thr_sigsetmask(0x1, 0x809035c, 0x0), at 0xcbd6d7a0 
  [12] _sigprocmask(0x1, 0x809035c, 0x0), at 0xcbd6d921 
  [13] sched::runner::enter_critical(this = 0x8093f30), line 398 in 
"src/shdif.cc"
  [14] sched::enter_critical(), line 256 in "src/shd.h"
  [15] comms::send<int>(chn = CLASS, msg = CLASS), line 252 in "src/chn.h"
  [16] sender::behavior(this = 0x80a8620), line 67 in "test/bvt_sim4.cc"
  [17] sched::process::start(ppr = 0x80a8620), line 67 in "src/shdif.cc"
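
The two checks mentioned before the backtrace (the address-range assertion and 
the sigaltstack() query) can be sketched as two small helpers.  The names 
within_stack and kernel_says_on_altstack are placeholders for illustration and 
do not appear in my program.

```cpp
#include <signal.h>
#include <cstdint>

// Hypothetical helper: true if addr lies in [ss.ss_sp, ss.ss_sp + ss.ss_size).
// The comparison is done on integers because ordering unrelated pointers
// with < is not well defined in C++.
bool within_stack(const void* addr, const stack_t& ss) {
    std::uintptr_t p  = reinterpret_cast<std::uintptr_t>(addr);
    std::uintptr_t lo = reinterpret_cast<std::uintptr_t>(ss.ss_sp);
    return p >= lo && p < lo + ss.ss_size;
}

// Hypothetical helper: asks the system whether the calling thread is
// currently executing on its alternate signal stack (SS_ONSTACK set).
bool kernel_says_on_altstack() {
    stack_t cur;
    if (sigaltstack(nullptr, &cur) != 0) return false;
    return (cur.ss_flags & SS_ONSTACK) != 0;
}
```

Inside a handler one would pass the address of a local variable or argument to 
within_stack(); in the failing case above both checks disagree with what 
SA_ONSTACK should guarantee.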

It seems that the take_deferred_signal() function does not arrange a switch to 
the alternate stack.  The error is consistently reproducible in my program 
when it runs on a 2-CPU machine (so the threads truly run in parallel), but I 
have not yet succeeded in writing a small test case.
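
One possible shape for such a test case, going by the backtrace: one thread 
repeatedly blocks and unblocks the signal while another keeps sending it, and 
the handler counts deliveries that the system reports as off the alternate 
stack.  This is only a sketch under assumptions.  It is not confirmed to 
trigger the problem, all names (masker, on_signal, stress_off_stack_count, the 
g_* counters) are made up, and it uses pthread_sigmask() where my program 
calls sigprocmask().

```cpp
#include <pthread.h>
#include <sched.h>
#include <signal.h>
#include <atomic>
#include <cstdlib>

// Hypothetical names throughout; only the POSIX calls are real.
static std::atomic<int>  g_state{0};      // 0: starting, 1: running, 2: stop, -1: setup failed
static std::atomic<long> g_total{0};      // handler invocations
static std::atomic<long> g_off_stack{0};  // invocations not on the alternate stack

static void on_signal(int, siginfo_t*, void*) {
    stack_t cur;
    bool on = sigaltstack(nullptr, &cur) == 0 && (cur.ss_flags & SS_ONSTACK) != 0;
    g_total.fetch_add(1);
    if (!on) g_off_stack.fetch_add(1);
}

static void* masker(void*) {
    stack_t ss;
    ss.ss_size = 64 * 1024;               // arbitrary size for the sketch
    ss.ss_sp = std::malloc(ss.ss_size);
    ss.ss_flags = 0;
    if (ss.ss_sp == nullptr || sigaltstack(&ss, nullptr) != 0) {
        std::free(ss.ss_sp);
        g_state.store(-1);
        return nullptr;
    }
    sigset_t block, old;
    sigemptyset(&block);
    sigaddset(&block, SIGRTMIN);
    g_state.store(1);
    // Toggle the mask so that signals often arrive while blocked and are
    // delivered on the way out of the unblocking call.
    while (g_state.load() == 1) {
        pthread_sigmask(SIG_BLOCK, &block, &old);
        pthread_sigmask(SIG_SETMASK, &old, nullptr);
    }
    stack_t off;
    off.ss_sp = nullptr;
    off.ss_size = 0;
    off.ss_flags = SS_DISABLE;
    sigaltstack(&off, nullptr);           // disable before freeing the memory
    std::free(ss.ss_sp);
    return nullptr;
}

// Sends `sends` signals to the masking thread; returns the number of handler
// runs that were not on the alternate stack, or -1 if setup failed.
long stress_off_stack_count(int sends) {
    struct sigaction sa = {};
    sa.sa_sigaction = on_signal;
    sa.sa_flags = SA_SIGINFO | SA_ONSTACK;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGRTMIN, &sa, nullptr) != 0) return -1;

    pthread_t tid;
    if (pthread_create(&tid, nullptr, masker, nullptr) != 0) return -1;
    while (g_state.load() == 0) sched_yield();
    if (g_state.load() < 0) { pthread_join(tid, nullptr); return -1; }

    for (int i = 0; i < sends; ++i) pthread_kill(tid, SIGRTMIN);
    g_state.store(2);
    pthread_join(tid, nullptr);
    return g_off_stack.load();
}
```

A non-zero return value from stress_off_stack_count() would indicate handler 
runs outside the alternate stack; zero only means this particular interleaving 
did not show the problem.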

Opinions?
 
 
This message posted from opensolaris.org